Enmotus MiDrive: Rethinking SLC Caching For QLC SSDs
by Billy Tallis on January 30, 2020 8:00 AM EST
For consumer storage, CES 2020 brought a new wave of competition for PCIe 4.0 SSDs and the promise of faster portable SSDs, but the most intriguing product demo was from Enmotus. They are planning a profound change to how consumer SSDs work, ditching drive-managed SLC caching in favor of host-managed tiered storage.
Enmotus is a well-established provider of storage management software. Their most familiar product to consumers is probably FuzeDrive, a limited edition of which is bundled with recent generations of AMD motherboards as AMD StoreMI. This serves as AMD's answer to Intel's Smart Response Technology (SRT) and Optane Memory storage caching systems. Enmotus also has enterprise-oriented products in the same vein. Their new MiDrive technology builds on their existing tiering software to manage a combination of SLC and QLC NAND on a single consumer SSD.
Caching and Tiering Challenges
All software-driven caching or tiering solutions tend to have limited consumer appeal due to the complexity of setting up the system. At least two physical drives are required, and the OS needs to load an extra driver to manage data placement. Any compatibility issue or other glitch can easily render a PC unbootable, and data recovery isn't as straightforward as for a single drive. These hurdles don't scare off enthusiasts and power users, but PC OEMs aren't eager to market and support these configurations. But without some form of caching or tiering, consumer SSDs would be limited to the raw performance of TLC or QLC NAND. SLC caching managed transparently by the SSD's firmware has been adopted by almost all consumer SSDs in order to improve burst performance, and it has proven to be very effective for consumer workloads. The fundamental limitation of this strategy is that the SSD must work with limited information about the nature and purpose of the user data it is reading and writing.
Most SSDs rely on fairly simple procedures for managing their SLC caches: send all writes to the cache unless it's full, and use idle time to fold data from SLC into the more compact TLC representation, freeing up cache space for future bursts of writes. There are still some choices to be made in implementing SLC caching for consumer SSDs: whether to use a fixed-size or dynamically sized cache, and whether to stall when the cache fills up or divert writes straight to TLC/QLC. As QLC drives become more common, we're also seeing drives that prefer to keep data in the SLC cache long-term until the drive starts to fill up, so that the cache can help with read performance in addition to write performance.
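The drive-managed policy described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not any vendor's firmware: it uses a fixed-size cache measured in megabytes, ignores dynamic sizing and read caching, and the class and field names (`SlcCacheModel`, `divert_when_full`, and so on) are hypothetical. The `divert_when_full` flag switches between the two cache-full behaviors mentioned above: sending overflow straight to native TLC/QLC, or stalling while cached data is folded in the foreground.

```python
from dataclasses import dataclass


@dataclass
class SlcCacheModel:
    """Toy model of a drive-managed SLC write cache (all parameters hypothetical)."""
    capacity_mb: int                 # fixed-size cache, for simplicity
    divert_when_full: bool = True    # True: overflow goes straight to TLC/QLC; False: stall
    cached_mb: int = 0               # data currently held in SLC
    direct_to_native_mb: int = 0     # data written straight to TLC/QLC
    foreground_fold_mb: int = 0      # data folded while the host was stalled waiting

    def __post_init__(self) -> None:
        if self.capacity_mb <= 0:
            raise ValueError("cache capacity must be positive")

    def _fold(self, amount_mb: int) -> int:
        """Rewrite up to amount_mb of SLC data in native TLC/QLC mode, freeing cache."""
        folded = min(amount_mb, self.cached_mb)
        self.cached_mb -= folded
        return folded

    def write(self, size_mb: int) -> None:
        """Send a host write to the cache, handling the cache-full case per policy."""
        remaining = size_mb
        while remaining > 0:
            free = self.capacity_mb - self.cached_mb
            if free == 0:
                if self.divert_when_full:
                    self.direct_to_native_mb += remaining
                    return
                # Stall: fold cached data in the foreground to make room.
                self.foreground_fold_mb += self._fold(remaining)
                continue
            chunk = min(remaining, free)
            self.cached_mb += chunk
            remaining -= chunk

    def idle_fold(self, budget_mb: int) -> int:
        """Background flush during idle time; returns how much was folded."""
        return self._fold(budget_mb)
```

In the diverting case, a 150 MB burst into a 100 MB cache leaves 50 MB written at native TLC/QLC speed; in the stalling case the same burst forces 50 MB of foreground folding before the rest can be cached. Either way, the drive makes these choices without knowing which files the data belongs to.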
Enmotus FuzeDrive manual data placement controls
Host-managed caching or tiering opens the door to more intelligent management of data placement, since the host OS has better information about which chunks of data belong to which file, and about the processes and users that interact with those files. It is easier for the host OS to accurately track the history of access patterns for hot vs. cold files. It is also possible to expose manual control of data placement directly to the user.
Two Drives In One
The Enmotus MiDrive technology allows one SSD to present the host with access to two separate pools of flash storage: QLC and SLC managed by the same SSD controller. To implement this, they have partnered with Phison to modify SSD controller firmware. For server products, a single NVMe SSD would expose two separate NVMe namespaces that Linux treats as different block devices. But for consumers, Enmotus has chosen to maximize backwards compatibility by having the MiDrive present itself as a single block device, with the first 32 or 64 GB initially mapped to SLC NAND and the rest of the drive mapped to QLC NAND. This makes it possible (and fast!) to install an OS to a MiDrive without needing any special Enmotus software or drivers. Once the Enmotus driver has been loaded, it takes over the management of data placement using vendor-specific commands that instruct the SSD to promote or demote ranges of Logical Block Addresses (LBAs) between the QLC and SLC pools of flash. (The initialization process for this tiering currently takes about a quarter of a second, because very little data needs to be moved until there's history indicating what should be in QLC vs SLC.)
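The out-of-the-box layout and the promote/demote mechanism can be modeled as a per-chunk map on the host. The sketch below is hypothetical: `TierMap` and its methods are illustrative names, the chunk size is a parameter (the article later gives 4MB), and a real driver would have to send Enmotus's vendor-specific command to the SSD, whose format the article does not describe. Because the SLC pool has a fixed size, promoting a chunk only succeeds if another chunk has been demoted to make room.

```python
from enum import Enum

GIB = 1024 ** 3


class Tier(Enum):
    SLC = "SLC"
    QLC = "QLC"


class TierMap:
    """Hypothetical host-side record of which LBA chunk lives in which flash pool."""

    def __init__(self, total_bytes: int, slc_bytes: int, chunk_bytes: int) -> None:
        self.chunk_bytes = chunk_bytes
        self.slc_capacity = slc_bytes // chunk_bytes
        total_chunks = total_bytes // chunk_bytes
        # Out-of-the-box layout: the first slc_bytes of the LBA space are SLC.
        self.tier = [Tier.SLC if i < self.slc_capacity else Tier.QLC
                     for i in range(total_chunks)]

    def slc_used(self) -> int:
        return sum(t is Tier.SLC for t in self.tier)

    def demote(self, chunks) -> None:
        # A real driver would issue the (undocumented) vendor-specific command here.
        for c in chunks:
            self.tier[c] = Tier.QLC

    def promote(self, chunks) -> None:
        chunks = set(chunks)
        needed = sum(self.tier[c] is Tier.QLC for c in chunks)
        if self.slc_used() + needed > self.slc_capacity:
            raise ValueError("SLC pool is full; demote something first")
        for c in chunks:
            self.tier[c] = Tier.SLC
```

With a 1 GiB toy drive, 64 MiB of SLC and 4 MiB chunks, the first 16 chunks start in SLC, and promoting chunk 100 fails until one of them is demoted. The LBAs themselves never change, only which pool backs them.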
Enmotus MiDrive 800GB appearing as a single device
This is a lot simpler for the host side than the strategy Intel uses for their Optane Memory H10, which is two separate PCIe devices on one M.2 card and requires special motherboard support to properly detect both halves before the caching software can even get involved. Enmotus is working to make MiDrive even simpler by having Microsoft distribute the Enmotus driver with Windows, so that a MiDrive will be automatically detected and managed by the Enmotus software without requiring any user intervention. For now, Windows will default to using its standard NVMe driver for a MiDrive, but that should change by the time products hit the shelves.
Example of how MiDrive LBA allocation will change with use
(for illustration purposes only, not based on real testing)
Enmotus supports assigning data to SLC or QLC in 4MB chunks, which is probably the size of a single NAND flash erase block in SLC mode, and thus the smallest chunk size that can easily be remapped between the QLC and SLC portions of the drive without contributing to unnecessary write amplification. That 4MB block size means that a small file moved to SLC is likely to bring along other nearby files, which will often contain related data that may also benefit from being in SLC. It also means that large files can be partially resident in SLC and partially in QLC. Since this process doesn't change the logical block addresses a file occupies, Enmotus MiDrive doesn't need to change anything about how NTFS organizes data, and it doesn't need to behave like an advanced disk defragmenter that tries to move important data toward the beginning of the disk. The MiDrive software only needs to look up what LBAs are used by a file and tell the SSD whether to move that data to SLC or QLC blocks. The only side-effect visible to the rest of the OS is a change in the performance characteristics for accessing that part of the SSD.
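The chunk lookup described above amounts to integer division of LBAs by the chunk size. This sketch assumes 512-byte logical blocks and reads "4MB" as 4 MiB; both are assumptions, as are the function names. It takes a file's extents as `(start_lba, lba_count)` pairs. The article does not say how the Enmotus driver obtains them (on Windows, one documented source is the `FSCTL_GET_RETRIEVAL_POINTERS` control code, whose cluster-based extents would still need converting to disk LBAs).

```python
SECTOR_BYTES = 512                                # assumption: 512-byte logical blocks
CHUNK_BYTES = 4 * 1024 * 1024                     # "4MB" placement unit, assumed 4 MiB
SECTORS_PER_CHUNK = CHUNK_BYTES // SECTOR_BYTES   # 8192


def chunks_for_extents(extents):
    """Map a file's (start_lba, lba_count) extents to the sorted chunks they touch."""
    touched = set()
    for start_lba, lba_count in extents:
        if lba_count <= 0:
            continue
        first = start_lba // SECTORS_PER_CHUNK
        last = (start_lba + lba_count - 1) // SECTORS_PER_CHUNK
        touched.update(range(first, last + 1))
    return sorted(touched)


def slc_fraction(extents, slc_chunks):
    """Fraction of a file's sectors that sit in chunks currently mapped to SLC."""
    total = in_slc = 0
    for start_lba, lba_count in extents:
        lba, end = start_lba, start_lba + lba_count
        while lba < end:
            chunk = lba // SECTORS_PER_CHUNK
            chunk_end = min(end, (chunk + 1) * SECTORS_PER_CHUNK)
            span = chunk_end - lba
            total += span
            if chunk in slc_chunks:
                in_slc += span
            lba = chunk_end
    return in_slc / total if total else 0.0
```

A 4 KB file at LBA 100 maps to chunk 0, so promoting it drags in everything else stored in the first 4 MiB. A file covering two whole chunks with only one of them in SLC reports a fraction of 0.5, which is the partial residency described above.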
The SLC portion of an Enmotus MiDrive differs from a traditional SLC cache not only by being host-managed, but also in how the SSD treats it for wear leveling purposes. A typical SSD's SLC cache may have a static or dynamic size, but in either case when new write commands arrive the SSD will write the data to whatever NAND flash block is currently empty. When the cache is flushed, data from several SLC blocks will be rewritten in TLC or QLC mode to a different empty block, and the SLC blocks are then free to be erased and put back into the pool of available blocks. Managing just one pool of empty blocks means that the actual physical location of the SLC cache can move around over time, and a block that was last used as TLC might end up being used as SLC the next time data is written to it.
By contrast, Enmotus MiDrive technology has the SSD track two entirely separate pools. The SLC portion is allocated when the drive is manufactured and stays fixed: any physical NAND pages and blocks used as SLC will always be treated as SLC for the lifetime of the drive, and the same goes for the QLC portion. The two pools of flash are subject to completely independent wear leveling, even though SLC and QLC portions will exist side by side on each physical flash chip on the drive. This means that the QLC blocks will never be subjected to the short-term Program/Erase cycles of SLC cache filling and flushing. For the SLC blocks, the error correction can be tuned specifically to SLC usage, and that allows Enmotus to achieve around 30k Program/Erase cycles for the SLC portion of the drive (based on Micron QLC NAND). MiDrives will expose separate SMART indicators for the SLC and QLC portions of the drive, so monitoring software will need to be updated to properly interpret this information.
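The effect of separate pools can be illustrated with two independently wear-leveled block lists and a back-of-the-envelope endurance bound. This is a hypothetical sketch: `BlockPool` uses a simple least-erased-first allocation policy chosen for illustration, and `raw_write_budget_tb` multiplies pool capacity by rated P/E cycles while ignoring write amplification, spare area and ECC overhead, so it gives an upper bound rather than a warranty figure.

```python
from dataclasses import dataclass


@dataclass
class BlockPool:
    """One independently wear-leveled pool of NAND blocks (illustrative only)."""
    name: str
    rated_pe_cycles: int
    erase_counts: list

    def allocate_block(self) -> int:
        """Erase and hand out the least-worn block in this pool only."""
        idx = min(range(len(self.erase_counts)), key=self.erase_counts.__getitem__)
        self.erase_counts[idx] += 1
        return idx

    def wear_fraction(self) -> float:
        """Worst-block wear, loosely like a per-pool 'percentage used' indicator."""
        return max(self.erase_counts) / self.rated_pe_cycles


def raw_write_budget_tb(pool_gb: float, pe_cycles: int) -> float:
    """Capacity x P/E cycles in decimal TB, ignoring write amplification and spare area."""
    return pool_gb * pe_cycles / 1000
```

Writing to the SLC pool never touches the QLC pool's erase counts, which is why each pool needs its own SMART wear indicator. Plugging in the article's figures, a 64GB SLC portion at around 30k P/E cycles gives at most 64 × 30,000 = 1,920,000 GB, roughly 1.9 PB of raw writes to the SLC portion (960 TB for a 32GB portion), before write amplification is taken into account.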
In principle, it would be possible for either the SLC or QLC portion of the drive to be worn out prematurely, but in practice Enmotus is confident that their tiered storage management software will lead to longer overall drive lifespans than drive-managed SLC caching. Files that are known to be frequently modified will permanently reside on SLC and not be automatically flushed out to QLC during idle time. If the Enmotus software is smart enough, it will also be able to determine which files should skip the SLC and go straight to QLC until it becomes clear that a file is frequently accessed. For example, a file download coming into the machine over gigabit Ethernet will not initially need SLC performance because raw QLC can generally handle sequential writes at that speed (especially with no background SLC cache flushing to slow things down). And if that file is a movie which is infrequently accessed and only read sequentially, there's no reason for it to ever be promoted up to SLC. In general, the tiered storage management done by Enmotus should result in less data movement between SLC and QLC, rather than the increased write amplification that traditional SLC caching causes.
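A promotion policy along these lines can be sketched as access counting with decay. Everything here is an assumption for illustration, not Enmotus's algorithm: chunks earn "heat" only from non-sequential accesses, so a streamed movie or a fresh download written sequentially never qualifies, while frequently modified or randomly read chunks do. (Gigabit Ethernet carries at most 1000 Mbit/s ÷ 8 = 125 MB/s before protocol overhead.) `HeatTracker`, the default threshold of 3 and the halving decay are hypothetical choices.

```python
from collections import Counter


class HeatTracker:
    """Hypothetical promotion policy: only non-sequential accesses earn heat."""

    def __init__(self, slc_chunks: int, promote_threshold: int = 3) -> None:
        self.slc_chunks = slc_chunks          # how many chunks fit in the SLC pool
        self.threshold = promote_threshold    # accesses needed before promotion
        self.heat = Counter()

    def record_access(self, chunk: int, sequential: bool) -> None:
        # Streaming reads and sequential downloads don't earn SLC residency.
        if not sequential:
            self.heat[chunk] += 1

    def desired_slc_set(self) -> set:
        """Hottest chunks at or above the threshold, capped at the SLC pool size."""
        hot = [c for c, n in self.heat.most_common() if n >= self.threshold]
        return set(hot[: self.slc_chunks])

    def decay(self) -> None:
        """Halve all counts so stale activity fades and chunks can be demoted."""
        for chunk in list(self.heat):
            self.heat[chunk] //= 2
            if self.heat[chunk] == 0:
                del self.heat[chunk]
```

Periodically calling `decay()` lets chunks that have gone cold drop out of `desired_slc_set()` so they can be demoted. The difference between that set and the current SLC contents is the data movement a driver would actually request, which in steady state should be small.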
Since the SLC portion of an Enmotus MiDrive is a slice carved out of regular QLC NAND, it cannot offer all the benefits of specialized low-latency SLC NAND like Samsung's Z-NAND or Kioxia/Toshiba XL-Flash. The SLC portion of a MiDrive won't be appreciably faster than the SLC cache of a traditional consumer SSD, but that performance will be more consistent and predictable when working with files that are kept entirely on the SLC portion of the drive.
The Business Model
Enmotus MiDrive is currently implemented as a combination of Windows driver software and custom SSD firmware for Phison NVMe controllers, but it does not require any custom hardware. This means that any vendor currently selling Phison E12 NVMe SSDs can make a MiDrive-based product by licensing and shipping Enmotus firmware. PC OEMs can adopt MiDrives by switching to drives with Enmotus firmware and either including the Enmotus drivers in their Windows images or relying on them to be distributed through Windows Update. No motherboard firmware or hardware modifications are required, nor are any changes to the process of provisioning a machine and preparing it for delivery to the end user. Enmotus is engaging both with PC OEMs and vendors of retail SSDs, so we can expect pre-built systems with Enmotus MiDrive technology and upgrade options usable on any Windows 10 PC that already supports standard M.2 NVMe SSDs. Enmotus is optimistic about uptake from PC OEMs, expecting MiDrive to get a much better reception than Intel's Optane H10 did.
The basic MiDrive products will be fully automatic, with the Enmotus driver pre-installed or installed automatically when a MiDrive is detected. Data placement decisions will be completely behind-the-scenes. For enthusiasts, there will also be a premium tier similar to their current FuzeDrive software, which includes Windows Explorer shell integration so that individual files can be manually promoted or demoted, either permanently or for a limited period of time. Enmotus will also be providing a drive health monitoring tool that will include their estimate for how much extra drive lifetime has been won by using their tiering instead of ordinary SLC caching.
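Manual pins that override the automatic placement, either permanently or for a limited time, can be sketched as a small table with optional expiry times. `PinTable` and its methods are hypothetical, and the clock is injectable only so that expiry can be tested; the article does not describe how the FuzeDrive or MiDrive software actually stores pins.

```python
import time


class PinTable:
    """Hypothetical record of user pins that override automatic placement."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock     # injectable clock so expiry can be tested
        self._pins = {}         # chunk -> (tier, expiry time or None for permanent)

    def pin(self, chunks, tier, duration_s=None) -> None:
        """Pin chunks to 'SLC' or 'QLC', permanently or for duration_s seconds."""
        expiry = None if duration_s is None else self._clock() + duration_s
        for chunk in chunks:
            self._pins[chunk] = (tier, expiry)

    def unpin(self, chunks) -> None:
        for chunk in chunks:
            self._pins.pop(chunk, None)

    def effective_tier(self, chunk, automatic_tier):
        """Pinned tier if an unexpired pin exists, else the automatic policy's choice."""
        entry = self._pins.get(chunk)
        if entry is not None:
            tier, expiry = entry
            if expiry is None or self._clock() < expiry:
                return tier
            del self._pins[chunk]   # pin has lapsed; fall back to automatic placement
        return automatic_tier
```

A lapsed pin silently falls back to whatever the automatic policy wants, which matches the idea of promoting a file only for a limited period of time.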
Mockup of Enmotus MiDrive SSD health monitoring tool
Enmotus expects SSDs with MiDrive technology to mostly use either 32GB or 64GB SLC portions and offer total capacities from about 400GB up to around 2TB, but the exact configurations will be determined by what their partners want to bring to market. Enmotus is also planning enthusiast-oriented solutions supporting RAID-0 style striping across multiple physical drives, and solutions for single-package BGA SSDs that go into small form factor and embedded devices.
Enmotus MiDrive technology will add to the price of SSDs, but since we're talking about QLC storage that's only relative to the cheapest NVMe SSDs available, and the final sticker prices will still be competitive for consumer SSDs. In return for that, users should get better real-world performance and enough effective write endurance to justify a 5-year warranty. We're looking forward to testing out this technology later this year, even though it will further complicate our benchmarking process. Enmotus is already sampling to interested OEMs.
Samus - Sunday, February 2, 2020
Hard disks become incredibly slow as they fill up too, easily sub-100MB/sec, and as they become fragmented, access times can be awful.
deil - Thursday, January 30, 2020
That's partially true. Tech goes on, and sticking to an OLD product keeps you from getting 3D NAND, which yielded nice upgrades, for example. There is literally ONE drive (the 960 Pro) and now its successor (CES 2020: Samsung 980 PRO PCIe 4.0) that have both MLC and a nice NVMe format.
99% of people never reach the point where TLC is the problem.
For my father/niece, who both just care about the OS booting and the browser loading funny animals, QLC is cheap and good enough.
That laptop won't outlive the drive, I can assure you of that.
The world consists of ~70% clients of that kind, 25% power users who can strain a TLC drive, and the 5% who need SLC.
ET - Thursday, January 30, 2020
I personally care more about how things work in practice rather than the exact technology. If QLC with a smart SLC cache ends up providing enough performance and endurance for my needs, all at a low price, I'd certainly welcome it.
PeachNCream - Thursday, January 30, 2020
QLC sucks, TLC is not-so-great. There is no getting around the fact that adding additional measurable states to very tiny NAND cells results in ever-declining durability and longevity. The trouble is that you're swimming against the current from both the consumer and the OEM perspectives. OEMs want to maximize profits so they sell you low endurance, slow TLC and unfortunately now QLC at only marginally lower costs, pocketing the profits. Consumers eagerly snap up the little price decreases they see in order to obtain higher capacities and "logic" Q/TLC's poor useful lifespan away by saying the endurance is "good enough" or "you'll never wear out a drive because my OEM-provided software insists I've only inflicted 3% wear over the last year" and then they talk with their wallets, buying up crappier NAND that further encourages OEMs to keep screwing consumers over. This silly cycle has caused NAND to circle the proverbial drain in the toilet of life for years now and without any viable alternatives on the horizon, we are entering a point where people like you and me are going to have to sit here enjoying our QLC NAND with an optimistic 500 P/E cycles per cell while telling ourselves that the OEM drive software is honestly representing drive lifespan and that everything will be okay right up to the point when our storage stops working. Yay!
valinor89 - Thursday, January 30, 2020
I am sure then that you are willing to pay what we used to pay for SLC drives back when they were the only option, right? Will you be happy to pay the same for a 256 GB SLC drive than we pay for a 1 TB QLC?
I for one salute the companies for providing a cheap enough (TLC and QLC) alternative to spinning rust for 99% of PC users, and also for providing expensive SSDs with MLC for the "PROs".
I am sure that if you are willing to pay enough, you can also get an ultra-expensive SLC enterprise drive that will satisfy your personal needs.
TLDR: You pay for what you get.
trparky - Thursday, January 30, 2020
> TLC is not-so-great
My Samsung 970 EVO would beg to differ. I've had it for a year and a half and its performance is nothing short of amazing. TLC may have been bad when it first came out, but now that it's been on the market for some time, the manufacturers know its limitations and can work around them so that the penalties aren't nearly as bad or as noticeable to the end user.
PeachNCream - Thursday, January 30, 2020
Mainly my problem with TLC and QLC is endurance. Read performance, where most client workloads reside, leaves end users with the impression of high system responsiveness. Write performance is another story, but as you and the article above have already mentioned, pseudo-SLC cache modes mask most of the hit.
azazel1024 - Thursday, January 30, 2020
I won't go with a QLC drive until they somehow improve QLC performance, and this caching likely just isn't "good enough" for me. Maybe it is, maybe it isn't. My uses, though, are a system/application disk, where honestly I want everything to be very snappy, and bulk storage. Well, SSD prices aren't low enough for the bulk storage I am using today. They are getting close, but not there. My use cases with bulk storage also mean caching is likely to fall on its face at some point. The type of caching presented here is great for a "one disk to rule them all", which will of course have some compromises. But it isn't great as a system/application disk or a bulk disk.
Bulk sounds like nothing is going to hit the SLC cache. At least most likely not. Pure system/application disk, you are likely to have a lot of misses on the SLC cache then if it is size limited and things are often not bumped to QLC to make room for writes.
My bulk use case is for storing movies, music, photos and application installation files, on the order of about 3.4TiB of data or a bit more. But every once in a while I have to move a full disk image, where a QLC drive of ANY type is going to run out of SLC cache and fall back to slow writes to QLC. That, or it hangs up while it evicts things to QLC and empties the SLC. Right now that is through a pair of 3TB HDDs in RAID0. A set in my server, a set in my desktop (and a 6TB USB HDD for offline backup of it all). I've got 2x1GbE between my desktop and my server, which means I am network limited to about 235MB/sec transfer speeds when I need to do a full disk copy.
If QLC drives were in there, some of the smaller files might transfer a little faster (the RAID0 array and network link slows to around 80MiB/sec once it is copying photo/music directories with the smaller file sizes), but larger files? That cache gets filled up, and from everything I've seen it's going to slow down to about 70MB/sec.
At least with TLC you are talking more on the order of 200MB/sec once the cache fills.
A RAID0 of a pair of 2TB QLC drives could fit all my files, but still, you are talking ~140MB/sec writes once the cache fills. RAID0 with a pair of 2TB TLC drives and you are pushing 400MB/sec, which is well over what my network connection can manage. It would barely be disk limited if I had a pair of 2.5GbE links or a 5GbE link, which I am hoping I'll have (at least a single 2.5GbE link) in the next couple of years if switch prices would come down a bit more with some more players on the market.
I don't need 10,000,000MB/sec transfers. Or RAM disk speeds or anything else like that. But I'd really, really like to have at least the performance to saturate a single 2.5GbE link with whatever I implement. Bonus points if it COULD be a single TLC drive of >4TB capacity. I am willing to do a pair of SSD's in RAID0 to get the performance I want. Which TLC drives can do in spades even once their caches are filled. But QLC can't. Not even close.
But I absolutely see the use case for 80+% of users. Not sure what the exact management strategy will be, but TBH it seems like the smartest way to do it would be 32-64GB of SLC acting as combo page file and most frequently accessed file cache. But it would still make sense to have at least 8-16GB of SLC cache as a pure write buffer for the QLC. That would likely satisfy 90% of users (or more!) who would never, ever notice performance degradation of pure QLC writes which are slow.
That being said, at that point, unless you need a huge drive, why not TLC? It looks like you'd be taking a QLC drive and making it 32+400GB capacity or similar, whereas with a TLC drive you'd have a 500/512GB drive with dynamic SLC cache, with more frequent "good" performance for all files versus just the commonly used ones. Sure, the cost might be somewhat higher, but you get 18% more storage at that tier with the TLC drive and likely better performance for edge cases, and for average use cases you'd probably have pretty similar performance.
That to me says that a QLC drive with this technology probably needs to be at least 18% cheaper to be maybe worth while.
linuxgeex - Thursday, January 30, 2020
You're entirely right, and you also need to bear in mind the labour costs of swapping out the QLC drives more frequently in an organization where the DWPD will result in premature failures for some users compared to TLC drives.
Look at Samsung QVO vs EVO pricing. Their QLC are about 25% cheaper. AData on the other hand isn't differentiating as much between SU800 and SU630 pricing. So I wouldn't recommend opting for AData QLC drives at this point in time.
DyneCorp - Thursday, January 30, 2020
What an overly simple and ridiculous paragraph. QLC NAND is a necessary step. 2-MLC NAND is not practical; sticking to 2-bit MLC or 3-bit TLC is a waste of wafers and money to just stifle innovation because you want "MOAR ENDURANCE!". Even "planar" 2D TLC NAND was more than acceptable for 99 percent of consumers. 3D TLC NAND is practical short term for certain workloads, but it's not practical to continue forever, even after hitting 72+ layers.
With so much competition in the market, what do you expect companies to do? Just stop adding bits? Just keep selling expensive NAND that consumers will never exhaust and continue the e-waste and then lose the competitive edge because they refused to innovate? Do you even understand how this works?
64-layer QLC NAND is more than adequate for the majority of consumers, period. Even the Intel 660p with a TBW of 150 would take the average gamer/ PC user 10+ years to exhaust. What's comical about all of this is that hard disks have never come with endurance ratings and naturally have far higher failure rates.