Endurance Ratings: How They Are Calculated

One question I am often asked concerns manufacturers' endurance ratings. Go back two or three years and none of the client SSDs had an endurance limit, but every SSD released in the past year or so has one. Why did that happen? Let's open up the situation a bit.

A few years ago, many enterprises would simply buy regular consumer SSDs and use them in their servers. There is nothing inherently wrong with that, as there are scenarios where enterprises can get by with client-grade hardware. The problem was that some enterprises knew the drives weren't durable enough for their needs, but they also knew that if they wore out a drive before the warranty expired, the manufacturer would have to replace it.

Obviously that wasn't good business for the manufacturers, because for every drive sold, more than one had to be given away for free. At the same time, fewer customers were buying the more expensive, higher-margin enterprise drives. To avoid disrupting the client market by either raising prices or reducing quality, the manufacturers decided to start including a maximum endurance rating that voids the warranty if exceeded.

The equation for endurance is rather simple. All you need to take into account is the capacity of the drive, the P/E cycles of the NAND, and the wear leveling and write amplification factors. When all that is put into an equation, it looks like this:

$$\text{TBW} = \frac{\text{Capacity} \times \text{NAND P/E cycles}}{\text{WLF} \times \text{WAF}}$$

Note that TBW correctly stands for TeraBytes Written, not Total Bytes Written, although both expansions are fairly widely used. The hardest part of calculating TBW is determining the wear leveling and write amplification factors, because both depend on the workload. Manufacturers therefore often use a worst-case 4KB random write workload to arrive at the TBW figure, as this ensures that the end-user cannot run a more demanding workload with higher write amplification.
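As a minimal sketch of the relationship described here, the Python function below computes TBW = capacity × P/E cycles / (WLF × WAF). The function name `tbw_terabytes` and the example inputs (a hypothetical 240GB drive with 3,000 P/E cycles, a WLF of 1.5 and WAFs of 2.0 and 4.0) are illustrative assumptions, not figures from any manufacturer.

```python
def tbw_terabytes(capacity_gb: float, pe_cycles: int, wlf: float, waf: float) -> float:
    """Estimate TeraBytes Written: capacity x P/E cycles / (WLF x WAF).

    capacity_gb is in decimal gigabytes; the result is in decimal terabytes.
    """
    if wlf <= 0 or waf <= 0:
        raise ValueError("wear leveling and write amplification factors must be positive")
    raw_write_budget_gb = capacity_gb * pe_cycles      # total NAND writes the cells can absorb
    host_writes_gb = raw_write_budget_gb / (wlf * waf)  # share of that budget left for the host
    return host_writes_gb / 1000


# Hypothetical drive: 240GB, 3,000 P/E cycles, WLF 1.5
print(tbw_terabytes(240, 3000, 1.5, 2.0))  # 240.0
print(tbw_terabytes(240, 3000, 1.5, 4.0))  # 120.0 with a harsher, higher-WAF workload
```

Doubling the write amplification, as a more demanding random-write workload might, halves the rating, which is why basing the figure on a worst-case workload gives the most conservative number.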

For the uninitiated, the wear leveling factor (WLF) in this context is the maximum stress that the wear leveling method puts on the most heavily cycled block, relative to the average number of cycles across all blocks. A factor of two means that the most heavily cycled block has twice as many cycles as the average. The write amplification factor (WAF), on the other hand, is the ratio of NAND writes to host writes. A factor of two in this case means that for every megabyte the host writes, two megabytes are written to the NAND. The two factors go hand in hand: a small WLF results in a higher WAF, because the drive has to do more internal reorganization to cycle all blocks equally, and that reorganization consumes NAND writes.
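The two definitions can be sketched in Python, assuming hypothetical access to per-block erase counts and to host and NAND write counters, which third parties normally cannot see. The function names and the numbers below are made up for illustration.

```python
from statistics import mean


def wear_leveling_factor(erase_counts: list[int]) -> float:
    # Cycles on the most heavily cycled block relative to the average block
    return max(erase_counts) / mean(erase_counts)


def write_amplification_factor(nand_bytes: int, host_bytes: int) -> float:
    # Bytes written to the NAND for every byte the host writes
    return nand_bytes / host_bytes


# Hypothetical erase counts for a tiny four-block "drive": max 150, average 100
print(wear_leveling_factor([100, 150, 50, 100]))  # 1.5
# The host wrote 1MB and the controller wrote 2MB to the NAND
print(write_amplification_factor(2_000_000, 1_000_000))  # 2.0
```

A perfectly even wear pattern gives a WLF of exactly 1.0, and the only way to get it there is extra internal copying, which shows up as a higher WAF.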

The interesting part about TBW ratings is that they give us a way to estimate the combined wear leveling and write amplification factor of a drive. For the 120GB M500DC, it comes out at a surprising 0.72x. Obviously you can't go below 1x without some form of compression, but the 120GB M500DC actually has 192GiB of NAND onboard, which extends its endurance. If we use that figure to calculate the combined WLF and WAF, we get 1.24x, which is much more reasonable. For some reason the JEDEC spec defines the capacity as the usable capacity even for endurance calculations, but in the end it doesn't matter which figure you adjust, because they are all related (e.g. with 120GB used as the capacity, the effective P/E cycles could be higher than 3,000 because the over-provisioned NAND adds cycles).
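Solving the TBW equation for the product WLF × WAF gives capacity × P/E cycles / TBW. The sketch below reproduces both figures for the 120GB M500DC. The TBW rating itself isn't restated in this section, so the code assumes 500TB, the value implied by the 0.72x figure (120GB × 3,000 / 0.72 = 500TB); the 3,000 P/E cycles is the figure used in the text. The helper name `combined_factor` is hypothetical.

```python
GB = 1000**3   # decimal gigabyte, used for the advertised capacity
GIB = 1024**3  # binary gibibyte, used for the raw NAND


def combined_factor(capacity_bytes: float, pe_cycles: int, tbw_bytes: float) -> float:
    """Solve TBW = capacity x P/E / (WLF x WAF) for the product WLF x WAF."""
    return capacity_bytes * pe_cycles / tbw_bytes


PE_CYCLES = 3000
TBW = 500e12  # assumption: 500TB, back-calculated from the 0.72x figure

usable_capacity = 120 * GB  # usable capacity, as JEDEC defines it
raw_nand = 192 * GIB        # physical NAND including over-provisioning

print(round(combined_factor(usable_capacity, PE_CYCLES, TBW), 2))  # 0.72
print(round(combined_factor(raw_nand, PE_CYCLES, TBW), 2))         # 1.24
```

Only the capacity input changes between the two lines, so the same rating yields a very different efficiency factor depending on whether usable or raw capacity is plugged in.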

Ultimately, none of the manufacturers are willing to disclose exactly how they calculate their endurance ratings, but at a high level this is how it's done according to JEDEC's standards. Furthermore, I wouldn't rule out the possibility that some OEMs artificially lower the ratings of their consumer drives just to make sure they are not used by enterprises. In the end, there is no real way for us to verify whether a TBW rating is accurate, since the efficiency factors are not easily measurable by third parties like us.

37 Comments

  • apudapus - Tuesday, April 22, 2014

    I don't quite understand your statement in the first part:
    data retention decreases with NAND wear -> consumer drives have higher endurance

    Regarding the last sentence, SSD endurance is measured in number of writes, like TBW. NAND endurance is measured in P/E cycles. The endurance of an SSD should not be measured in P/E cycles because erasing is handled internally by the SSD: there is no "erase" command to send to an SSD (TRIM does not directly yield an erase), and write amplification (which decreases endurance) and over-provisioning (which increases endurance) must be taken into account but are not controlled by the user. Total writes is all that is needed when discussing SSD endurance. With that said, please explain your reasoning for the drive having a higher endurance than 3000 "P/E cycles".
  • Solid State Brain - Tuesday, April 22, 2014

    The more P/E cycles your NAND memory goes through, the shorter its data retention time gets.
    Therefore, the shorter the data retention requirement for the intended usage, the more P/E cycles you can let your memory go through (in other words, the more data you can write). Actually it's a bit more complex than that (for example, the uncorrectable bit error rate also goes up with wear), but that's pretty much it.
  • apudapus - Wednesday, April 23, 2014

    I see. So the assumption is that NAND with shorter data retention requires more refreshing (a.k.a. wasted programs). I believe this to be true for enterprise drives but I would be surprised to see this being done on consumer drives (maybe for TLC, though).
  • valnar - Tuesday, April 22, 2014

    I wish they would find a way to lower the cost of SLC. Look at those endurance numbers.
  • hojnikb - Tuesday, April 22, 2014

    Why would you want SLC anyway?
    If you need endurance, HE-MLC is plenty.
    Unless you write like crazy; then buying SLC probably shouldn't pose a problem :)
  • valnar - Tuesday, April 22, 2014

    Because 20nm TLC and crap like that barely holds a "charge", so to speak, when not powered up. That's just way too volatile for my liking. I'm not always running all my PCs every day.
  • bji - Tuesday, April 22, 2014

    What difference does it make if the drive is powered up or not? These are static cells, they are not "refreshed" like DRAM. They are only refreshed when they are rewritten, and if your drive is not doing continuous writes, it's not guaranteed to rewrite any particular cell within any specific timeframe.
  • apudapus - Tuesday, April 22, 2014

    NAND has limited data retention and should be refreshed like DRAM, albeit on a much longer timescale, like 1 month (TLC) to a year (I believe 54nm SLC from years ago had this spec near the end of its life, ~100,000 P/E cycles). Good SSDs should be doing this.
  • Kristian Vättö - Wednesday, April 23, 2014

    ALL consumer drives have a minimum data retention of one year, regardless of the type of NAND (SLC, MLC or TLC). This is a standard set by JEDEC. For enterprise drives it's three months.
  • apudapus - Wednesday, April 23, 2014

    That may be the requirement for drives but not for NAND. Drives can do several things to increase data retention: refresh stale data after time, provide strong ECC, do voltage thresholding, etc. I think JEDEC specifies hundreds of hours for NAND retention.
