
Storage managers getting wise to prevailing SSD limitations

Friday, April 17th, 2009

The industry is catching on to what I’ve been talking about for some time: flash technology offers tremendous value for the enterprise, yet adoption hinges on addressing the prevailing limitations of existing SSDs first.

This ‘revelation’ appeared in a SearchStorage.com article by Beth Pariseau, “Storage admins mull SSDs at SNW.” The article quotes several storage administrators who broadly accept the benefits of SSDs but stop short of saying that the technology is ready for prime time.

Here are their top concerns: predictable performance, data integrity, the lack of consistent, industry-accepted SSD benchmarks, and cost.

Let’s quickly look at each of these:

  1. Predictable performance – I covered this recently in my “‘Predictable performance’ for changing business dynamics” post. Predictability has traditionally been a challenge for SSDs in enterprise applications because enterprise workloads are random and indeterminate. It requires consistent performance regardless of whether the device is reading or writing, because the read-to-write ratio of enterprise applications typically varies between 60/40 and 90/10. Enterprise SSDs should be able to maintain performance across this entire range.
  2. Data integrity – I couldn’t agree more that data integrity features are critical if flash technology is to perform at enterprise levels, and the Data Integrity Field (DIF) standard is an important step in this direction. Yet few storage devices support DIF today. Pliant began aligning its designs with the DIF standard early on, recognizing how important it is for enterprise-class storage systems.
  3. Standardized benchmarks – In my post, “SSD jargon and the need for standards,” I listed a number of pivotal questions that must be addressed if the industry is ever to develop more accurate, relevant – and yes, consistent – SSD benchmarks. These include making sure that real performance is measured and that product lifecycle benchmarks are based on true, 100% duty cycle operation. If product life metrics are contingent on usage limitations – e.g., based on a maximum number of writes or writes per day due to limited error management capability – then the benchmarks are virtually useless.
  4. Cost – Transaction cost ($/IOPS, or equivalently IOPS per $) is the key SSD metric to consider, not the old HDD industry metric of $/GB. $/GB is an irrelevant measure of an SSD’s value as a performance solution, and we expect EFDs (Enterprise Flash Drives) to complement high-capacity HDDs so that storage systems can optimize for both $/IOPS and $/GB.
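For readers who have not looked inside DIF: as defined by T10, it appends 8 bytes of protection information to each 512-byte sector, namely a 2-byte guard tag (a CRC-16 over the data), a 2-byte application tag, and a 4-byte reference tag. The sketch below builds and checks that tuple in software, assuming the reference tag is the low 32 bits of the LBA (as in Type 1 protection). The function names are hypothetical, and this illustrates the format only; it is not how any particular drive or controller implements it.

```python
import struct

SECTOR_SIZE = 512
T10_DIF_POLY = 0x8BB7  # CRC-16 generator polynomial used for the T10 DIF guard tag


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 (MSB first, initial value 0, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_pi(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte protection information: guard, application, reference tags."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("sector must be exactly 512 bytes")
    guard = crc16_t10dif(sector)
    ref_tag = lba & 0xFFFFFFFF  # assumption: Type 1 style, low 32 bits of the LBA
    return struct.pack(">HHI", guard, app_tag, ref_tag)


def verify_pi(sector: bytes, lba: int, pi: bytes) -> bool:
    """Recompute the guard and expected reference tag and compare with the stored tuple."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (lba & 0xFFFFFFFF)


sector = bytes(range(256)) * 2
pi = make_pi(sector, lba=1234)
print(verify_pi(sector, 1234, pi))            # True
corrupted = bytearray(sector)
corrupted[0] ^= 0x01                          # flip one bit of the payload
print(verify_pi(bytes(corrupted), 1234, pi))  # False: guard tag mismatch
print(verify_pi(sector, 1235, pi))            # False: data read back from the wrong LBA
```

The guard tag catches corrupted data, while the reference tag catches data that is intact but in the wrong place, a failure a CRC alone cannot detect.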
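The cost argument comes down to simple arithmetic. The sketch below compares a hypothetical SSD and a hypothetical HDD on both $/GB and $/IOPS; every price, capacity, and IOPS figure is a made-up placeholder chosen for round numbers, not a measured or quoted value.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    price_usd: float
    capacity_gb: float
    iops: float  # random IOPS under the workload of interest

    def dollars_per_gb(self) -> float:
        return self.price_usd / self.capacity_gb

    def dollars_per_iops(self) -> float:
        return self.price_usd / self.iops

    def iops_per_dollar(self) -> float:
        return self.iops / self.price_usd


# Placeholder figures for illustration only.
ssd = Drive("hypothetical SSD", price_usd=2000.0, capacity_gb=100.0, iops=20000.0)
hdd = Drive("hypothetical HDD", price_usd=400.0, capacity_gb=400.0, iops=200.0)

for d in (ssd, hdd):
    # SSD: $20.00/GB, $0.100/IOPS; HDD: $1.00/GB, $2.000/IOPS
    print(f"{d.name}: ${d.dollars_per_gb():.2f}/GB, ${d.dollars_per_iops():.3f}/IOPS")
```

On these placeholder numbers the SSD costs 20 times more per GB but 20 times less per IOPS, which is the tradeoff behind pairing EFDs with high-capacity HDDs.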

With most existing vendors either falling short on several of these points or masking the limitations of their devices behind carefully crafted marketing spin, it’s no wonder some storage admins are still skeptical.

This is why I continue to extol the virtues of EFDs, a new class of solid state storage devices designed with key enterprise requirements in mind. By definition, EFDs are designed to address all of the issues above.

And as we prepare to announce the availability of our first products shortly, my hope is that our approach will turn heads and change the minds of the remaining naysayers in the industry.

Amyl Ahola

SSD jargon and the need for standards

Thursday, March 5th, 2009

A recent STORAGEsearch.com article by editor Zsolt Kerekes, “flash SSD Jargon Explained,” got my attention. That the jargon needs explaining at all is a reminder that marketing wizards keep inventing new terms to ‘differentiate’ their products, confusing most of us and masking issues of real importance to data center operations. A version of marketing 101: If you have a weakness, flaunt it.

The list of SSD jargon Kerekes cites in the article includes: dynamic leveling, active leveling, static leveling, BCH codes, Reed Solomon codes, write endurance, write amplification, write attenuation, garbage collection, read patrol, wear leveling, read disturb, and program disturb.

Look at the last two. These are rarely discussed but are among the most important issues for anyone who cares about losing data. An earlier STORAGEsearch.com article asks the question, “Can you trust your flash SSD specs & Benchmarks?” The answer can only be ‘of course not!’ At least not until there is some semblance of standardization. This is especially true when considering SSDs to meet the performance and reliability demands of enterprise applications.

With this in mind, some questions that should be asked (and answered) about SSD performance and reliability specs and benchmarks are:

1.    What is the real performance?

A simple question, but one rarely, if ever, addressed in the specifications. Typical enterprise environments are random, 60% to 70% reads, with 4K/8K blocks: not small (512-byte) blocks chosen to show high IOPS, or large blocks chosen to show high bandwidth.
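The underlying relationship is throughput = IOPS × block size, which is why the block size behind a quoted number matters so much. The sketch below records the conditions behind a spec-sheet figure and checks them against the typical profile described above; the class, the check function, and both device figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkloadSpec:
    """Conditions under which a quoted IOPS figure was measured."""
    random_access: bool
    read_fraction: float  # 0.0 to 1.0
    block_size_bytes: int
    iops: float

    def throughput_mb_s(self) -> float:
        # Throughput = IOPS x block size, with MB = 10**6 bytes.
        return self.iops * self.block_size_bytes / 1e6


def is_enterprise_representative(spec: WorkloadSpec) -> bool:
    """Hypothetical check: random access, 60-70% reads, 4K or 8K blocks."""
    return (
        spec.random_access
        and 0.60 <= spec.read_fraction <= 0.70
        and spec.block_size_bytes in (4096, 8192)
    )


# Two hypothetical spec-sheet figures.
small_block = WorkloadSpec(random_access=True, read_fraction=1.0, block_size_bytes=512, iops=100_000.0)
typical = WorkloadSpec(random_access=True, read_fraction=0.65, block_size_bytes=4096, iops=25_000.0)

print(small_block.throughput_mb_s(), is_enterprise_representative(small_block))  # 51.2 False
print(typical.throughput_mb_s(), is_enterprise_representative(typical))          # 102.4 True
```

The larger headline number (100,000 IOPS at 512 bytes, all reads) moves half as much data as the smaller one measured under realistic conditions.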

2.    Is the performance deterministic?

The writing process for flash is inherently slower than reading.  Does the performance drop substantially as a function of the read/write mix or does it stay relatively constant as needed to maintain consistent response times?  Is the performance dependent upon the use of cache (and the associated power loss and recovery issues of volatile cache memory)?
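One way to reason about this question is a first-order model in which every read and every write takes a fixed service time and operations do not overlap. Under that assumption, blended IOPS is a weighted harmonic mean of the pure-read and pure-write rates. The sketch below (hypothetical function names and device figures; real devices with caching, internal parallelism, and background garbage collection will deviate) measures how far blended IOPS falls across the 60/40 to 90/10 range.

```python
def blended_iops(read_iops: float, write_iops: float, read_fraction: float) -> float:
    """Mixed-workload IOPS assuming fixed, non-overlapping per-I/O service times."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    seconds_per_io = read_fraction / read_iops + (1.0 - read_fraction) / write_iops
    return 1.0 / seconds_per_io


def worst_case_drop(read_iops: float, write_iops: float,
                    mixes: tuple = (0.9, 0.8, 0.7, 0.6)) -> float:
    """Fractional drop from the best to the worst blended IOPS across the given read mixes."""
    values = [blended_iops(read_iops, write_iops, f) for f in mixes]
    return 1.0 - min(values) / max(values)


# Hypothetical devices: A writes far more slowly than it reads; B is nearly balanced.
print(f"Device A drop: {worst_case_drop(40_000, 10_000):.0%}")  # 41%
print(f"Device B drop: {worst_case_drop(30_000, 25_000):.0%}")  # 6%
```

Device A posts the higher read number, yet its performance swings by about 41% as the mix moves toward 60/40, while device B stays within about 6%, which is what consistent response times require.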

3.    Is the performance sustainable?

What does ‘sustainable’ mean? It is not unusual for performance to degrade as more and more of the device is written. It may take minutes or hours, but degradation of 50% or more can occur.
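A practical way to expose this is to log IOPS at intervals while writing continuously, then compare the fresh-device numbers with the tail of the run. The sketch below does that for a hypothetical once-a-minute log; the ±10% steady-state tolerance is an assumption for illustration, not a standard criterion.

```python
from statistics import mean
from typing import Sequence


def degradation(samples: Sequence[float], window: int) -> float:
    """Fractional drop from the mean of the first `window` samples (fresh device)
    to the mean of the last `window` samples."""
    if window <= 0 or 2 * window > len(samples):
        raise ValueError("need two non-overlapping windows of at least one sample")
    fresh = mean(samples[:window])
    late = mean(samples[-window:])
    return 1.0 - late / fresh


def looks_steady(samples: Sequence[float], window: int, tolerance: float = 0.10) -> bool:
    """Assumed steady-state test: every sample in the last `window`
    stays within +/- tolerance of that window's mean."""
    tail = samples[-window:]
    avg = mean(tail)
    return all(abs(s - avg) <= tolerance * avg for s in tail)


# Hypothetical IOPS log, one sample per minute of continuous random writes.
log = [40_000, 39_000, 36_000, 30_000, 24_000, 20_000, 19_500, 20_200, 19_800, 20_100]
print(f"degradation: {degradation(log, 3):.0%}")  # 48%
print(looks_steady(log, 4))                       # True
```

A spec measured during the first few minutes of this log would overstate sustained performance by roughly a factor of two.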

4.    What is the capacity available to the user?

Another simple question, but all SSDs contain more flash than is available for end-user data. For example, the additional (or over-provisioned) flash may be used to optimize write performance and to hold spare blocks, CRC codes, ECC codes, and metadata. Does the stated capacity net all of this out?
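Netting capacity out is a one-line calculation once both numbers are known. The sketch below uses one common convention, over-provisioning = (raw − user) / user; vendors do not all define it the same way, and the 128 GB / 100 GB drive is hypothetical.

```python
def overprovisioning(raw_gb: float, user_gb: float) -> float:
    """Spare flash as a fraction of user capacity: (raw - user) / user."""
    if not 0 < user_gb <= raw_gb:
        raise ValueError("user capacity must be positive and not exceed raw capacity")
    return (raw_gb - user_gb) / user_gb


def user_fraction(raw_gb: float, user_gb: float) -> float:
    """Share of the installed flash actually available for user data."""
    return user_gb / raw_gb


# Hypothetical drive: 128 GB of raw flash exposed to the host as 100 GB.
print(f"OP: {overprovisioning(128, 100):.0%}, usable: {user_fraction(128, 100):.0%}")  # OP: 28%, usable: 78%
```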

5.    Are there duty cycle or other limitations on usage in order to achieve/maintain the specifications?

Does the architecture support a 100% duty cycle, or is product life contingent on a maximum number of writes, or writes per day, due to limited error management capability?

Is it assumed there will be ‘adequate’ idle time (what’s that in the enterprise?) to perform the necessary flash management activities?
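The gap between a ‘writes per day’ rating and a true 100% duty cycle shows up in a first-order endurance estimate: the flash can absorb roughly capacity × P/E cycles of writes, and every host write costs write-amplification times as much flash writing. The sketch below assumes ideal wear leveling across all installed flash; the capacity, cycle rating, write amplification, and write rates are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def full_duty_gb_per_day(write_mb_per_s: float) -> float:
    """Host writes per day if the drive writes nonstop at write_mb_per_s (MB = 10**6 bytes)."""
    return write_mb_per_s * SECONDS_PER_DAY / 1000.0


def lifetime_years(flash_capacity_gb: float, pe_cycles: int,
                   host_gb_per_day: float, write_amplification: float) -> float:
    """Endurance estimate assuming ideal wear leveling: the media absorbs
    flash_capacity_gb * pe_cycles of writes, and each host GB costs
    write_amplification GB of flash writes."""
    if host_gb_per_day <= 0 or write_amplification <= 0:
        raise ValueError("host writes and write amplification must be positive")
    endurance_gb = flash_capacity_gb * pe_cycles
    flash_gb_per_day = host_gb_per_day * write_amplification
    return endurance_gb / flash_gb_per_day / 365.0


# Hypothetical drive: 128 GB of flash, 10,000 P/E cycles, write amplification of 2.
print(f"{lifetime_years(128, 10_000, 100, 2.0):.1f} years at 100 GB/day")  # 17.5
print(f"{lifetime_years(128, 10_000, full_duty_gb_per_day(50), 2.0):.2f} years at 50 MB/s nonstop")  # 0.41
```

The same hypothetical drive looks like a 17-year product under a light daily write budget and wears out in under six months at a sustained 50 MB/s, which is why the usage assumptions behind a lifetime spec matter.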

6.    Are the error management and ECC algorithms powerful enough to correct read disturb and program disturb errors without resulting in excessive rates of uncorrectable errors and/or losing capacity due to bad block mapping?

Error correction approaches that use limited ECC to correct random bit failures may not have sufficient correction capability for read and program disturb errors. Correction capabilities may appear adequate but be based on codes such as Reed-Solomon, which is great for hard drives but not really applicable to flash failure modes. The lack of idle time for background flash management makes this problematic for many, if not most, SSD architectures.
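To see why correction capability matters, consider a code that can fix up to t bit errors per codeword. If bit errors were independent with a raw bit error rate p, the chance a codeword is uncorrectable would be the binomial tail P(errors > t), computed below in log space. The codeword size, error rate, and t values are hypothetical. Independence is exactly what read and program disturb undermine, since disturb errors accumulate in heavily read or programmed blocks rather than striking bits at random, so this model is likely optimistic for real flash.

```python
import math


def uncorrectable_probability(n_bits: int, t: int, raw_ber: float) -> float:
    """P(more than t bit errors in an n_bits codeword) under a binomial model
    in which every bit fails independently with probability raw_ber."""
    if not 0.0 < raw_ber < 1.0:
        raise ValueError("raw_ber must be strictly between 0 and 1")
    log_p, log_q = math.log(raw_ber), math.log1p(-raw_ber)
    log_n_fact = math.lgamma(n_bits + 1)
    total = 0.0
    for k in range(t + 1, n_bits + 1):
        # log of C(n, k) * p**k * (1 - p)**(n - k), kept in log space to avoid overflow
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        total += math.exp(log_term)
    return total


# Hypothetical 4,096-bit codeword at a raw bit error rate of 1e-4.
for t in (1, 4, 8):
    print(t, f"{uncorrectable_probability(4096, t, 1e-4):.2e}")
```

Each additional correctable bit cuts the uncorrectable rate by orders of magnitude in this model, which is why a code sized only for sparse random failures can look adequate on paper and still fall short once disturb errors cluster.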

Kerekes sums it up well: “Better user education about SSDs is a critical factor for the industry to sustain its growth. Design trade offs in products go far deeper than the choice of memory and interface. Being aware that there are other parameters which SSD vendors have implemented well, badly (or not at all) can be the difference between a satisfactory or disillusionary experience.”

What do you think?

Amyl Ahola