Posts Tagged ‘Hard Disk Drives’

Storage reliability for the enterprise

Tuesday, May 20th, 2008

I’ve written a lot about I/O performance on this blog, and with good reason.  When I discuss Pliant’s EFD device and enterprise IT system issues with partners and the press, the first questions are almost always about performance.  But I often point out that, just as with a sports car, performance is only part of the equation.  Reliability is equally important.

Enterprise storage applications are demanding, and it is essential that reliability specifications are met at a 100-percent duty cycle on a 24/7/365 basis.  Those in the industry know that true enterprise-class disk drives are required for this environment, and that disk drives designed for low-cost, low-duty-cycle laptop/desktop applications literally fall apart when employed in an enterprise application.  Likewise, SSDs designed for laptop/desktop applications do not even come close to meeting the need.  So, for Enterprise Flash Drives to be accepted in the enterprise, they must meet or exceed enterprise-class HDD reliability.  This is not a trivial task.

The primary enterprise reliability specifications take the form of MTBF (or, more meaningfully, annualized failure rate) and non-recoverable error rates (lost data).  Flash technology has three primary failure phenomena that have a significant impact on reliability:

  • Write Endurance – the limit on how many times a cell can be written/erased before it becomes damaged
  • Write/Program Disturb –  writing to a given page in a Flash chip can alter bit(s) in a page that is not being written (does not damage the cell); this is sometimes referred to as “bit flip”
  • Read Disturb – similar to Write Disturb, reading a page in a Flash chip can alter bit(s) in a page not being read (does not damage the cell)

A further complication is that these failure modes are not independent.  For example, the read disturb error rate is related to the number of writes or erases, so write endurance, read disturbs, and write disturbs must be considered together.  It is obvious that they all contribute to non-recoverable errors, but perhaps not as obvious that they contribute to MTBF as well.  MTBF is a measurement of performance to specification, not just of catastrophic events of the kind typical with a disk drive.  This includes meeting performance and capacity specifications.

A common approach used in typical SSDs to deal with write endurance is to incorporate a wear-leveling algorithm to distribute writes across blocks within the chip(s), together with error correction (ECC), so that any damaged cells can be corrected when read.  This same ECC can then be applied for all reads to detect and correct altered bits (‘bit flips’) independently of how they became defective, i.e., write endurance, read disturb, or write disturb.  If the number of defective bits exceeds the ECC threshold, the sector(s) being read would then have to be marked as defective (non-recoverable error) and made unavailable to the system.  Depending on the amount of spare Flash capacity, at some point the resulting system capacity may well drop below the specification.
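The wear-leveling and ECC behavior described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not how any particular SSD controller works: the class and method names (`FlashSketch`, `pick_block_for_write`, `read`) are hypothetical, the ECC strength is an arbitrary parameter, and for simplicity a whole block is retired on a non-recoverable error rather than individual sectors.

```python
from dataclasses import dataclass


@dataclass
class Block:
    erase_count: int = 0   # how many times this block has been written/erased
    retired: bool = False  # True once the block has been marked defective


class FlashSketch:
    """Toy model of wear leveling plus an ECC threshold (hypothetical names)."""

    def __init__(self, num_blocks, ecc_correctable_bits):
        self.blocks = [Block() for _ in range(num_blocks)]
        # Assumed ECC strength: max flipped bits that can be corrected per read.
        self.ecc_correctable_bits = ecc_correctable_bits

    def pick_block_for_write(self):
        """Wear leveling: write to the least-worn block still in service."""
        in_service = [i for i, b in enumerate(self.blocks) if not b.retired]
        if not in_service:
            raise RuntimeError("no usable blocks left")
        idx = min(in_service, key=lambda i: self.blocks[i].erase_count)
        self.blocks[idx].erase_count += 1
        return idx

    def read(self, idx, flipped_bits):
        """Classify a read by how many bits were found flipped, whatever the cause."""
        if flipped_bits == 0:
            return "ok"
        if flipped_bits <= self.ecc_correctable_bits:
            return "corrected"
        # Beyond the ECC threshold: non-recoverable, so take the block out of service.
        self.blocks[idx].retired = True
        return "unrecoverable"

    def usable_fraction(self):
        """Fraction of the raw capacity still available to the system."""
        in_service = sum(1 for b in self.blocks if not b.retired)
        return in_service / len(self.blocks)


flash = FlashSketch(num_blocks=4, ecc_correctable_bits=8)
for _ in range(8):
    flash.pick_block_for_write()   # writes are spread evenly across the blocks
print(flash.read(1, flipped_bits=9), flash.usable_fraction())
```

Each retired block lowers `usable_fraction()`, which is the capacity erosion the paragraph above warns about once the spare capacity has been used up.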

As an example, a well-known supplier of SSDs advertises an ECC that corrects up to 8 bytes in 1024 bytes, while another supplier advertises 6 bytes in 528 bytes.  At the same time, both talk about write/erase cycles well in excess of 1 million.  However, tests show that both ECC levels would frequently result in non-recoverable errors after as few as 200,000 write/erase cycles.  These error rates result in SSD reliability falling far short of disk drive reliability in terms of non-recoverable error rate.  At the same time, overall capacity begins to erode and eventually falls below the device specification, resulting in an MTBF failure.

And that’s not all.  Managing these high error rates also has a significant impact on performance (it drops dramatically!).

The primary point is that enterprise-level reliability, whether it’s MTBF or non-recoverable error rate, cannot be addressed with just traditional ECC.  Other techniques must be employed alongside ECC to manage errors.  Moreover, these techniques cannot be allowed to significantly impact performance (IOPs or bandwidth).

Sounds like a daunting task…or is it??  Stay tuned.

 Amyl Ahola

Hard disk is free…hardly!

Wednesday, March 26th, 2008

The dramatic reductions in HDD cost per GB have resulted in many system/storage architects (and application/operating system programmers) treating primary storage as though it is free.

Some of the results are:

  • Exponential increases in the size of operating systems and applications
  • Mass deployment of low-end and midrange servers with multiple copies of data (and applications)
  • Over-provisioning of storage to satisfy future needs projections (which also likely adopt the concept of free storage)
  • Adoption of power-hungry DRAM cache appliances to mask HDD performance shortfalls
  • Over-provisioning of HDDs to mask HDD performance shortfalls

These all result in inefficient use of storage that has many costs, not the least of which is the increasing cost of energy consumption.  Some of the energy data becoming available paints a sobering picture:

  • Data centers account for 1.5% of ALL U.S. electrical consumption, and this is expected to double in a few years
  • Power consumption per $1,000 of server spending has increased by a factor of 4 since 2000
  • Power failures and limits on power availability are expected to halt data center operations at more than 90% of all companies over the next few years
  • Fifty percent of current data centers will have insufficient power and cooling capacity this year

HDDs are clearly not the only contributor to the rapid acceleration of data center power consumption, but their inefficient use is likely one of the largest contributors.  Data suggests that more than one third of data center power consumption is storage related.

Trends and techniques such as consolidation, virtualization and thin provisioning should all contribute to improved efficiencies.  But while doing so, these approaches will put increased performance demands on the HDDs.  The result:  an increased need for higher performance (i.e., higher RPM; read that as ‘power consuming’) drives and even further over-provisioning for performance, and therefore once again increased energy consumption.

It’s time for new metrics to be considered in the data centers, which take into account energy usage to aid the system designers as they optimize their systems.  Several metrics are identified at the www.greendatastorage.com website; examples cited include activity per watt, such as transactions/Watt, IOPs/Watt, and bandwidth/Watt.
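As a rough illustration of how such activity-per-watt metrics are computed and compared, here is a minimal Python sketch.  The function names are my own, and the device figures are placeholders chosen only to make the arithmetic visible; they are not measurements of any real HDD or flash drive.

```python
def per_watt(activity, watts):
    """Activity per watt, e.g. IOPs/Watt or (MB/s)/Watt."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return activity / watts


def improvement(candidate, baseline):
    """How many times better the candidate metric is than the baseline."""
    return candidate / baseline


# Placeholder numbers, purely hypothetical.
hdd = {"iops": 300, "mb_per_s": 100, "watts": 15}
flash = {"iops": 30_000, "mb_per_s": 400, "watts": 5}

hdd_iops_w = per_watt(hdd["iops"], hdd["watts"])
flash_iops_w = per_watt(flash["iops"], flash["watts"])
hdd_bw_w = per_watt(hdd["mb_per_s"], hdd["watts"])
flash_bw_w = per_watt(flash["mb_per_s"], flash["watts"])

print("IOPs/Watt improvement:", improvement(flash_iops_w, hdd_iops_w))
print("Bandwidth/Watt improvement:", improvement(flash_bw_w, hdd_bw_w))
```

The point of normalizing by watts is that a design that meets an IOPs target by adding more spindles does not look any better on this metric, because power grows along with activity.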

I believe that Enterprise Flash Drives (EFDs) will play a major role in reversing these trends.  EFDs can provide more than a 1000x improvement in IOPs/Watt, and an order of magnitude or more improvement in bandwidth/Watt, over the highest performing HDDs.

Amyl Ahola

Point made

Monday, March 3rd, 2008

I have decided to depart from my planned comments because I can’t help but mention a recent storage announcement that helps make the point of how far one can go to try to overcome HDD shortfalls.  The announcement references an array of small form factor drives (2 ½”) configured to achieve a very high ‘Actuator Density’ array, a noble objective.  According to the company’s white paper, the unit is sealed and contains 160 or more drives in a 3RU rack.  The drives apparently counter-rotate and are offset from each other to overcome shock and vibration issues.

The white paper also talks about 160 drives plus 10 spares in this sealed unit with a three-year minimum life, resulting from the ‘Failure in Place’ ability to dynamically swap failed drives with self-contained spare drives.  Using their own MTBF data and typical failure statistics, this configuration would result in more than 10 failures in a three-year period in more than 50 percent of the installations.  Putting this in perspective:  more than 10 failures means replacing the entire sealed unit and, in the best case, spending many hours recovering tens of TBs of data.  I would not want their warranty bill!
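For readers who want to try this kind of estimate themselves, here is a hedged Python sketch.  It assumes a constant failure rate of 1/MTBF per active drive and that every failed drive is immediately replaced by a spare, so the number of failures over the period follows a Poisson distribution.  The MTBF values in the loop are hypothetical placeholders, not the vendor’s figures, and the function names are my own.

```python
import math

HOURS_PER_YEAR = 8760


def expected_failures(active_drives, mtbf_hours, years):
    """Mean number of failures when each active slot fails at rate 1/MTBF."""
    return active_drives * years * HOURS_PER_YEAR / mtbf_hours


def prob_more_than(mean_failures, k):
    """Poisson tail: probability of seeing more than k failures."""
    p_at_most_k = sum(
        math.exp(-mean_failures) * mean_failures**i / math.factorial(i)
        for i in range(k + 1)
    )
    return 1.0 - p_at_most_k


# Hypothetical MTBF values: the specification sheet number is often much
# higher than what is observed in high duty cycle use.
for mtbf in (1_200_000, 600_000, 300_000):
    mean = expected_failures(active_drives=160, mtbf_hours=mtbf, years=3)
    p = prob_more_than(mean, k=10)   # more failures than the 10 spares
    print(f"MTBF {mtbf:>9,} h: mean failures {mean:5.1f}, P(>10) = {p:.1%}")
```

The useful part is the sensitivity: halving the effective MTBF doubles the expected number of failures, and the chance of running out of spares rises far faster than that.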

Given the actual failure rate of HDDs (especially consumer grade HDDs used in high duty cycle environments), and not the inflated specification numbers from HDD suppliers, I can’t imagine anyone who has experienced a catastrophic HDD failure event willing to take the risk on such a ‘sealed’ configuration that doesn’t allow hot swaps.  This seems to be marketing 101 at its best:  if you have a downside, feature it!

The white paper also claims improvements in performance and power that, at least in this author’s opinion, do not hold up when subjected to the same kind of objective analysis.

My point is not to throw stones at this particular approach, but rather use it to illustrate the magnitude of the challenge in overcoming the inherent performance and reliability shortcomings of HDDs.  When the fundamental problem is the mechanical nature of the beast, the solution is not to keep adding more of the same. 

In one sense, this is not all that different from the political discussions of the day…the question boils down to, do you want more of the same or is it time for a new paradigm?

Amyl Ahola

The storage industry has come a long way, but… (Part 2)

Monday, February 25th, 2008

As I thought more about the topic of my last post – how far the storage industry has come since its inception – another point occurred to me.  While disk drive capacity and cost achievements have been incredible, orders of magnitude improvement, disk drive performance gains are unremarkable – especially when you compare them to the significant advances in CPU and network performance.

Now, I don’t want to receive an inbox full of angry emails (angry comments are welcomed!) about this, so let me make it clear that I truly appreciate the technological challenges and the progress that has been made towards reducing disk latency and positioning times.  But at the end of the day, the performance improvement is less than 40x in nearly 50 years!  This compares to multiple orders of magnitude improvement in CPU performance during the same period.
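To put the 40x figure in annual terms, here is a small Python sketch of the compound-growth arithmetic.  The 40x over 50 years comes from the paragraph above; the CPU factor is a hypothetical placeholder for ‘multiple orders of magnitude’, since no specific number is given, and the function names are my own.

```python
def annual_growth_rate(total_factor, years):
    """Compound annual growth rate implied by a total improvement factor."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("factor and years must be positive")
    return total_factor ** (1.0 / years) - 1.0


def gap_after(years, fast_rate, slow_rate):
    """How far apart two compounding trends drift after the given years."""
    return ((1.0 + fast_rate) / (1.0 + slow_rate)) ** years


disk_rate = annual_growth_rate(40, 50)          # from the 40x in ~50 years
cpu_rate = annual_growth_rate(1_000_000, 50)    # hypothetical CPU factor

print(f"disk: {disk_rate:.1%} per year")
print(f"cpu (assumed factor): {cpu_rate:.1%} per year")
print(f"gap after 50 years: {gap_after(50, cpu_rate, disk_rate):,.0f}x")
```

A 40x gain over 50 years works out to a little under 8 percent per year, which is why the gap against anything compounding faster keeps widening.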

Amdahl’s (other) Law requires that I/O performance improve at the same rate as CPU performance to maintain balanced system performance.  However, with the lag in disk drive I/O performance over the years, what we have now is a growing gap that system designers have had to cope with in their attempt to balance system performance.  The result:  the birth of new industries so that system designers can add hardware, such as cache and RAID, together with short-stroking and over-provisioning the disk drives, in an attempt to overcome the performance and reliability shortfalls due to the mechanical nature of HDDs.

While these approaches do improve performance to some degree, they also carry a significant cost to customers.  This is due not only to the cost of the additional hardware and software but also to increased system complexity, increased power consumption, reduced reliability, increased floor space, increased maintenance expense, and on and on.  What is the true cost of HDD performance?  It is anybody’s guess, but I’d argue that it is far greater than what is generally believed!

I believe that this is the most important data storage issue that needs to be addressed.  In particular, how can the industry solve the I/O performance problem without even more patches (e.g., more cache) and ever-increasing over-provisioning?

I have some thoughts that I’ll share next time.  In the meantime, I’d love to hear from you.

Amyl Ahola