Posts Tagged ‘I/O performance’

Scalable Performance, Part II – Managing Response Times

Monday, June 21st, 2010

Previously I wrote about the importance of “scalable performance” when it comes to enterprise storage. My point: Enterprise-class SSDs must have sufficient back-end horsepower to scale I/O performance to meet increasing workloads for today’s data-throughput-intensive applications.

Pliant recently ran a series of performance tests on several storage devices, and the results were eye-opening. For a baseline reference, see what happens to a typical enterprise 15K RPM hard drive as the workload increases:

Once the HDD reaches 400 IOPS, the response time starts to increase considerably to more than 75,000 microseconds (μs). And, as it reaches 500 IOPS, the response time nearly doubles, reaching more than 137,000 μs. This is the expected behavior for a mechanical device, limited by a single actuator.
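One way to read these figures is through Little's Law, which says the average number of requests in flight equals throughput multiplied by response time. The short sketch below applies it to the two data points quoted above. The function name is my own, and because the quoted latencies are "more than" values, the results are lower bounds rather than measured queue depths.

```python
# Little's Law: average outstanding I/Os = throughput (IOPS) x response time (seconds).
def outstanding_ios(iops, response_time_us):
    """Average number of I/Os in flight implied by a throughput/latency pair."""
    return iops * (response_time_us / 1_000_000)


# The two HDD data points quoted in the text (latencies are "more than" values).
hdd_points = [(400, 75_000), (500, 137_000)]

for iops, latency_us in hdd_points:
    depth = outstanding_ios(iops, latency_us)
    print(f"{iops} IOPS at {latency_us} us -> about {depth:.1f} I/Os outstanding")
```

In other words, going from 400 to 500 IOPS adds only 25% more throughput while the backlog of waiting requests more than doubles, from roughly 30 to roughly 68. That is what saturation on a single actuator looks like.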

Generally speaking, SSDs offer better raw I/O performance than HDDs, but when it comes to scalability and managing response time, the results are startling:

The chart above compares the STEC ZeusIOPS SAS SSD against the Pliant Lightning™ LS 300S EFD using the same saturation plot. As the workload increased, the STEC SSD’s response time increased dramatically, reaching more than 10,000 microseconds, without significant scaling of I/O bandwidth. In contrast, the Pliant LS 300S was able to keep response time below 2,000 microseconds and continued to scale I/O bandwidth to more than 32,000 IOPS.

This is the reason we designed Pliant’s Lightning Enterprise Flash Drive (EFD) to provide steady, predictable performance over time regardless of workload:

Data center I/O demands are dynamic and unpredictable by nature. As such, enterprise storage devices must have the power and flexibility to scale on-the-fly to provide a high level of performance at all times and under all workloads.

C.T. Chu

Gartner Cool Vendors 2010 Webinar on May 20

Monday, May 10th, 2010

IT analyst firm Gartner recently issued its annual “Cool Vendors in Storage” report, showcasing a handful of innovative companies that are providing IT organizations with technologies to dramatically increase efficiencies and deliver measurable bottom-line benefits.

As a follow up to the report, Gartner Fellow Daryl Plummer is hosting a complimentary Webinar on May 20 – Cool Vendors 2010: Staying Cool in Economic Heat – to review the 2010 Cool Vendors, and discuss specifically how each company is poised to positively impact the IT market.

We were thrilled that Pliant was recognized among this elite group of innovators for our Enterprise Flash Drives, and I encourage you to check out the free Webinar.

The Webinar will be held twice on Thurs., May 20, and you can register online at the following links:

Register for 9 a.m. EDT/6 a.m. PDT session – https://www1.gotomeeting.com/register/933461200
Register for Noon EDT / 9 a.m. PDT session – https://www1.gotomeeting.com/register/530216081

Hope you enjoy!

Greg




“Predictable performance” is good. “Scalable performance” is even better!

Monday, November 30th, 2009

We’ve been participating in Bell Micro’s SSD Seminar Series to help educate enterprise IT managers, OEMs, and storage and IT system developers on the significant performance, reliability and cost advantages of next-generation solid-state storage technology.

The seminars have been held in cities across North America (Toronto, Montreal, Boston, Bethesda, and Minneapolis), and the final one is scheduled for this Thursday, Dec. 3, in Milpitas, CA. Four SSD suppliers, including Pliant, will be presenting on the multiple benefits of adding SSDs to the storage infrastructure.

Not surprisingly, there’s quite a bit of expectation and discussion at the seminars that SSDs can deliver significantly higher I/O performance than hard drives. However, what’s surprising to me, having done five of the six seminars, is that people are already talking about “performance droop in SSD over time.” One of the vendors even stated, “fresh-out-of-box performance is different than steady-state performance.”

Now, I will spare you the discussion on how garbage collection affects SSD performance over time, as there are already plenty of articles on this subject. What is most troublesome is that performance droop should not occur at all. Solid state drives are supposed to alleviate performance bottlenecks, not introduce new ones. As such, a properly designed SSD controller should and must have sufficient horsepower so that a critical function like garbage collection will not impact I/O performance.

While we’re on the subject of performance, let’s talk about scalable performance.

Advanced interfaces, such as Fibre Channel (FC) and Serial Attached SCSI (SAS), provide access to two ports. For years, the secondary port has been relegated to sitting idle, used only when the primary port failed.

That’s a shame…and a waste, in my opinion.

Both the FC and SAS interfaces allow performance to scale when both ports are actively used. Given the fact that solid state drives are not physically constrained by a single read/write head, one should expect to scale performance by reading and writing to both ports at the same time!

The charts below illustrate the point:

Scalable Performance Comparison Chart

You paid for both ports already. Why not actually use both?

If you would like to learn more about “scalable performance” and how you can best implement enterprise SSD storage in your IT infrastructure, I invite you to join me and the Bell Micro team at the seminar this Thursday. It will be held at the Crowne Plaza Hotel in Milpitas, CA.

You can find more detailed info here:  http://www.bellmicro.com/ssd/seminar.asp.

I look forward to seeing you there!

C.T. Chu

SSD jargon and the need for standards

Thursday, March 5th, 2009

A recent article by editor Zsolt Kerekes of STORAGEsearch.com entitled, “flash SSD Jargon Explained,” got my attention.  The fact that there is a need to explain the jargon is a reminder that the marketing wizards keep inventing new terms to ‘differentiate’ their products, while confusing most of us and masking issues of real importance to data center operations.   A version of marketing 101: If you have a weakness, flaunt it.

The list of SSD jargon Kerekes cites in the article includes: dynamic leveling, active leveling, static leveling, BCH codes, Reed Solomon codes, write endurance, write amplification, write attenuation, garbage collection, read patrol, wear leveling, read disturb, and program disturb.

Look at the last two.  These are rarely discussed but are among the most important issues to those who care about losing data.  An earlier STORAGEsearch.com article asks the question, “Can you trust your flash SSD specs & Benchmarks?”  The answer can only be ‘of course not!’  At least not until there is some semblance of standardization.  This is especially true when considering using SSDs to meet the performance and reliability demands of enterprise applications.

With this in mind, some questions that should be asked (and answered) about SSD performance and reliability specs and benchmarks are:

1.    What is the real performance?

A simple question, but one rarely, if ever, addressed in the specifications.  Typical enterprise environments are random, 60%-70% read, with 4K or 8K blocks.  They do not use small blocks (512 bytes) chosen to show high IOPS, or large blocks chosen to show high bandwidth.

2.    Is the performance deterministic?

The writing process for flash is inherently slower than reading.  Does the performance drop substantially as a function of the read/write mix or does it stay relatively constant as needed to maintain consistent response times?  Is the performance dependent upon the use of cache (and the associated power loss and recovery issues of volatile cache memory)?

3.    Is the performance sustainable?

What does ‘sustainable’ mean? It is not unusual for performance to degrade as more and more of the device gets written to…it may take minutes or hours, but degradation of 50% or more may occur.

4.    What is the capacity available to the user?

Another simple question, but all SSDs contain more flash than that available for end user data. For example, the additional (or over-provisioned) flash may be used to optimize write performance, provide for spare blocks, CRC codes, ECC codes, and meta data.  Does the stated capacity net this all out?

5.    Are there duty cycle or other limitations on usage in order to achieve/maintain the specifications?

Does the architecture provide for 100% duty cycle, or is the product life contingent on a maximum number of writes or writes per day due to limited error management capability?

Is it assumed there will be ‘adequate’ idle time (what’s that in the enterprise?) to perform the necessary flash management activities?

6.    Are the error management and ECC algorithms powerful enough to correct read disturb and program disturb errors without resulting in excessive rates of uncorrectable errors and/or losing capacity due to bad block mapping?

Error correction approaches that use limited ECC to correct random bit failures may not have sufficient correction capability for read/program disturb errors. Correction capabilities may appear adequate but be based on codes, such as the Reed Solomon code, which is great for hard drives but not really applicable to flash failure modes. The lack of idle time for background flash management makes this problematic for many, if not most, SSD architectures.
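Questions 1 and 4 above come down to simple arithmetic that anyone can run against a spec sheet. The sketch below shows how the same IOPS figure translates into very different bandwidth depending on block size, and how much extra flash sits behind a stated user capacity. All of the numbers are hypothetical and chosen only for illustration. The function names are mine, and expressing over-provisioning as a percentage of user capacity is one common convention rather than a standard.

```python
def bandwidth_mb_per_s(iops, block_bytes):
    """Bandwidth implied by an IOPS figure at a given block size (decimal MB/s)."""
    return iops * block_bytes / 1_000_000


def over_provisioning_pct(raw_gb, user_gb):
    """Flash beyond the user-visible capacity, as a percentage of user capacity."""
    return (raw_gb - user_gb) / user_gb * 100


# Hypothetical spec-sheet numbers, not taken from any real product.
QUOTED_IOPS = 20_000
for block_bytes in (512, 4096, 8192):
    mb_s = bandwidth_mb_per_s(QUOTED_IOPS, block_bytes)
    print(f"{QUOTED_IOPS} IOPS at {block_bytes} B blocks = {mb_s:.1f} MB/s")

# Hypothetical drive: 256 GB of raw flash sold as a 200 GB device.
print(f"Over-provisioning: {over_provisioning_pct(raw_gb=256, user_gb=200):.0f}%")
```

An IOPS number quoted at 512-byte blocks moves one-eighth of the data of the same number at 4K blocks, which is why the block size and read/write mix behind a specification matter as much as the headline figure.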

Kerekes sums it up well: “Better user education about SSDs is a critical factor for the industry to sustain its growth. Design trade offs in products go far deeper than the choice of memory and interface. Being aware that there are other parameters which SSD vendors have implemented well, badly (or not at all) can be the difference between a satisfactory or disillusionary experience.”

What do you think?

Amyl Ahola

Mark Peters (ESG) Extols Value of EFDs for Data Centers

Tuesday, December 23rd, 2008

Now, here’s someone who really understands the benefits and value of using Enterprise Flash Drives (EFDs) in enterprise IT data centers:  Mark Peters.

Mark covers data center storage and systems for Enterprise Strategy Group.  He was recently interviewed for a SearchStorage.com “FAQ Guide” podcast about the growth in enterprise solid state technology.  (Read the full transcript here)

In the interview, Mark addresses the questions he hears most often from storage administrators about solid state technology, and I have to say that his views are spot-on — particularly regarding the benefits and value of solid state, and the market/business drivers that are making the technology increasingly attractive.

A few of the key points Mark makes are:

1)  I/O performance benefits

“Generically, whatever is most important to a business or enterprise or organization in terms of getting throughput and I/O handled, wherever you need speed, wherever you need a great deal of performance in terms of throughput, then solid state will be great.”

2)  Energy efficiency

“Given that we’re in such challenging economic times, that makes solid state more interesting.  Obviously with my focus on the data center I look at the green aspect of computing as well, and it’s hard to overlook solid state from that perspective.”

3)  Cost-efficiency

“Even in terms of today’s pricing, cost per I/O or the I/O per watt for solid state are already very compelling.”

It’s nice to see Mark (and other industry experts) start to recognize the important and growing role EFDs will play in the future.

Amyl Ahola

PS.  Mark also has a blog with more great info on a variety of data center storage issues:  Mark My Words.  I suggest checking it out if you haven’t already.

The 2009 Enterprise IT Storage Model: Performance + Efficiency

Wednesday, December 10th, 2008

You don’t need a crystal ball to predict how the global economic slowdown and a prolonged recession will impact IT spending in 2009:  it’s going to be ugly.  Many projects will be delayed, eliminated outright, or at the very least, cut severely in scope. 

This poses a huge problem for enterprise IT managers. Why? 

Quite simply, enterprise information demands continue to increase with no end in sight.  And, data center managers will have to do anything and everything in their power — without making significant new IT capital investments —to keep up with the increasing IT system performance demands. 

Failure to do so will be unacceptable, so what are the options?

Two things come to mind:  1) optimizing existing IT systems for increased performance; and 2) significantly reducing the energy consumption of power-hungry high RPM hard disk racks.  Is this difficult? 

It may be easier than one thinks, and it requires no change to the existing infrastructure, management software or systems.  By adding Enterprise Flash Drives (EFDs) to handle the performance workload of many spinning hard drives, both goals can be achieved.  The high performance of the EFD enables more I/O performance and flexibility to meet peak periods and growing demands.  By combining EFDs with high capacity HDDs, today’s storage racks can be reduced to storage shelves, saving power (up to 80%), space and money.

I predict that beginning in 2009, EFDs will be a key tool for enterprise IT managers to survive the economic turmoil while optimizing their existing storage systems.

And, let’s face it, it’s time for a change to the traditional approach to high-performance storage solutions. 

Interested to hear your feedback, so please feel free to comment.

Amyl Ahola

“Predictable performance” for changing business dynamics

Wednesday, November 5th, 2008

In a previous blog, I suggested that performance, reliability, IOPS per watt, and IOPS per $ are key storage metrics for enterprises. However, satisfying demanding enterprise needs goes far beyond the attainment of just these metrics. I/O-intensive enterprise IT applications require IOPS and bandwidth levels to be predictable and sustainable across a variety of workload requirements.

Predictable performance has traditionally been a challenge for SSDs in enterprise applications because workloads are random and indeterminate. This means that predictability requires consistent performance, independent of whether reading or writing data, as enterprise applications typically vary the read-to-write ratio between 60/40 and 90/10.  Ensuring that predictable performance is maintained while the workload changes is another example of how an Enterprise Flash Drive (EFD) offers differentiation from traditional SSDs. 

A performance comparison (IOmeter-based) between a well-publicized ‘enterprise’ SSD and the new Pliant EFD illustrates this difference.  From the chart, you can see how the ‘enterprise’ SSD(I) performance drops by over 80% as the read/write ratio changes. The Pliant EFD maintains its performance across the range from 100% reads to a 50/50 read/write ratio. This is because the Pliant EFD can read and write simultaneously to the drive and therefore offer substantially better and predictable performance for these demanding applications. Traditional SSDs and HDDs can only perform one read or write at a time. 
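To see why a device that services only one operation at a time is so sensitive to the read/write mix, consider a deliberately simple model in which the average time per operation is the weighted blend of the read and write service times. The sketch below is an illustration under that assumption only. The read and write rates are made-up numbers, not measurements of any drive in the chart, and the function name is mine.

```python
def serialized_mixed_iops(read_iops, write_iops, read_fraction):
    """Throughput of a device that handles one read or write at a time.

    Average time per operation is the read/write-weighted blend of the
    individual service times, so throughput is its reciprocal.
    """
    write_fraction = 1 - read_fraction
    avg_service_time = read_fraction / read_iops + write_fraction / write_iops
    return 1 / avg_service_time


# Hypothetical device: fast reads, writes ten times slower.
READ_IOPS, WRITE_IOPS = 30_000, 3_000

for read_pct in (100, 90, 70, 50):
    iops = serialized_mixed_iops(READ_IOPS, WRITE_IOPS, read_pct / 100)
    print(f"{read_pct}% reads -> about {iops:,.0f} IOPS")
```

With these invented rates, a 50/50 mix lands at about 5,450 IOPS, a small fraction of the read-only figure. The slow operation dominates the blend, so even a modest share of writes pulls the total down sharply unless reads and writes can proceed at the same time.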

The bottom line: EFDs enable enterprises to achieve higher I/O performance, maintain performance predictability with changing workloads, offer higher levels of service quality, and dynamically address changing business requirements without adding additional hardware.   

I’m curious to hear what you think, so please feel free to comment.

Amyl

 

The Changing Enterprise Storage Landscape

Wednesday, October 22nd, 2008

It’s clear to many industry experts that the enterprise storage landscape is changing dramatically.  And, as I’ve said, soon just about every enterprise data center in the world will be using enterprise flash drives (EFDs) for at least a portion of their data storage needs due to the accelerated requirements for higher levels of I/O performance, as well as the growing pressure to cut energy costs.

I was recently published in Systems Management News, so check out the article for greater detail. Click the link here:  http://www.sysmannews.com/link/32853

I’m curious to hear what you think, so feel free to comment.

Amyl Ahola

Enterprise Flash Drives: A definition

Monday, July 14th, 2008

I have written about a new class of SSDs referred to as Enterprise Flash Drives (EFDs) many times.  But what does it take to make a true “enterprise-class” SSD drive?  With so many different SSDs targeted for the enterprise it can be difficult to tell which SSDs really qualify as EFDs, and which do not. 

So, I think a description and definition is in order. 

In the world of disk drives, enterprise-class products are distinguished from desktop and laptop products by their ability to provide superior performance and reliability.  This means that they are expected to perform flawlessly in mission critical environments.  This same requirement also holds true for enterprise SSD devices.  However, just like lower-end disk drives, SSDs designed for laptops and desktops simply can’t pass muster when expected to provide the performance and reliability required in a mission-critical enterprise environment.  There are a number of existing SSD products marketed for the enterprise, many of which are nothing more than re-packaged consumer grade (laptop) SSD technology.  In fact, many of the so-called “enterprise SSD” drives actually underperform HDDs in laptop applications…hardly what I would call enterprise class. 

Therefore, a true EFD must provide high levels of performance and reliability for flawless operation in mission critical, I/O-intensive environments.  Given the growing power and space concerns of today’s large enterprise environments, reduced energy consumption is becoming an equally important criterion for any new class of primary storage devices.  An EFD’s superior performance, energy efficiency and improved reliability allow data centers to substantially grow capacity and performance in existing installations while reducing energy needs and TCO.

Given these requirements, an Enterprise Flash Drive should, at a minimum, provide the following:

  1. Superior I/O Performance – Adequate I/O performance levels to prevent bottlenecks, even during peak activity periods (generally 3-5 times greater than typical activity periods), without requiring extra hardware (i.e., cache), while providing ample scalability for growth.  At a minimum, an EFD should deliver at least 100,000 random IOPS and be able to sustain this rate for typical block sizes (4K bytes or more).
  2. Exceptional Reliability – EFDs need to deliver significantly lower failure rates than disk drives, given the inherent benefit of solid state technology (no moving parts).  Performance and reliability must be predictable and sustainable at 100 percent duty cycles (24/7/365) without cycle-stealing maintenance or “housekeeping” actions.  Lifetime should exceed five years without performance or capacity degradation.  Robust reliability monitoring and reporting capabilities are essential.
  3. Energy Efficiency – EFDs should meet new standards for green data center excellence of greater than 20,000 IOPS per Watt, with activity-based power management to limit energy consumption when the device is less than 100 percent utilized.
  4. Cost Efficiency – Transaction costs ($/IOPS) must be substantially reduced from that of an HDD (<10%).  And, it goes without saying that an EFD must be form factor and interface compatible with HDDs (while providing similar storage capacities).
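The numeric thresholds in the list above can be turned into a quick checklist. The sketch below is one way to do that. The `DriveSpec` fields, the function name and both sample devices are hypothetical, and I am reading "<10%" as "cost per IOPS below 10% of the HDD's cost per IOPS".

```python
from dataclasses import dataclass


@dataclass
class DriveSpec:
    """Hypothetical spec-sheet fields used only for this illustration."""
    random_iops: float
    watts: float
    price_usd: float
    lifetime_years: float


def efd_checklist(drive, hdd):
    """Compare a candidate drive against the minimums listed above."""
    drive_cost_per_iops = drive.price_usd / drive.random_iops
    hdd_cost_per_iops = hdd.price_usd / hdd.random_iops
    return {
        "at least 100,000 random IOPS": drive.random_iops >= 100_000,
        "more than 20,000 IOPS per watt": drive.random_iops / drive.watts > 20_000,
        "$/IOPS under 10% of HDD": drive_cost_per_iops < 0.10 * hdd_cost_per_iops,
        "lifetime over five years": drive.lifetime_years > 5,
    }


# Made-up numbers for a candidate flash drive and a reference hard drive.
candidate = DriveSpec(random_iops=120_000, watts=5, price_usd=6_000, lifetime_years=6)
reference_hdd = DriveSpec(random_iops=400, watts=15, price_usd=300, lifetime_years=5)

for criterion, passed in efd_checklist(candidate, reference_hdd).items():
    print(f"{'PASS' if passed else 'FAIL'}  {criterion}")
```

The checklist covers only the criteria that come with a number attached. Sustained performance at a 100 percent duty cycle and reliability reporting still have to be verified by testing rather than by reading a data sheet.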

While these requirements are very demanding, I believe they only begin to define the needs and ability of solid state technology to transform future system and storage architectures.  In my opinion, the vast majority of today’s SSD products are already falling short of the true needs. 

Interested to hear what you think…

Amyl Ahola

Storage reliability for the enterprise

Tuesday, May 20th, 2008

I’ve written a lot about I/O performance on this blog, and with good reason.  When I discuss Pliant’s EFD device and enterprise IT system performance issues with partners and the press, one of the questions that almost always comes up is about performance.  But, I often point out that, just like when considering a sports car, performance is only part of the equation.  Reliability is equally important.

Enterprise storage applications are demanding, and it is essential that reliability specifications are met at a 100-percent duty cycle operation on a 24/7/365 basis.  Those in the industry know that true enterprise-class disk drives are required for this environment, and that disk drives designed for low cost and low duty cycle laptop/desktop applications literally fall apart when employed in an enterprise application.  Likewise, SSDs designed for laptop/desktop applications also do not even come close to meeting the need.  So, for Enterprise Flash Drives to be accepted in the enterprise they must meet or exceed enterprise class HDD reliability. This is not a trivial task. 

The primary enterprise reliability specifications take the form of MTBF (or more meaningfully: annualized failure rate) and non-recoverable error rates (lost data).  Flash technology has three primary failure phenomena that have a significant impact on reliability:

  • Write Endurance – the limit on how many times a cell can be written/erased before it becomes damaged
  • Write/Program Disturb –  writing to a given page in a Flash chip can alter bit(s) in a page that is not being written (does not damage the cell); this is sometimes referred to as “bit flip”
  • Read Disturb – similar to Write Disturb, reading a page in a Flash chip can alter bit(s) in a page not being read (does not damage the cell)

A further complication is that these failure modes are not independent.  For example, the read disturb error rate is related to the number of writes or erases so that write endurance and read disturbs (and write disturbs) must be holistically considered.  It is obvious that they all contribute to non-recoverable errors, but perhaps not as obvious that they contribute to MTBF as well.  MTBF is a measurement of performance to specification, not just to some catastrophic event, as is typical with a disk drive.  This includes meeting performance and capacity specifications.

A common approach used in typical SSDs to deal with write endurance is to incorporate a wear-leveling algorithm to distribute writes across blocks within the chip(s), together with error correction (ECC), so that any damaged cells can be corrected when read.  This same ECC can then be applied for all reads to detect and correct altered bits (‘bit flips’) independently of how they became defective, i.e., write endurance, read disturb, or write disturb.  If the number of defective bits exceeds the ECC threshold, the sector(s) being read would then have to be marked as defective (non-recoverable error) and made unavailable to the system.  Depending on the amount of spare Flash capacity, at some point the resulting system capacity may well drop below the specification.
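The wear-leveling idea described above can be sketched in a few lines: always direct the next write to the block that has been erased the fewest times. This is a toy model to show the principle, not any vendor's algorithm. Real controllers also maintain logical-to-physical mapping tables, relocate rarely changed data, and retire bad blocks, none of which is modeled here. The class and method names are mine.

```python
class ToyWearLeveler:
    """Toy dynamic wear leveling: write to the least-erased block."""

    def __init__(self, num_blocks):
        # One erase counter per flash block, all starting fresh.
        self.erase_counts = [0] * num_blocks

    def next_block(self):
        """Pick the block with the lowest erase count and record one more erase."""
        block = min(range(len(self.erase_counts)), key=self.erase_counts.__getitem__)
        self.erase_counts[block] += 1
        return block


leveler = ToyWearLeveler(num_blocks=4)
chosen = [leveler.next_block() for _ in range(10)]

print("blocks written, in order:", chosen)
print("erase counts per block:  ", leveler.erase_counts)
```

Because each write goes to the least-worn block, no block's erase count ever runs more than one ahead of any other, so the endurance limit is approached evenly across the device instead of being hit early on a few hot blocks.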

As an example, a well-known supplier of SSDs advertises an ECC that corrects up to 8 bytes in 1024 bytes, while another supplier advertises 6 bytes in 528 bytes.  At the same time, both talk about program erase/write cycles well in excess of 1 million.  However, tests show that both ECC levels would frequently result in non-recoverable errors after as few as 200,000 write/erase cycles.  These error rates result in SSD reliability falling far short of disk drive reliability in terms of non-recoverable error rate.  At the same time, overall capacity begins to erode and eventually falls below the device specification, resulting in an MTBF failure. 
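To put the figures above side by side, the snippet below works out what share of each protected region the two advertised ECC schemes can correct, and what fraction of the advertised endurance had elapsed when errors began to appear in testing. It uses only the numbers quoted in the paragraph above. The variable names are mine, and 1,000,000 is used as a floor for "well in excess of 1 million", so the endurance fraction is an upper bound.

```python
# ECC figures as quoted above: (correctable bytes, bytes protected).
ecc_schemes = {
    "supplier A": (8, 1024),
    "supplier B": (6, 528),
}

for supplier, (correctable, protected) in ecc_schemes.items():
    share = correctable / protected * 100
    print(f"{supplier}: corrects up to {share:.2f}% of each {protected}-byte region")

# Advertised endurance (a floor) versus the cycle count where tests showed failures.
ADVERTISED_CYCLES = 1_000_000
OBSERVED_FAILURE_CYCLES = 200_000
endurance_fraction = OBSERVED_FAILURE_CYCLES / ADVERTISED_CYCLES

print(f"Non-recoverable errors seen at no more than {endurance_fraction:.0%} of advertised cycles")
```

Both schemes correct roughly 1% of the bytes they protect, which leaves little margin once disturb errors begin to accumulate on top of wear.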

And, that’s not all.  There is also a significant performance impact resulting from the management of these high error rates (It drops dramatically!).

The primary point is that enterprise-level reliability, whether it’s MTBF or non-recoverable error rate, cannot be addressed with just traditional ECC.  Other techniques must be employed alongside ECC to manage errors, and these techniques cannot be allowed to significantly impact performance (IOPS or bandwidth).

Sounds like a daunting task…or is it??  Stay tuned.

Amyl Ahola