Thursday, March 11, 2010

"SSD vs. HDD when fixing performance problems

NavigateStorage recently read George Crump's article on fixing performance problems with SSD vs. HDD. We found it quite informative and wanted to share it with you.

George Crump, Contributor
02.16.2010

As SearchStorage.com's recent Storage Priorities survey shows, data storage managers are making moves toward solid-state storage and solid-state drives (SSDs), with 14% of 360 survey respondents planning to implement them this year and nearly 40% planning to evaluate them this year (in addition to the 7% who already have them in place). Those numbers mean that right now, many of your customers could use help in comparing SSD vs. HDD and determining what value they'd get from implementing SSD to fix performance problems. This is a role that's tailor-made for an integrator and represents an excellent value-add opportunity for you.
While SSD carries a premium on a cost-per-capacity basis, it's well worth it if it improves storage response time for users and critical applications. According to Jim Handy, SSD analyst at Los Gatos, Calif.-based market research firm Objective Analysis, who was interviewed for a story on SearchStorage.com, SSD is 20 times more costly than hard disk drives (HDDs) on a cost-per-gigabyte basis.

But, the same story points out, when comparing SSD vs. HDD in terms of cost per input/output operations per second (IOPS), the equation looks decidedly different. According to Mark Teter, chief technology officer (CTO) at Advanced Systems Group, "flash is almost 140 times faster than the fastest HDD." For companies whose revenue is tied to high I/O performance on important applications, not implementing SSD could actually cost them money in a clearly measurable way. Beyond that, SSD vendors have plenty of examples of how their technologies have helped with customer retention and increased productivity.
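To see how the two metrics diverge, here's a quick back-of-the-envelope comparison in Python. The prices, capacities and IOPS figures are illustrative assumptions, not numbers from the article.

```python
# Hypothetical drive figures -- illustrative only, not from the article.
hdd_price, hdd_capacity_gb, hdd_iops = 250.0, 600, 180      # assumed 15K-rpm HDD
ssd_price, ssd_capacity_gb, ssd_iops = 1500.0, 200, 25000   # assumed enterprise SSD

# Cost per gigabyte favors the HDD; cost per IOPS favors the SSD.
print(f"HDD: ${hdd_price / hdd_capacity_gb:.2f}/GB, ${hdd_price / hdd_iops:.2f} per IOPS")
print(f"SSD: ${ssd_price / ssd_capacity_gb:.2f}/GB, ${ssd_price / ssd_iops:.4f} per IOPS")
```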
That said, don't assume that SSD is a bullet-proof answer. When a customer comes to you with a performance concern, resist the urge to sell them a whole new storage system -- and, instead, put on your trusted advisor hat. As we discuss in our Visualizing SSD Readiness guide, there are specific steps you can take to determine whether your customer can benefit from the addition of SSD.
The first and probably simplest step, when checking application performance issues, is to test the CPU utilization of the server that the application is running on. If the CPU utilization is on average greater than 50%, more than likely the application is CPU-bound rather than storage I/O-bound. But if the CPU utilization rate is below 50%, the CPU is most likely waiting on something. (The lower the rate is, the more likely this is true.) Oftentimes, that's an indication of a storage performance bottleneck.
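As a minimal sketch of that first check, the snippet below samples host CPU utilization and applies the 50% rule of thumb. It assumes Python with the third-party psutil package; the 30-sample window is an arbitrary choice.

```python
import psutil  # third-party package: pip install psutil

# Take 30 one-second samples of overall CPU utilization on the host.
samples = [psutil.cpu_percent(interval=1) for _ in range(30)]
avg = sum(samples) / len(samples)

if avg > 50:
    print(f"Average CPU {avg:.0f}%: likely CPU-bound; faster storage may not help")
else:
    print(f"Average CPU {avg:.0f}%: CPU may be waiting on storage; check queue depth next")
```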

If your customer's CPU utilization rate is below 50%, the next step is to determine where the performance problems are coming from. In this situation, a utility like PerfMon or third-party tools can help you figure this out.
Using one of these tools, the first parameter to look at is queue depth. The queue, as it relates to storage, is essentially the number of pending I/O requests to the storage system. Queue depth is affected by the number of drives in a RAID configuration; the more drives there are in the RAID configuration, the more I/O requests can be handled at the same time. If a RAID configuration consistently shows a sustained queue depth, you can reduce that depth by roughly one with each drive you add. As long as there is queue depth, adding drives should improve performance.
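That rule of thumb lends itself to a rough estimate of how many drives it would take to work a sustained queue down. The sketch below is only an illustration of the article's approximation, in Python, with hypothetical figures.

```python
def drives_needed(observed_queue_depth: float, target_queue_depth: float = 0.0) -> int:
    """Approximate drives to add, assuming each drive absorbs roughly one queued I/O."""
    return max(0, round(observed_queue_depth - target_queue_depth))

print(drives_needed(12))   # a sustained depth of 12 -> roughly a dozen extra drives
print(drives_needed(75))   # the budget-breaking case: dozens of drives added just for performance
```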
But adding drives to reduce queue depth can lead to a few problems. The first is that the drive count can become quite high and, of course, quite expensive. If adding a few extra drives to the RAID group can return performance to an acceptable level, this can be a cost-effective way to solve the performance problem. However, if it takes 50 or 100 additional drives to reduce queue depth to an acceptable level, it could break your customer's budget.
Second, there is a concern around capacity waste. In most cases, the application does not need all these drives to store its data; it needs them to generate performance. As a result, terabytes of capacity may go unused, further reducing the ROI of a disk-based solution. (Comparatively, in most cases, one SSD can outperform even a high number of disks, and do so less expensively and without wasting terabytes of capacity.)
The third problem with adding HDDs to reduce queue depth is that it might not solve the storage bottleneck problem. If queue depth is zero but storage is still the bottleneck, then you have a latency issue. In that case, the only option is a faster drive.

Utilities like PerfMon and others will help you identify problems with disk latency, which refers to the time it takes for a drive to find the requested data and send it back out the I/O channel. When measuring response time to identify problems, look for anything that is consistently averaging above 5 to 10 milliseconds.
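As an illustrative sketch of that check in Python, the snippet below compares a set of response-time samples against the 5 to 10 millisecond guideline. The sample values are made up; in practice they might come from a counter such as PerfMon's Avg. Disk sec/Transfer (reported in seconds, so convert to milliseconds).

```python
# Hypothetical per-I/O response-time samples, in milliseconds.
latency_ms = [4.2, 6.8, 13.5, 11.9, 15.0, 12.7, 9.4, 14.1]

avg_latency = sum(latency_ms) / len(latency_ms)
if avg_latency > 10:
    print(f"Average latency {avg_latency:.1f} ms: likely a disk latency problem")
elif avg_latency > 5:
    print(f"Average latency {avg_latency:.1f} ms: borderline; keep watching")
else:
    print(f"Average latency {avg_latency:.1f} ms: storage response time looks healthy")
```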
On mechanical HDDs with disk latency problems, there are two options. The first is to offer your customers faster drives, but for the last 10-plus years we have been stuck at 15,000-rpm drives, so while your customer may see an improvement with this approach, they have likely already tried it. The alternative to a faster drive is to improve the drive's response time by making sure the data sits on the outer edge of the disk's platter, a technique called short stroking. This essentially means formatting only the outer part of the drive. While this does increase performance, it also wastes a lot of disk capacity -- as much as two-thirds of the total capacity of the drive.
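A quick worked example of that trade-off, using an assumed drive size rather than a figure from the article:

```python
drive_capacity_gb = 600      # assumed 15K-rpm drive size
usable_fraction = 1 / 3      # short stroking: only the outer third of the platters formatted

usable_gb = drive_capacity_gb * usable_fraction
wasted_gb = drive_capacity_gb - usable_gb
print(f"Usable: {usable_gb:.0f} GB, wasted: {wasted_gb:.0f} GB (about two-thirds of the drive)")
```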
If your customer doesn't yet have 15,000-rpm HDDs, buying them and then short-stroking them is going to be an expensive option. With SSD, on the other hand, response time is often all but eliminated, and at a significantly lower cost.

The challenge with SSD is knowing what to look for. The simple test is to check processor utilization on the servers that are hosting the application or the file servers that are serving the data. If it is low, it's time to break out the utilities and look at the queue depth and response time parameters. If queue depth or response time is high, more than likely SSD will make a dramatic improvement in the customer's environment.
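Pulling the three checks together, here is a compact decision sketch in Python. The thresholds mirror the guidelines discussed above; it is an illustration of the workflow, not a definitive diagnostic.

```python
def ssd_candidate(cpu_util_pct: float, queue_depth: float, latency_ms: float) -> bool:
    """Rough screen for whether SSD is worth evaluating for this workload."""
    if cpu_util_pct > 50:
        return False                            # likely CPU-bound; faster storage won't help much
    return queue_depth > 0 or latency_ms > 10   # storage is probably the bottleneck

print(ssd_candidate(cpu_util_pct=35, queue_depth=8, latency_ms=14))   # True: evaluate SSD
print(ssd_candidate(cpu_util_pct=80, queue_depth=0, latency_ms=4))    # False: look at the CPU first
```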

About the author

George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection. Find Storage Switzerland's disclosure statement here.


NavigateStorage offers a full line of storage from many vendors. Contact us at 978-318-9000 or write us. You can read the full article here.
