When applications get bogged down, all eyes typically focus on the storage; but maybe we should take another look at the application itself.
At an IBM conference a while back, I participated in a panel discussion and one of the questions tossed my way was one I seem to get all the time: “What do I need to do to my storage infrastructure to make my applications perform faster?”
It seems like everyone points at the storage infrastructure to find the culprit for slow-performing applications, which is logical given all the money EMC has spent over the years to lock in the notion that storage is where information lives. But I’m finding that storage itself is rarely the source of the problem.
Yes, there are ways to speed up IOPS on a storage rig. As mentioned in last month’s column, one approach to expediting storage responsiveness is to use a variant of sub-LUN tiering that leverages flash solid-state storage, or memory generally, to service data requests. When data is written to a hard disk and then exposed to frequent and/or concurrent requests for retrieval, temporarily copying that data into silicon and servicing requests from that copy can make things faster. XIO does this on its rigs using a patented approach called “hot sheets,” which I’m told refers to the engineer who came up with the scheme and thought he was “hot sheet.”
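To make the idea concrete, here is a minimal sketch of read-driven promotion: count reads per block, and once a block crosses a threshold, copy it into a small fast tier and serve later reads from there. The class name, the threshold of 3 and the two-block capacity are hypothetical choices for illustration; this is a generic technique, not XIO’s patented hot sheets scheme.

```python
from collections import Counter


class HotDataCache:
    """Sketch of promoting frequently read blocks from a slow tier to a fast tier.

    Assumption (hypothetical policy): a block is "hot" once it has been read
    `threshold` times; the least-read cached block is evicted when full.
    """

    def __init__(self, disk, threshold=3, capacity=2):
        self.disk = disk          # slow tier: block id -> data (stands in for HDD)
        self.fast = {}            # fast tier: block id -> copy (stands in for flash/DRAM)
        self.reads = Counter()    # per-block read counts
        self.threshold = threshold
        self.capacity = capacity
        self.fast_hits = 0
        self.disk_reads = 0

    def read(self, block):
        self.reads[block] += 1
        if block in self.fast:
            self.fast_hits += 1
            return self.fast[block]
        self.disk_reads += 1
        data = self.disk[block]
        if self.reads[block] >= self.threshold:
            self._promote(block, data)
        return data

    def write(self, block, data):
        # Writes land on the slow tier; drop any stale fast-tier copy.
        self.disk[block] = data
        self.fast.pop(block, None)

    def _promote(self, block, data):
        if len(self.fast) >= self.capacity:
            # Evict the coldest cached block to make room.
            coldest = min(self.fast, key=lambda b: self.reads[b])
            del self.fast[coldest]
        self.fast[block] = data
```

Reading the same block five times with `threshold=3` costs three disk reads; the third read promotes the block, and the last two are served from silicon.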
As for expediting writes, storage solutioneering gives us a couple of options: parallelization or spoofing. The first one, parallelization, usually goes hand in hand with “short stroking” (confining data to the fast outer tracks of each disk) and involves allocating a bunch of spindles to the task of writing data. Getting more read/write heads involved in the write process can increase overall write performance, but it does so at a cost in terms of power (more disks, more BTUs and more watts) and space (massive arrays consume a lot of raised floor).
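The trade-off can be sketched in a few lines, assuming simple RAID-0-style round-robin striping (a common way to spread writes across spindles, not necessarily what any given array does). The function names and parameters are hypothetical, and the linear-scaling model is an idealized best case that ignores controller, parity and seek overhead.

```python
def stripe(blocks, spindles, stripe_unit=1):
    """Distribute logical blocks round-robin across `spindles` disks.

    `stripe_unit` is how many consecutive blocks land on one disk before
    moving to the next (hypothetical parameter for illustration).
    """
    layout = [[] for _ in range(spindles)]
    for i, block in enumerate(blocks):
        layout[(i // stripe_unit) % spindles].append(block)
    return layout


def linear_cost(spindles, iops_per_spindle, watts_per_spindle):
    """Idealized totals: best-case write IOPS and power draw both grow
    linearly with spindle count, which is the cost the column describes."""
    return spindles * iops_per_spindle, spindles * watts_per_spindle
```

For example, `stripe(list(range(8)), 4)` yields `[[0, 4], [1, 5], [2, 6], [3, 7]]`: each of the four disks absorbs a quarter of the writes, but every disk added to raise throughput also adds to the power and floor-space bill.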
Spoofing is the other approach. Some network-attached storage (NAS) vendors use it to compensate for the slow performance of back-end RAID. Basically, you put a big memory cache in front of the storage array that’s directly attached to the NAS head (a thin server) and acknowledge application writes as received and recorded before the data is actually written to disk (hence the term “spoofing”). In the old days of mainframe channel extension, we used to say this strategy prevented the channel from “being held high” -- in other words, we fooled the app into believing its data had been received and written so it could go on about its business.
There’s nothing wrong with spoofing, except for the steep cost a NAS vendor charges for the memory modules used in its spoofing approach. Another implementation that I find to be less costly is to use DataCore Software’s SANsymphony-V storage hypervisor, which applies inexpensive server DRAM to do the same thing other vendors do with proprietary and pricey caching controllers or flash solid-state drives (SSDs). Basically, writes are placed into cache queues, unbeknownst to the application, where they wait their turn to be written to disk.
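Spoofing is essentially write-back caching, and a minimal sketch shows the mechanics: the write is acknowledged as soon as it is queued in memory, and a later flush destages the queue to disk in arrival order. The class and method names are hypothetical, and this is not DataCore’s or any NAS vendor’s actual implementation. As the comments note, a real controller must also protect the queue against power loss, since acknowledged writes live only in memory until flushed.

```python
from collections import deque


class WriteBackCache:
    """Sketch of write "spoofing": acknowledge writes from memory, destage later.

    Assumption: `disk` is any dict-like backing store. Queued writes are lost
    if memory fails before flush(), so real products protect the cache
    (for example, with battery- or flash-backed memory).
    """

    def __init__(self, disk):
        self.disk = disk
        self.queue = deque()   # pending (block, data) writes, oldest first
        self.pending = {}      # newest not-yet-destaged data per block

    def write(self, block, data):
        self.queue.append((block, data))
        self.pending[block] = data
        return "ACK"           # the application is told the write is done now

    def read(self, block):
        # Serve the newest cached copy if it has not reached disk yet.
        if block in self.pending:
            return self.pending[block]
        return self.disk[block]

    def flush(self, max_writes=None):
        """Write queued data to disk in arrival order; return how many were written."""
        written = 0
        while self.queue and (max_writes is None or written < max_writes):
            block, data = self.queue.popleft()
            self.disk[block] = data
            written += 1
            # Drop the cached copy once no newer write for this block is queued.
            if not any(b == block for b, _ in self.queue):
                self.pending.pop(block, None)
        return written
```

The application sees `"ACK"` immediately while the disk is still empty; the slow back end catches up whenever `flush()` runs.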
So, yes, Virginia, there are things you can do to improve storage performance. But that doesn’t necessarily translate into faster application performance. Like it or not, application performance may have very little to do with storage at all.
Sometimes an application is hosted improperly. Not long ago, I was brought in to troubleshoot a database that was taking more than 90 minutes to load into production. The database, originally hosted on a mainframe, contained more than 100 years of commodity exchange transactions and had grown from its original design into a multiheaded hydra held together by spit and baling wire. It seemed Oracle had charmed the CIO into migrating off a mainframe and onto a RAC hosting platform -- a deal sealed by a promise that doing so would score the CIO a cover photo in Oracle Magazine. Chicks would dig him and guys would want to be him, I suppose. The actual outcome of the strategy, a database with significantly reduced performance, changed that cover page into a poster for stupid.
The hosting platform does count for something when determining the root cause of slow-performing apps. Another example comes from a conversation I had last year with the former boss of VMware’s European operations, who left the company for a startup because he wasn’t seeing the sales revenue the hypervisor vendor had anticipated. “People are reusing their retired servers after consolidating their apps with ESX and vSphere as hosts for more guest machines,” he complained. The result was that server hardware, with fewer cores and sockets and less memory than is suited for guest hosting, was being placed into service, producing, among other things, abysmally slow application performance and a lot of disgruntled users.
Server hypervisors like VMware’s can introduce a lot of performance impediments as well. The LSI Logic controller emulation used in VMware’s strange brew of microkernels is a big I/O chokepoint that VMware has sought to address by requiring hosting platforms with more sockets and memory (a hardware-centric, brute-force effort to improve performance that requires a big capex spend on new server hardware) and by introducing non-standard SCSI commands in a desperate effort to offload “up to 20%” of I/O workload to “intelligent” (aka pricier) arrays beneath. Now, they’re trying to write yet another microkernel to take over storage entirely. Good luck with that.
Finally, it’s worth mentioning that sometimes application performance sucks because of, well, the application itself. One outfit I know about is using an interesting scheme to protect its Exchange environment. It leverages CA Technologies’ ARCserve Replication (formerly known as CA XOsoft Replication) to fail over a clustered email hosting setup, which stores its mailboxes on a back-end Fibre Channel fabric, to a 1U rack-mounted server running VMware at an ISP a couple of hundred miles away. The solution works pretty well to safeguard 40,000 mailboxes against a potential hurricane or other disaster. The operators say users don’t notice the difference when the platform fails over to the virtual host, mainly because Exchange isn’t the best performing application in any case!
This was my response to the folks at the IBM event, a couple hundred CIOs from Global 2000 companies. Their reaction was an ovation.
Seems like some folks get it.
BIO: Jon William Toigo is a 30-year IT veteran and is CEO and managing principal of Toigo Partners International and chairman of the Data Management Institute.