What you will learn in this tip: SearchStorage senior writer Carol Sliwa asked David Floyer, co-founder and chief technology officer at Marlborough, Mass.-based research and analysis firm Wikibon, to explain the critical design principles to keep in mind when planning the database infrastructure, including storage, for in-memory databases.
The reason you want to put the database in memory is quite simply to get latency and variability as low as possible while keeping the same basic principles that apply to all databases.
If you have a throughput issue with the database, the cheapest and easiest way to increase the throughput is to lower the latency, or average response time, and to lower the variance of the individual response times. The best way to lower the latency and variance is to design the I/O subsystem and the key parts of the database, such as the log file and high-activity components, with capacitor-protected dynamic RAM (for small amounts of data) or NAND flash. You can do other things, but usually flash is the fastest. If you're a storage administrator and you do not want to be accused of being the bottleneck on the database, put in the fastest storage you can and monitor the latency and variance. If you do that, you will almost always be able to say that it's not an I/O problem; it's a database problem or an application problem.
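As a rough illustration of the monitoring advice above, here is a minimal Python sketch that summarizes a set of I/O response-time samples and estimates the throughput ceiling they imply. The function names and sample values are hypothetical, and the throughput estimate assumes Little's law (throughput = outstanding requests / mean latency) with a fixed number of outstanding I/Os; it is not drawn from any particular monitoring tool.

```python
import math
import statistics


def latency_summary(samples_ms):
    """Summarize I/O response times given in milliseconds (hypothetical helper)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest-rank 99th percentile: the smallest sample that is
    # greater than or equal to 99% of all samples.
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return {
        "mean_ms": statistics.fmean(ordered),
        "variance_ms2": statistics.pvariance(ordered),
        "p99_ms": ordered[rank - 1],
    }


def max_throughput(outstanding_ios, mean_latency_ms):
    """Little's law estimate of I/Os per second for a fixed queue depth."""
    if mean_latency_ms <= 0:
        raise ValueError("mean latency must be positive")
    return outstanding_ios * 1000.0 / mean_latency_ms
```

With the same number of outstanding I/Os, halving the mean latency doubles the estimated throughput, which is the relationship the paragraph above relies on. A single slow outlier raises both the variance and the 99th percentile, so tracking those alongside the mean shows whether the storage is responding consistently.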
At the moment, many databases put everything into capacitor-protected dynamic RAM (DRAM). Another approach is to use a mixture of DRAM and flash. For example, the Aerospike in-memory database has a flash architecture whereby it writes everything to flash first and then reads it into DRAM if required. There are multiple copies of the data on different servers and a remote copy in a different place for data protection. But the flash itself is the master copy of data.
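The flash-first pattern described above can be sketched in a few lines of Python. This is only an illustration of the general idea (flash as the master copy, DRAM as a bounded read cache) and not Aerospike's actual implementation or API. The class name, the dictionary standing in for flash, and the least-recently-used eviction policy are all assumptions made for the example; replication to other servers is left out.

```python
from collections import OrderedDict


class FlashFirstStore:
    """Toy model: every write lands on 'flash'; DRAM only caches reads."""

    def __init__(self, dram_capacity):
        if dram_capacity < 1:
            raise ValueError("dram_capacity must be at least 1")
        self.flash = {}            # stand-in for the persistent master copy
        self.dram = OrderedDict()  # stand-in for a bounded DRAM cache (LRU order)
        self.dram_capacity = dram_capacity

    def put(self, key, value):
        # Write to flash first; flash always holds the authoritative value.
        self.flash[key] = value
        # Drop any cached copy so a later read cannot return stale data.
        self.dram.pop(key, None)

    def get(self, key):
        if key in self.dram:
            # Cache hit: mark the entry as most recently used.
            self.dram.move_to_end(key)
            return self.dram[key]
        # Cache miss: read from flash (raises KeyError if never written).
        value = self.flash[key]
        self.dram[key] = value
        if len(self.dram) > self.dram_capacity:
            # Evict the least recently used entry; flash still has it.
            self.dram.popitem(last=False)
        return value
```

Because the cache never holds data that is absent from flash, losing the DRAM contents in this model costs only read performance, not data.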
Often, when purchasing an in-memory database such as SAP HANA, you will buy an appliance. That appliance has DRAM with capacitors to hold a charge in the event of a power failure. The appliance also has hard disk drives, solid-state drives or PCI Express cards for functions such as page swaps and system recovery.
From the storage point of view, DRAM is a read-only buffer for in-memory databases; it is not part of the storage itself. You want to have large amounts of persistent storage, and you want an array or a set of flash cards that will provide low latency and low variance. The closer you have the persistent storage to the database, the better the results will usually be because you'll have less network overhead. The guaranteed point-to-point architecture of Fibre Channel (or even InfiniBand) is always better for high-performance databases than iSCSI or NFS. Another way of getting higher bandwidth (and indirectly reducing latency) on the writes is to use wide striping; wide striping can also speed up reloading the database.
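To make the wide-striping idea concrete, here is a minimal Python sketch of how a logical byte offset might be mapped onto a set of devices in round-robin stripe units. The function name, the stripe-unit size and the device count are hypothetical; real arrays and volume managers add parity, metadata and alignment rules that are not modeled here.

```python
def stripe_location(logical_offset, stripe_unit, num_devices):
    """Map a logical byte offset to (device index, byte offset on that device)."""
    if logical_offset < 0 or stripe_unit <= 0 or num_devices <= 0:
        raise ValueError("offset must be >= 0; stripe_unit and num_devices > 0")
    unit_index = logical_offset // stripe_unit  # which stripe unit overall
    device = unit_index % num_devices           # round-robin across devices
    row = unit_index // num_devices             # full stripes before this one
    return device, row * stripe_unit + logical_offset % stripe_unit
```

Consecutive stripe units land on different devices, so a large sequential write or a full database reload is spread across all of them and can proceed in parallel rather than queuing on one device.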
You don't need to have the flash in the server itself. It's quicker and easier to do it that way, but what matters is having it as close to the processor as possible. If you have a clustered database with, say, 10 servers connected, you have to use an array. If the flash is on the array, you want to ensure that it is dedicated to that particular application, and you want to make sure it has as few hops as possible on the storage network.