Following EMC Corp.'s July acquisition of Greenplum Inc., EMC today launched a data warehouse appliance to compete with Oracle Corp.'s Exadata, as well as systems from Netezza Corp., Teradata Corp. and others. The EMC Greenplum Data Computing Appliance (EMC Greenplum DCA) is a standalone direct-attached storage device (DAS device) that can also be packaged with EMC Clariion SAN and Data Domain systems for advanced data protection.
The EMC Greenplum DCA consists of the Greenplum 4.0 software, released last month, and a hardware box that includes 16 servers and 192 Intel cores, and holds 36 TB of uncompressed or 144 TB of compressed data in one rack. The system comes in half-rack and full-rack configurations, and scales to 24 racks. EMC claims the Greenplum Data Computing Appliance can scan uncompressed data at 24 GBps and load more than 10 TB per hour per rack.
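To put those per-rack figures in perspective, the back-of-the-envelope arithmetic below derives a few ratios from them. It is only a rough sketch: it assumes decimal units (1 TB = 1,000 GB), treats EMC's claimed rates as sustained, and multiplies capacity linearly across racks. The function name `rack_summary` is purely illustrative.

```python
# Figures quoted by EMC for one full rack (vendor claims, not measurements).
SERVERS = 16
CORES = 192
UNCOMPRESSED_TB = 36
COMPRESSED_TB = 144
MAX_RACKS = 24
SCAN_GB_PER_SEC = 24
LOAD_TB_PER_HOUR = 10   # EMC says "more than" 10, so load time is an upper bound

GB_PER_TB = 1000        # assumption: decimal units


def rack_summary():
    """Derive simple ratios from the quoted per-rack figures."""
    return {
        "cores_per_server": CORES // SERVERS,
        "compression_ratio": COMPRESSED_TB / UNCOMPRESSED_TB,
        # Time to scan a full rack of uncompressed data at the claimed rate.
        "full_scan_minutes": UNCOMPRESSED_TB * GB_PER_TB / SCAN_GB_PER_SEC / 60,
        # Longest time to load a full rack at the claimed minimum rate.
        "max_load_hours": UNCOMPRESSED_TB / LOAD_TB_PER_HOUR,
        # Capacity of the largest configuration, assuming it adds up linearly.
        "max_uncompressed_tb": UNCOMPRESSED_TB * MAX_RACKS,
        "max_compressed_tb": COMPRESSED_TB * MAX_RACKS,
    }


if __name__ == "__main__":
    for name, value in rack_summary().items():
        print(f"{name}: {value}")
```

On these assumptions, each server contributes 12 cores, the implied compression ratio is 4:1, a full rack of uncompressed data would take about 25 minutes to scan, and a 24-rack system would hold 864 TB uncompressed.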
EMC also sells Greenplum 4.0 as standalone software, but EMC director of product strategy Ben Werther said the DCA appliance is tuned for top performance. EMC also offers the DCA packaged with Clariion storage and EMC RecoverPoint replication software for disaster recovery, and with Data Domain deduplication devices for backup.
"With the SAN, you get advanced features like snapshots and disaster recovery," Werther said. "You can offload mirror copies and increase the capacity of the appliance."
EMC moved beyond its core storage market when it picked up Greenplum to compete with vendors such as Oracle and IBM in selling large data warehousing platforms. IBM followed last month by agreeing to acquire Netezza for $1.7 billion, and that deal is expected to close by the end of the year.
The EMC Greenplum Data Computing Appliance is part of EMC's new data computing products division, led by former Greenplum CEO Bill Cook. Merv Adrian, principal at IT Market Strategy, a consulting firm that specializes in data management, said the Greenplum product line lets EMC address the valuable information that organizations keep outside their databases.
"In many shops now, there's more data in file systems outside corporate databases than there is inside the databases," he said. "That's valuable information, and that data is growing faster than the data inside databases. There may never be a good reason to put that information into a database because it doesn't need to be managed the way database data does. A product like this lets you flexibly provision a sandbox to work with the data. Then you can tear it down and throw it away when you're finished. You don't do that with a multimillion dollar analytic database."
Still, the EMC Greenplum DCA isn't a low-cost appliance – pricing starts at $1 million.
When Oracle first launched Exadata – based on Hewlett-Packard hardware – in 2008, it was seen as a threat to storage vendors because SANs have been the preferred platform to store Oracle databases. After acquiring Sun early this year, Oracle updated Exadata using Sun hardware.
Adrian said Exadata is ahead of the EMC Greenplum appliance from a hardware standpoint due to its use of Flash and InfiniBand, but he believes the EMC appliance has architectural advantages.
"The way Exadata operates on storage is different," Adrian said. "EMC isn't yet doing the kind of hardware assist at the storage level that Exadata is doing today. Oracle upped the ante by leveraging significant technological advantages in memory usage, smart storage and the InfiniBand interconnect, and figuring out how to use all those pieces together. EMC isn't quite there yet, but EMC has other advantages -- it's much more MPP [massively parallel processing]-oriented."
Greenplum uses an MPP shared-nothing architecture that spreads the workload among servers, each with its own processors and storage; EMC says this means there's no performance hit if a server fails. Exadata uses a shared-disk model, and EMC's Werther said that with that model performance suffers when a server goes down. Netezza and Teradata also use the shared-nothing architecture.
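The shared-nothing idea can be illustrated with a toy example: each row is assigned to exactly one segment by hashing a key, every segment computes a partial result over only its own rows, and a coordinator merges the partials. The sketch below is generic and is not Greenplum's actual distribution mechanism; the function names, the CRC32 hash and the sample data are all assumptions made for illustration, and it does not model mirroring or failover.

```python
import zlib
from collections import defaultdict


def segment_for(key: str, num_segments: int) -> int:
    """Pick the segment that owns a key (illustrative hash, not Greenplum's)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments):
    """Place each (key, value) row on exactly one segment."""
    segments = defaultdict(list)
    for key, value in rows:
        segments[segment_for(key, num_segments)].append((key, value))
    return segments


def parallel_sum(segments):
    """Each segment sums only its own rows; the coordinator adds the partials."""
    partials = [sum(value for _, value in rows) for rows in segments.values()]
    return sum(partials)


if __name__ == "__main__":
    # Hypothetical sample data: (customer id, order amount).
    rows = [("cust-1", 10), ("cust-2", 25), ("cust-3", 5), ("cust-1", 40)]
    segments = distribute(rows, num_segments=4)
    print(parallel_sum(segments))
```

Because no segment reads another segment's data, the per-segment work can run independently on separate servers, which is the property the shared-nothing vendors emphasize.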
"EMC isn't just competing with Oracle, it's competing with everybody in this market," Adrian said.
Online marketing firm Dotomi Inc. is already an EMC storage customer and decided to purchase an EMC Greenplum Data Computing Appliance after benchmarking an early release system alongside two other "usual suspects," according to Ken Treske, Dotomi's chief marketing and operating officer.
"Greenplum crushed a lot of specialized queries we were working on," he said.
Treske said Dotomi processes a large amount of transactional data coming from its website to make marketing decisions for its customers. "We have to be able to take dirty data, process it and make real-time marketing decisions," he said.
He said Dotomi is also purchasing a Data Domain box with the EMC Greenplum DCA. "Having the backup piece from the same vendor was a strong selling point," he said. "There can be a lot of maintenance backing up these large amounts of data."