The Department of Homeland Security's Immigration and Customs Enforcement (DHS/ICE) division is managing terabytes (TB) of forensic data across more than a dozen locations using software from Nirvana Storage, a division of government contractor General Atomics, and hardware from EMC Corp.
Nick, the former section chief of the DHS/ICE Digital Forensics Section, now employed by law enforcement IT consulting company Delex Systems, Inc., said that the department received funding from Congress in 2004 and used it to take bids from storage vendors for setting up the data-forensics infrastructure. Dell Inc. and EMC Corp. were the lowest bidders with iSCSI Clariion CX-500i systems as well as servers and networking products. Nick said Nirvana's software was also packaged into the bid.
At the time the RFP was submitted, EMC had yet to embark on its big ILM push with its own software products; Nick admitted that the bigger companies' backing for Nirvana was part of what gave the department confidence in the relatively unknown software vendor. But, he also said, "I've asked EMC numerous times if they have a competitive product [to SRB], and they seemed comfortable saying they didn't have anything on the shelf to do what SRB did for us."
In fact, EMC still says it doesn't have any product that competes with SRB directly; EMC says it was Dell that officially submitted the bid to DHS, similar to previous joint customer deployments that included SAN file system vendor Ibrix Inc. According to Nirvana's director of business development Joel Zhou, EMC and Nirvana have been partnered since 2003, and have about half a dozen joint customers, most of them federal or state government agencies. Nirvana is a CAS Specialty member of EMC's Velocity Technology & ISV Partner Program; EMC officials said the two companies also have joint customers in the oil and gas industry but that no other customers are publicly disclosed. EMC does not resell the Nirvana product, and both Nirvana and EMC officials had no comment about whether or not there have been talks about an acquisition.
Each of ICE's 26 US locations will eventually be outfitted with one of the systems if all goes to plan, said Nick, who retired from federal service in January. Currently, about half the locations have the Clariion system installed, and each site averages about 15 TB of data. All of the systems are managed by a single administrator in Fairfax.
The department has spent $13 million to date on the project, including networking and server upgrades as well as storage, but it could have cost much more without Nirvana's SRB software, Nick said. The department needs only one full-time administrator to manage all the storage instead of 10 or 12.
The software was originally developed by engineers at the San Diego Supercomputer Center (SDSC) for use with multiple heterogeneous storage systems in large deployments much like the one at DHS. Nirvana, which claims over 150 customers among universities, research institutions and government agencies, was established in 2000, but the company only began marketing itself commercially this year, according to Nirvana's director and chief architect Tino Scheder.
"Commercial companies are now facing problems that previously were more applicable to large federal agencies" as data growth has increased in enterprises over the last few years, he said.
How does SRB work?
Nick said that SRB's metadata management application, which is operated through Internet Explorer using a Windows-based Web application or a Java client, depending on the location, allows forensic investigators to "tag" data according to case number, date, suspect, priority level and other identifying information. The metadata is stored on the SRB's Metadata Catalog server, or MCAT, which consists of SRB's proprietary application layered over an Oracle database for search purposes.
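To make the tagging idea concrete, here is a minimal sketch of a metadata catalog that stores key-value tags per file and answers searches on them. It is not SRB's MCAT or its API: the table layout, the function names (open_catalog, tag_file, find_files) and the sample paths are all assumptions made for illustration, and an in-memory SQLite database stands in for the Oracle database the article mentions.

```python
import sqlite3


def open_catalog():
    """Create an in-memory stand-in for a metadata catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tags ("
        " path TEXT NOT NULL,"
        " key TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (path, key))"
    )
    return conn


def tag_file(conn, path, **tags):
    """Attach or overwrite tags such as case_number or priority on one file."""
    conn.executemany(
        "INSERT OR REPLACE INTO tags (path, key, value) VALUES (?, ?, ?)",
        [(path, key, str(value)) for key, value in tags.items()],
    )


def find_files(conn, **criteria):
    """Return the sorted paths whose tags match every given key/value pair."""
    matches = None
    for key, value in criteria.items():
        rows = conn.execute(
            "SELECT path FROM tags WHERE key = ? AND value = ?",
            (key, str(value)),
        )
        found = {row[0] for row in rows}
        # Intersect so that a file must satisfy all criteria, not just one.
        matches = found if matches is None else matches & found
    return sorted(matches or [])


if __name__ == "__main__":
    catalog = open_catalog()
    tag_file(catalog, "/evidence/disk01.img", case_number="C-100", priority="high")
    tag_file(catalog, "/evidence/disk02.img", case_number="C-100", priority="low")
    print(find_files(catalog, case_number="C-100", priority="high"))
```

Keeping tags as rows rather than fixed columns means investigators could add new identifying fields without a schema change, which is one plausible reason to layer a catalog over a general-purpose database.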
Though the software is based on a file system, said Scheder, it can track data from any type of storage device, including NAS, DAS, SAN, relational databases, tape archives, CAS systems, document repositories and Web servers. For block-based SAN systems, the software, which uses host agents to collect data, registers files as they appear on the host. For databases, queries are repackaged in XML, Excel or HTML templates that make them look like a file.
"There's no real limit on the file system," Scheder said. "We have several customers [using it to manage] between 200 and 400 TB across multiple physical sites."
According to Nick, DHS has its system set up to automatically migrate data between tiers of SATA disk after 30 days, and to send the data to tape as well as a CAS system at a DR site for archival after 60 days, depending on the status of each file. SRB not only performs the migrations, but also keeps detailed records on the movement of data and "pointer" files that allow servers to access files which have been migrated between disk systems. It's worth noting that the agency works from copies of the original evidence, so the reformatting done by SRB doesn't compromise investigative data, Nick said.
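The 30-day and 60-day thresholds Nick describes amount to a simple age-based policy, which can be sketched as follows. This is an illustration only, not SRB configuration: the article does not say which date the clock runs from or what "status" means, so the reference_date argument, the eligible flag and the target names below are assumptions.

```python
from datetime import date

MIGRATE_AFTER_DAYS = 30   # move between SATA tiers after this many days
ARCHIVE_AFTER_DAYS = 60   # send to tape and the DR-site CAS after this many days


def plan_targets(reference_date, today, eligible=True):
    """Return the hypothetical destinations a file is due to be copied or moved to.

    reference_date is whichever date the policy counts from (an assumption;
    it might be creation or last access). eligible stands in for the
    per-file status check mentioned in the article.
    """
    if not eligible:
        return []
    age_days = (today - reference_date).days
    if age_days > ARCHIVE_AFTER_DAYS:
        return ["tape", "cas_dr_site"]
    if age_days > MIGRATE_AFTER_DAYS:
        return ["sata_tier_2"]
    return []


if __name__ == "__main__":
    today = date(2007, 4, 1)
    for ref in (date(2007, 3, 20), date(2007, 2, 15), date(2007, 1, 1)):
        print(ref, plan_targets(ref, today))
```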
The migration is performed using a daemon, according to Scheder, which searches the metadata repository for keywords including the date of creation, the date of last access, and which class of storage device the data resides on, all of which are pre-tagged by the administrator of the system. When it finds the data, the system builds an XML structure that the host agents use as a proxy for migration, based on a proprietary SRB protocol. At the end of the process, the ILM daemon collects scripts and updates the metadata pointers within the MCAT repository. The administrator then gets a report from the MCAT system which can be used for chargeback.
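The daemon's cycle as Scheder describes it (search the metadata, build an XML structure for the host agents, then update the pointers) can be approximated in a few lines. The sketch below is a loose illustration under stated assumptions: the catalog is a plain list of dictionaries, the XML element and attribute names are invented, and the proprietary SRB protocol and the actual data movement are not modeled at all.

```python
import xml.etree.ElementTree as ET
from datetime import date


def select_candidates(catalog, storage_class, min_idle_days, today):
    """Pick records on the given storage class that have been idle long enough."""
    return [
        rec for rec in catalog
        if rec["storage_class"] == storage_class
        and (today - rec["last_access"]).days > min_idle_days
    ]


def build_work_order(candidates, target_class):
    """Serialize the selected files into a hypothetical XML work order."""
    root = ET.Element("migration", target=target_class)
    for rec in candidates:
        ET.SubElement(root, "file", path=rec["path"], source=rec["storage_class"])
    return ET.tostring(root, encoding="unicode")


def apply_results(catalog, work_order_xml):
    """Update catalog pointers for every file listed in the work order.

    A real system would first confirm that each move succeeded; this sketch
    assumes it did. Returns the number of files listed in the order.
    """
    root = ET.fromstring(work_order_xml)
    target = root.get("target")
    moved = {elem.get("path") for elem in root.findall("file")}
    for rec in catalog:
        if rec["path"] in moved:
            rec["storage_class"] = target
    return len(moved)


if __name__ == "__main__":
    catalog = [
        {"path": "/evidence/a.img", "storage_class": "sata_tier_1",
         "last_access": date(2007, 1, 1)},
        {"path": "/evidence/b.img", "storage_class": "sata_tier_1",
         "last_access": date(2007, 3, 25)},
    ]
    order = build_work_order(
        select_candidates(catalog, "sata_tier_1", 30, date(2007, 4, 1)),
        "sata_tier_2",
    )
    print(order)
    print(apply_results(catalog, order), "pointer(s) updated")
```

Separating the selection, the work order and the catalog update mirrors the three stages in the description, and the count returned at the end is the kind of figure a chargeback report could be built from.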
Last week, Nirvana announced an update to the fifth generation of SRB, now called SRB 2007, which includes Windows Active Directory integration and the integration of access control lists for each data repository. The new access control lists are overseen by a new role-designee in SRB called the "curator." Also new is a metadata daemon, similar to the ILM daemon, which allows for policy-based reporting and management of the metadata catalog.
Pricing for the package is based on three levels according to the number of physical locations in an enterprise, and quotes are given on a case-by-case basis, Scheder said. The software licensing starts at $20,000.