
Part 1: Visualizing data flow

Jim Booth describes steps to visualize the flow of your data through a networked storage environment


Jim Booth

Jim Booth is our SearchStorage storage administration expert and our newest storage management expert. Jim is director of systems engineering for Creekpath Systems.


Jim has also been contributing his input on storage management issues in our Storage Management Tips & Tricks discussion forum.

How do you watch a movie? Do you pause at every frame to scrutinize the details before moving to the next frame? Or do you fast-forward through an entire show hoping to grasp the meaning as the frames fly by at 72 frames per second?

A bit extreme, but this is how storage management could be viewed today: using element managers to look long and hard at the details, while relying on framework managers to capture and filter huge quantities of information. Virtualization tries to create an abstract view, but hides too many important details to be intuitive and effective. I believe that visualization is the key to extracting real information from the storage environment.

In the first part of this tip, I will discuss data flow and visualization. In part two, I will discuss metrics and the evolving standards of CIM/WBEM.

To truly visualize the data flow, one must correlate two pieces of information.

  1. Element information -- In the form of Visio diagrams, Excel worksheets, or any other method you prefer.
  2. Data Flow Characterization

Data Flow

When meeting with professionals and learning their business as it relates to the usage of storage and storage management methods, I find that it is essential to understand how the data is flowing through the infrastructure. Taking a high-level look at how data enters the infrastructure is critical. This could be data created from users within the company or external data coming from access points.

You can characterize data flow as follows:

  1. Establish the data's point of origin.
  2. Associate the data with application(s).
  3. Determine the logical connections between the processing and data components.
  4. Follow the life-cycle of the data (including replication, retention, and deletion).
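
As a minimal sketch, the four characterization steps above can be recorded in a small data structure, one record per data flow. The class name `DataFlow`, its field names, and the sample values are hypothetical, chosen only for illustration; a worksheet with the same four columns would serve equally well.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataFlow:
    """One characterized data flow (all names here are illustrative)."""
    origin: str                    # step 1: where the data enters
    applications: List[str]        # step 2: applications that own the data
    connections: List[Tuple[str, str]] = field(default_factory=list)  # step 3: (from, to) links
    lifecycle: Dict[str, str] = field(default_factory=dict)           # step 4: replication, retention, deletion

    def is_complete(self) -> bool:
        """True when every one of the four steps has been filled in."""
        return bool(self.origin and self.applications
                    and self.connections and self.lifecycle)


# Hypothetical sample record.
flow = DataFlow(
    origin="external network feed",
    applications=["Oracle"],
    connections=[("database server", "external storage")],
    lifecycle={"retention": "not yet decided"},
)
print(flow.is_complete())
```

A record that fails `is_complete()` points at the step that still needs to be worked through for that flow.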

By visualizing the data path, one can control and manage the flow of information through the enterprise. Only through end-to-end understanding of the visual data path can one truly understand and manage the storage environment.

To visualize a working storage environment, it is more important to look at how the elements interact than at the elements themselves. Expect tools to evolve that manage and correlate the relationships between elements.


"Just show me what I already have." This is a common comment I've been hearing within many IT environments lately. So I ask a few questions:

  • Can you see your HBAs?
  • Can you see your switches and ports?
  • Can you see your logical disks?

Surprisingly (or not), the answer to these questions is typically "Yes." Digging a little deeper reveals that what the CIO is really asking is not "Show me what I have" but "How do all my storage elements interact and work together?"

This is a completely different question and a much more complex situation. Viewing each element in the storage network as a static entity can usually be done with the tools that accompany each device. JNI has EZ Fibre, Emulex has LightPulse. With a Brocade switch there is WebTools and Fabric Manager. Both EMC and HDS have tools to provide virtual views of their configuration information. The problem lies in extracting real, meaningful information.

To do this, one must merge the element information with what is known about the data flow. This can be done as follows:

Step 1. For each element in the storage network determine where data enters and exits.

For example: A SUN 220R running Oracle populates its tables from an outside network input (the Oracle DB resides on external storage).

Step 2. Determine the type of data entering and exiting the storage network elements (thereby associating an application or function).

Expanding on our example: Map the IP network connections to the SUN (input), and map the HBAs (output).

Step 3. Map the application elements on the storage.

We now have a relationship that follows the data from the IP address where it enters, through the processing element, and out to the HBA. We have also mapped the application elements, in our example Oracle, onto physical and logical disks -- this is where all those Excel worksheets come in handy.

A more complex example would require mapping through one or more switches. Most switch vendors provide software to determine these mappings.
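
For a path that crosses switches, the mapping amounts to a graph search over the connections you have recorded. The sketch below assumes the links have been collected by hand, or exported from the switch vendor's software, into a plain adjacency list. The element names (`hba0`, `switch1`, and so on) are made up for illustration and do not come from any vendor tool.

```python
from collections import deque

# Hypothetical connections, each listed in both directions.
LINKS = {
    "hba0": ["switch1"],
    "switch1": ["hba0", "switch2"],
    "switch2": ["switch1", "array_port0"],
    "array_port0": ["switch2"],
}


def find_path(links, start, goal):
    """Breadth-first search: return the shortest hop list from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(" -> ".join(find_path(LINKS, "hba0", "array_port0")))
```

Breadth-first search is used here only because it returns the path with the fewest hops; a fabric with zoning or multiple active paths would need more than this sketch covers.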

Step 4. Follow the life-cycle of the data (e.g., replication, retention, and expiration).

In our example, the database files are backed up on a daily and weekly schedule. You would also want to note the protection scheme in use. The archive files are backed up on a weekly basis and moved off-site each week.

At this point we have the IP addresses where the data originates, the HBAs the data flows through, the storage elements (both physical and logical), and the life-cycle the data follows.

This correlation of the data flow and storage elements provides powerful information. For example, knowing that a failed HBA is going to degrade performance on your Oracle database application is powerful. A bad, mirrored disk in the array triggers an alarm. Great. But, knowing that this particular HBA or disk is not connected to a mission-critical application might change how this error is handled.
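
A minimal sketch of that correlation: once each element is mapped to the applications whose data passes through it, an alarm can be ranked by what it affects. The element names, the application table, and the mission-critical flags are assumptions for illustration, not output from any real management tool.

```python
# Hypothetical map: storage element -> applications whose data flows through it.
ELEMENT_TO_APPS = {
    "hba0": ["oracle_db"],
    "disk_17": ["oracle_db"],
    "disk_42": ["test_scratch"],
}

# Hypothetical business classification of each application.
MISSION_CRITICAL = {"oracle_db": True, "test_scratch": False}


def alarm_priority(element):
    """Return 'high' if the failed element serves a mission-critical application, else 'low'."""
    apps = ELEMENT_TO_APPS.get(element, [])
    if any(MISSION_CRITICAL.get(app, False) for app in apps):
        return "high"
    return "low"


for failed in ("hba0", "disk_42"):
    print(failed, alarm_priority(failed))
```

The two lookup tables are exactly the kind of information the four steps produce; the function only joins them.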

By visualizing the logical relationship between storage elements, one can extract real information without losing important details.

It's a lot of work to do by hand. Software solutions will handle most of this tedium in the future.
