Estimating the hardware requirements for a large storage job is notoriously difficult. So many factors and parameters are involved that it is hard to come up with a precise answer.
One way to approach estimating storage requirements is to try to establish upper and lower bounds on the size of the system and to refine those bounds as the design process continues. The initial estimates may be little better than a collection of rough guesses scrawled on the back of an envelope (metaphorically, at least). As more information becomes available, those estimates can be refined and the bounds narrowed until they converge on a fairly solid idea of what you are going to need.
In conducting this kind of analysis it is important to realize that there is more to any storage system than capacity. Capacity is important, but the system throughput -- the ability to get the information to the users as they need it -- is equally important. Often throughput is more limiting than storage capacity and it is almost always harder to fix after the fact.
Although you can start the process anywhere, the most common starting point is user needs. How many users will the system have and what will they need? In other words, what are the capacity and throughput requirements for the users and how many users will you have on the system simultaneously?
Once you know the user requirements, you can use them to estimate the I/O bandwidth you will need to support those users. That, in turn, can be used to estimate the number of disks or arrays required, and that can be used to determine the best way of subdividing the load. From there, you can get an idea of the capacity requirements for the storage server, and so on.
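The chain from user needs to bandwidth to disk count can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name `estimate_disks`, its parameters, and every number in the example call are hypothetical, and it assumes each disk delivers a fixed sustained transfer rate with no allowance for RAID overhead, caching, or growth.

```python
import math


def estimate_disks(total_users, concurrent_fraction, mb_s_per_user,
                   gb_per_user, disk_mb_s, disk_gb):
    """Back-of-envelope disk count from user needs (all inputs are guesses)."""
    # Step 1: user needs -> aggregate I/O bandwidth for simultaneous users.
    concurrent_users = total_users * concurrent_fraction
    bandwidth_mb_s = concurrent_users * mb_s_per_user

    # Step 2: bandwidth -> disks needed to sustain that throughput.
    disks_for_throughput = math.ceil(bandwidth_mb_s / disk_mb_s)

    # Step 3: total stored data -> disks needed for capacity alone.
    capacity_gb = total_users * gb_per_user
    disks_for_capacity = math.ceil(capacity_gb / disk_gb)

    # The larger of the two counts is the binding constraint.
    if disks_for_throughput >= disks_for_capacity:
        limit = "throughput"
    else:
        limit = "capacity"

    return {
        "bandwidth_mb_s": bandwidth_mb_s,
        "capacity_gb": capacity_gb,
        "disks_for_throughput": disks_for_throughput,
        "disks_for_capacity": disks_for_capacity,
        "disks": max(disks_for_throughput, disks_for_capacity),
        "limited_by": limit,
    }


# Hypothetical inputs: 1,000 users, a quarter of them active at once,
# 0.5 MB/s and 2 GB each, on disks assumed to sustain 40 MB/s and hold 1,000 GB.
estimate = estimate_disks(total_users=1000, concurrent_fraction=0.25,
                          mb_s_per_user=0.5, gb_per_user=2,
                          disk_mb_s=40, disk_gb=1000)
print(estimate)
```

With these made-up inputs, capacity alone would call for two disks, but throughput calls for four, so throughput sets the size of the system.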
It's important to realize that a lot of this is a process of creative guessing, especially in the early stages. You probably won't have all the numbers to work with, and in many cases (such as actual disk data transfer rates in your application) you'll end up with a best-case, worst-case, most-likely range rather than a single number. However, you will also find that not all numbers are created equal, and some of these factors can vary widely without affecting the results.
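One way to work with ranges rather than single numbers is to run the same rough estimate for the best, most-likely, and worst cases, then vary one input at a time to see which guesses actually move the result. The sketch below does that in Python. Everything in it is an assumption made for illustration: the input names, the values in `RANGES`, the fixed `DISK_GB` size, and the simplified sizing rule (disk count is the larger of the throughput-driven and capacity-driven counts).

```python
import math

# Hypothetical (best, likely, worst) guesses for each input.
RANGES = {
    "concurrent_users": (100, 250, 400),
    "mb_s_per_user": (0.25, 0.5, 1.0),
    "disk_mb_s": (60, 40, 20),        # best case is the fastest disk
    "stored_gb": (1000, 2000, 4000),
}
DISK_GB = 1000  # assumed capacity of one disk


def disks_needed(concurrent_users, mb_s_per_user, disk_mb_s, stored_gb):
    """Disk count is the larger of the throughput and capacity requirements."""
    by_throughput = math.ceil(concurrent_users * mb_s_per_user / disk_mb_s)
    by_capacity = math.ceil(stored_gb / DISK_GB)
    return max(by_throughput, by_capacity)


def scenario(index):
    """Estimate with every input at its best (0), likely (1), or worst (2) value."""
    return disks_needed(**{name: values[index] for name, values in RANGES.items()})


def sensitivity():
    """Vary one input across its range while the rest stay at 'likely'."""
    likely = {name: values[1] for name, values in RANGES.items()}
    spread = {}
    for name, values in RANGES.items():
        counts = [disks_needed(**{**likely, name: value}) for value in values]
        spread[name] = (min(counts), max(counts))
    return spread


print("best / likely / worst:", scenario(0), scenario(1), scenario(2))
for name, (low, high) in sensitivity().items():
    print(f"{name}: {low} to {high} disks")
```

With these invented numbers the overall bounds run from 1 disk to 20, with 4 as the most likely figure. Varying the stored data fourfold leaves the estimate at 4 disks, while varying the per-user transfer rate moves it between 2 and 7, which suggests where further measurement would pay off first.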
Brocade Communications Systems has a white paper that demonstrates estimating bounds in designing a storage system. The paper focuses on a server for video-on-demand, but the technique it illustrates is broadly applicable. It is at: www.brocade.com.
Rick Cook has been writing about mass storage since the days when the term meant an 80K floppy disk. The computers he learned on used ferrite cores and magnetic drums. For the last twenty years he has been a freelance writer specializing in storage and other computer issues.