
Cloud file-sharing services: Get accurate benchmark results

If you use cloud file-sharing services, you've most likely conducted benchmarks to see how they perform. This tip ensures you're interpreting the results correctly.

For cloud file-sharing service customers who sync a large number of files on a regular basis, potential file-processing overhead is important to quantify. I find that a 50 MB or 100 MB file is a good size for testing: it is small enough that runs complete fairly quickly, but large enough that timing errors won't dominate the measurement. Test files should contain data that is not easily compressible -- such as video files -- so the measured transfer speed does not reflect a file being squeezed to almost nothing before it traverses the network.

I also recommend using a directory full of many small files -- I use a directory that contains approximately 5,000 files of 10 KB each. The data transfer time required for the test is very small and it provides insights into any per-file overhead of cloud-sharing services.
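Generating this test data is easy to script. Here is a minimal sketch in Python, assuming a 50 MB large file and the 5,000 x 10 KB small-file directory described above; the file and directory names are placeholders:

```python
import os

def make_incompressible_file(path, size_bytes, chunk=1 << 20):
    """Fill a file with random bytes so compression can't shrink the transfer."""
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n

def make_small_file_tree(directory, count=5000, size_bytes=10 * 1024):
    """Create many small files to expose per-file sync overhead."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        make_incompressible_file(os.path.join(directory, f"file_{i:04d}.bin"),
                                 size_bytes)

make_incompressible_file("bench_large.bin", 50 * 1024 * 1024)  # 50 MB test file
make_small_file_tree("bench_small")                            # 5,000 x 10 KB files
```

Random bytes from `os.urandom` are effectively incompressible, which serves the same purpose as using video files.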

But determining what files to test is just the beginning. The following areas will help you to accurately obtain and understand the results of your cloud file-sharing services benchmarking project.

Precise measurement is key

Ultimately, you should measure bytes transferred for large files and files processed for directories of small files, depending on which part of the test suite is running. To do this, you need to know exactly what was transferred and exactly how long it took. As simple as that sounds, this will likely be the most challenging part of the cloud file-sharing services test process. In addition, turn off automatic synchronization while you stage the test files, so the sync does not start before you are ready to time it.

To test, load the file or folders to be benchmarked into the appropriate test folder on the client to test upload or in the cloud interface to test download.

Some services provide an easy-to-read log that indicates exactly when a synchronization event began and ended. Others might provide an inexact end time -- hours and minutes, but no seconds -- as a status message when the synchronization ends. Unfortunately, you need to know exactly how many minutes and seconds a sync event required to calculate throughput. There are several methods you can use to capture the synchronization runtime, but one of the simplest and most effective ways is via a stopwatch function. Click Start when you turn sync on, and Stop when the "sync complete" message is displayed.

If you have to time the sync manually, this is another good reason to find the "right size" for your test file. If the file is too small and the runs finish in three seconds, for example, the user error of clicking Start and Stop on your timer could dramatically skew your results. If the runtime is 100 seconds, for example, a second delay in stopping will not greatly influence your results. Similarly, if you choose a file size that is too big and each run takes 90 minutes, you will be staring at the screen until the "sync complete" message arrives.
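The sensitivity to click error can be checked with quick arithmetic. For a 50 MB file, a one-second mistake barely moves a 100-second result but badly distorts a three-second one:

```python
SIZE_BITS = 50 * 1024 * 1024 * 8  # a 50 MB test file, in bits

def mbps(seconds):
    """Throughput in megabits per second for the test file."""
    return SIZE_BITS / seconds / 1e6

# One second of stopwatch slop on a 100 s run: under 1% relative error.
long_run_error = (mbps(100) - mbps(101)) / mbps(100)

# The same one-second slop on a 3 s run: a 25% relative error.
short_run_error = (mbps(3) - mbps(4)) / mbps(3)
```

This is the arithmetic behind picking a "right size" file: long enough to swamp your reaction time, short enough that you aren't staring at the screen for 90 minutes.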

The stopwatch method is less than ideal, but it might be your best option. I have tried several other measurement methods with mixed success. At one point, I used the open source Wireshark network capture tool to capture a trace of the entire sync event for later analysis. This provides detailed time stamps for each packet. However, each service uses proprietary protocols for syncing data, and locating where the sync event starts and ends with any accuracy is pure guesswork.

I've also used the AutoIt scripting tool to display the time and take a screenshot for later review. This can be looped at any interval to get the level of precision needed, but you will likely end up with hundreds of screenshots to sift through to isolate when the "sync complete" message is displayed.
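If your sync client writes a plain-text log, a small polling script can replace both the stopwatch and the screenshot loop. This is only a sketch -- the log path and the "sync complete" marker are assumptions you would adjust for your client:

```python
import time

def wait_for_marker(log_path, marker="sync complete",
                    poll_interval=0.5, timeout=7200):
    """Poll a client log until the marker appears; return elapsed seconds.

    The log path and marker text are hypothetical -- match them to your client.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with open(log_path, "r", errors="ignore") as f:
                if marker in f.read():
                    return time.monotonic() - start
        except FileNotFoundError:
            pass  # the log may not exist until the client starts syncing
        time.sleep(poll_interval)
    raise TimeoutError(f"no '{marker}' in {log_path} within {timeout} s")
```

Start the script at the moment you enable sync; the returned value is the elapsed time, accurate to the polling interval.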

Reporting results: Metric conversion is beneficial

After gathering accurate data, you need to report it in a meaningful manner. I've seen some testers report the raw results -- elapsed time -- as the final metric for cloud services. But I believe the results need to be converted into a more meaningful, universal metric. For large file transfers, that basic metric will be Mbps.

You can also take the throughput results and calculate what percentage of the available link speed they represent. For example, a 5 Mbps download on a 10 Mbps link is 50% utilization. This lets you compare tests run from different locations not only on raw throughput, but on throughput relative to available bandwidth. It is an easy calculation that can provide some good insights.
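Both conversions are one-liners; a minimal sketch (the function names are my own):

```python
def throughput_mbps(bytes_transferred, seconds):
    """Convert a raw sync result -- bytes moved and elapsed time -- to Mbps."""
    return bytes_transferred * 8 / seconds / 1_000_000

def link_utilization_pct(throughput, link_speed_mbps):
    """Measured throughput as a percentage of the provisioned link speed."""
    return 100.0 * throughput / link_speed_mbps
```

With the example above, `link_utilization_pct(5, 10)` yields 50%.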

For small file tests, Mbps doesn't matter as much. What users care about is how many files were processed per second, which you can determine by dividing the number of files processed by the elapsed time in seconds. While you might not think this rate would vary much between cloud file-sharing services, it can. I have seen tests where one provider processed approximately 15 files per second and another only one file every two seconds.
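The small-file conversion is just as simple; a sketch, using example rates in the ballpark of the ones mentioned above:

```python
def files_per_second(files_processed, seconds):
    """Per-file processing rate for the many-small-files test."""
    return files_processed / seconds

fast = files_per_second(5000, 333)    # roughly 15 files per second
slow = files_per_second(5000, 10000)  # one file every two seconds
```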

Cloud file-sharing services gotchas

Here are a few more areas to consider when benchmarking cloud file-sharing services:

  • Test location. Throughput will vary by location, and it can be dramatic. If your users will be based primarily in Chicago, don't run your benchmarks from a site in San Francisco. Getting the same bandwidth configuration in both locations doesn't mean the results will be the same, as there is no easy way to know the latency or bandwidth between each office and the service provider's facility. So test where you will use the service.
  • Outliers. Your test results may vary significantly based on the time of day, day of the week or for unknown reasons. Data is traveling across a cloud that you do not control, which could mean run differences of several Mbps. To work around this, you need to tell whoever is reviewing the results that the performance is variable. You might want to run five tests, exclude the best and worst results and then average the remaining test results. The fluid nature of the cloud is such that you might test for a very long time before you get results that statistically have a lower standard deviation. If you simply average all your results, a few very good or very poor runs could skew the results such that what you report may not be representative of what can be expected.
  • Shadow copies. There are a few unique gotchas with this test. You will likely use the same file as you repeat your tests, which is fine. But if you discover the file suddenly uploads dramatically faster than it did previously, it might be due to a shadow copy. If the file is already on the server -- even hidden in the trash -- some systems are smart enough to know this and not upload it again. The same is true for files you have already downloaded. You should delete and empty the trash to ensure you are truly downloading files in your sync.
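The outlier handling suggested above -- run five tests, drop the best and worst, average the rest -- is a trimmed mean; a minimal sketch:

```python
def trimmed_average(runs):
    """Average after discarding the single best and worst result."""
    if len(runs) < 3:
        raise ValueError("need at least three runs to trim")
    trimmed = sorted(runs)[1:-1]
    return sum(trimmed) / len(trimmed)

# Five hypothetical throughput runs in Mbps; the 9.7 run is an outlier.
runs = [4.8, 5.1, 5.0, 9.7, 4.9]
```

Here a plain average reports 5.9 Mbps, while the trimmed average reports 5.0 Mbps, which better matches a typical run.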

 Be methodical, use common sense to do sanity checks on your results, translate your results into meaningful metrics and you'll be good to go.
