This recent outage really highlights the importance of testing. How often should power systems be tested? Who should be involved in those tests?
Testing should be done at least monthly, and
Power outages always accompany weather events, and we have gotten used to them here in Florida. The fact is, though, that a hurricane is a phased disaster scenario: there is a lot of advance warning that the storm may come your way, so you can take proactive measures and power everything down in plenty of time to comply with an evacuation order.
A few years ago, we did have problems during an abnormally cold winter. Power was traded from our generating facilities to northern grids that needed it for heating. The result was rolling outages on Christmas Day. But companies had gone to some effort as part of hurricane planning to work with their utility service providers and ensure that redundant switching equipment was used to supply them. When one set of gear was taken out of service as part of the rolling outage, they were still up and running on a different set of gear. It just takes preplanning.
After events like the rolling blackouts in California, do you think most of the companies in the affected regions were prepared for this?
Nope. Like anything else, people tend to put the past behind them. Protection costs money, and CFOs these days often have a hard time justifying expenditures for capabilities that, in the best of circumstances, would never need to be used. Moreover, distributed computing has increased the number of assets vulnerable to power problems: things aren't collected inside the glass house of the data center anymore. So most companies are a target-rich environment for power-related events. Protecting a lot of distributed things is more expensive than protecting a lot of centralized things, and money is in short supply in most companies today.
What should admins do as a post-mortem to this event?
Look hard at this event from the standpoint of distances between your primary and backup facility or hot site. The Fed wimped out on specifying a minimum acceptable distance between facilities after 9/11, but that doesn't mean that companies can't take the initiative themselves. Your backup site should not be served by the same power supplier or delivery infrastructure as your primary. It's as simple as that.
Secondly, look seriously at your internal power generation and power protection capabilities. Spending a few bucks now can keep equipment in service longer and possibly cut down on transient-related software and hardware errors.
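For admins who keep a site inventory, the first point above (a backup site should not share a power supplier or delivery infrastructure with the primary, and should sit a sensible distance away) could be checked with a short script. What follows is only a minimal sketch: the `Site` fields, the `separation_problems` function, and the `min_distance_km` threshold are all hypothetical names chosen for illustration, and the threshold is something each company would have to pick for itself, since no official minimum distance was specified.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Site:
    """Hypothetical inventory record for one facility."""
    name: str
    lat: float            # latitude in decimal degrees
    lon: float            # longitude in decimal degrees
    power_supplier: str   # utility that sells power to the site
    delivery_grid: str    # delivery infrastructure that feeds the site


def distance_km(a: Site, b: Site) -> float:
    """Great-circle distance between two sites (haversine formula)."""
    earth_radius_km = 6371.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(h))


def separation_problems(primary: Site, backup: Site,
                        min_distance_km: float) -> list[str]:
    """Return the reasons the backup site is not independent of the primary.

    min_distance_km is an assumption supplied by the caller, not a
    regulatory figure.
    """
    problems = []
    if primary.power_supplier == backup.power_supplier:
        problems.append("same power supplier")
    if primary.delivery_grid == backup.delivery_grid:
        problems.append("same delivery infrastructure")
    if distance_km(primary, backup) < min_distance_km:
        problems.append("sites closer than the chosen minimum distance")
    return problems
```

An empty list means none of the three checks found a shared dependency; anything else is a prompt to look harder at the backup arrangement, not a verdict on its own.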