Businesses choose to migrate their data center systems to the cloud for improvements in reliability and scalability.
Recently, a few high-profile data center outages have reminded us that moving your data center to the cloud is not necessarily a guarantee of 100% uptime:
Rackspace email and hosted exchange services experienced issues for nearly two hours in their Virginia data center. After services were restored, users experienced a residual delay in their email as “the database synchronized changes made during the time of impact.”
Less than a year ago, trading was halted for hours on the New York Stock Exchange due to a “configuration issue,” described by an NYSE spokesperson as:
“The New York Stock Exchange and NYSE MKT experienced a technical issue and, consistent with our regulatory obligations, the decision was made to suspend trading as we worked to identify the cause and resolve it. The root cause was determined to be a configuration issue.”
We like to think of the cloud as a service that provides 100% uptime. While many cloud services have the potential to improve your reliability, downtime of cloud services is a reality. Technology leaders deciding whether to move critical systems to the cloud face a difficult choice when weighing the impact of cloud outages against onsite outages. As part of the decision-making process for moving to the cloud, CIOs should consider the impact and cost of cloud downtime on their business.
Here are two important considerations when evaluating the cost of cloud vs. onsite data center downtime:
Determine the Mean Time to Recovery (MTTR) of critical systems
The cost of downtime is typically calculated per minute. Thus, the Mean Time to Recovery (MTTR) of a critical system is an important metric for evaluating the cost of downtime. For onsite data centers, review any past outages of your critical systems and use them to develop MTTR metrics: total downtime (in minutes), overtime paid to staff (if the downtime occurred after hours), and costs paid to outside service providers or contractors who assisted during the downtime.
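As a minimal sketch of how these metrics fit together, the Python snippet below computes MTTR and an average cost per outage from an outage history. The `Outage` record, the function names, and every figure in `history` are hypothetical, chosen only for illustration; the cost-per-minute value is likewise an assumption you would replace with your own estimate.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    downtime_minutes: float  # total time the system was unavailable
    staff_overtime: float    # overtime paid to staff (0 if during business hours)
    contractor_fees: float   # fees paid to outside providers or contractors


def mttr_minutes(outages):
    """Mean Time to Recovery: average downtime per outage, in minutes."""
    return sum(o.downtime_minutes for o in outages) / len(outages)


def average_outage_cost(outages, cost_per_minute):
    """Average cost per outage: downtime cost plus overtime and contractor fees."""
    total = sum(
        o.downtime_minutes * cost_per_minute + o.staff_overtime + o.contractor_fees
        for o in outages
    )
    return total / len(outages)


# Hypothetical outage history for one critical system (not real data).
history = [
    Outage(downtime_minutes=90, staff_overtime=600.0, contractor_fees=0.0),
    Outage(downtime_minutes=30, staff_overtime=0.0, contractor_fees=1500.0),
    Outage(downtime_minutes=120, staff_overtime=900.0, contractor_fees=2000.0),
]

print(f"MTTR: {mttr_minutes(history):.0f} minutes")
print(f"Average cost per outage: {average_outage_cost(history, cost_per_minute=100.0):,.2f}")
```

Keeping downtime, overtime, and contractor fees as separate fields makes it easier to see which component dominates the cost of a typical outage.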
For cloud services, ask your cloud provider for a report on their most recent outages, the causes, and the associated outage times. Determine the additional staff and/or outside services you would need to support the issue in the event of a cloud outage (assuming it occurred after hours). Remember that once systems are restored in the cloud, a synchronization period may be required to fully harmonize your systems (as in the Rackspace outage described earlier). End-user support for systems such as email will be required even after your cloud provider restores service.
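One way to account for the synchronization period is to treat it as a second, partially degraded phase of the outage. The sketch below does this under stated assumptions: the `degraded_fraction` parameter (the share of the full per-minute cost still incurred while systems resynchronize) is an invented modeling choice, and all figures are hypothetical.

```python
def cloud_outage_cost(outage_minutes, sync_minutes, cost_per_minute,
                      degraded_fraction, support_cost):
    """Estimate the cost of one cloud outage, including the sync tail.

    outage_minutes:    time until the provider restores service
    sync_minutes:      time afterwards spent resynchronizing data
    degraded_fraction: assumed share of full cost incurred during sync (0 to 1)
    support_cost:      extra staff or outside services needed to support users
    """
    full_outage = outage_minutes * cost_per_minute
    sync_tail = sync_minutes * cost_per_minute * degraded_fraction
    return full_outage + sync_tail + support_cost


# Hypothetical scenario: a two-hour outage followed by an hour of resynchronization.
estimate = cloud_outage_cost(
    outage_minutes=120,
    sync_minutes=60,
    cost_per_minute=100.0,
    degraded_fraction=0.25,
    support_cost=800.0,
)
print(f"Estimated cost of the outage: {estimate:,.2f}")
```

Setting `degraded_fraction` to 0 reproduces an estimate that ignores the synchronization period entirely, which shows how much that period adds.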
Review the ongoing maintenance and operational costs of the critical system
In calculating the cost of operating a critical system in the cloud, first calculate the cost of moving that system. In many cases, outside consulting or services, staff overtime, etc. are required to support the initial move and testing of a cloud-based system. This cost should be factored into the total cost of moving your application to the cloud and will result in a 12-month cost analysis that looks something like this:
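As a rough illustration, a 12-month analysis of this kind can be sketched in Python. The migration cost and monthly fee below are hypothetical figures, not data from any provider, and the function name is an assumption made for this example. The sketch also assumes the one-time migration cost is incurred in the first month.

```python
def cloud_cumulative_costs(migration_cost, monthly_fee, months=12):
    """Cumulative cloud cost at the end of each month.

    The one-time migration cost (consulting, overtime, testing) is assumed
    to be incurred in month 1; the monthly fee recurs every month.
    """
    return [migration_cost + monthly_fee * m for m in range(1, months + 1)]


# Hypothetical figures for illustration only.
costs = cloud_cumulative_costs(migration_cost=30_000, monthly_fee=4_000)

for month, total in enumerate(costs, start=1):
    print(f"Month {month:>2}: {total:>10,.0f}")
```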
To compare this to keeping an application in-house, calculate the ongoing hardware, software, and maintenance costs required to run the critical system internally. Add to this any outside consulting you use on an annual basis for system audits, reviews, and upgrades. In addition, include the overhead cost of the support staff needed if this application were kept onsite. (Note: if the staff would be utilized even if the system were in the cloud, this cost can be removed from your calculation.)
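The onsite side of the comparison can be sketched the same way. In the snippet below, the function name, the `staff_needed_anyway` flag, and all amounts are hypothetical; the flag simply encodes the note above about dropping staff overhead when those staff would be employed regardless of where the system runs.

```python
def onsite_annual_cost(hardware, software, maintenance, consulting,
                       staff_overhead, staff_needed_anyway=False):
    """Annual cost of running a critical system in-house.

    If the support staff would be utilized even with the system in the
    cloud, their overhead is left out of the comparison.
    """
    base = hardware + software + maintenance + consulting
    return base if staff_needed_anyway else base + staff_overhead


# Hypothetical annual figures for illustration only.
figures = dict(hardware=15_000, software=10_000, maintenance=8_000,
               consulting=12_000, staff_overhead=40_000)

with_staff = onsite_annual_cost(**figures)
without_staff = onsite_annual_cost(**figures, staff_needed_anyway=True)

print(f"Onsite, staff overhead included: {with_staff:,}")
print(f"Onsite, staff overhead excluded: {without_staff:,}")
```

Comparing either result with the 12-month cloud total shows how sensitive the decision can be to the staffing assumption alone.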
By evaluating the cost of keeping an application onsite over the period of your cloud contract and weighing that against the cost of downtime, CIOs can make a more informed decision about the true impact of moving a critical system to the cloud. Given the high-profile data center outages that we have seen, it is fair to assume that a cloud outage could happen to you. CIOs who have completed a true cost analysis of critical system downtime in the cloud vs. onsite will be better prepared to handle this crisis when it arrives.
For more information on calculating data center total cost of ownership, check out our video podcast.