Managing a cloud Service Level Agreement (SLA) is imperative for a successful cloud deployment. There are no two opinions about that, but there are two modes in which an SLA can be executed, and both are equally important.
The “Business as Usual” Mode deals with service levels that can be obtained during periods of normalcy.
The Disaster Mode covers the service levels that can be obtained in disaster situations.
What are the service levels obtainable during normal business operations? First, the service must be reliable. The cloud vendor must do whatever is needed to ensure that the service is available and operational during normal business hours. To ensure reliability, the service provider must define the metrics for downtime, latency and availability. If 99.9% uptime is assured, the data servers must be highly available and accessible for at least that fraction of the agreed measurement period. The vendor must mirror, replicate and store the customer data on multiple servers or storage repositories so that the failure of one server does not deprive the customer of access to the data. Appropriate failover protocols must be in place so that the customer continues to be served even when the primary server fails or goes offline. If scalability of service is an agreed parameter, the cloud vendor must have the infrastructure ready and available for on-demand scale-up. If Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) have been agreed upon, the vendor must ensure that recovery happens within the time frame specified and up to the point specified. This may involve setting up and maintaining continuous backup services, with idle-time backup and bandwidth throttling protocols in place as promised.
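To make these metrics concrete, the following is a minimal Python sketch of how an uptime guarantee translates into a downtime budget and how a single recovery can be checked against agreed RTO and RPO values. The function names, the 30-day measurement period and the sample timestamps are assumptions chosen for illustration; an actual SLA defines its own measurement window and objectives.

```python
from datetime import datetime, timedelta


def allowed_downtime(uptime_percent: float, period: timedelta) -> timedelta:
    """Return the downtime budget implied by an uptime guarantee over a period."""
    return period * (1 - uptime_percent / 100)


def meets_recovery_objectives(
    failure_at: datetime,
    last_backup_at: datetime,
    restored_at: datetime,
    rto: timedelta,
    rpo: timedelta,
) -> bool:
    """Check one recovery against the agreed RTO and RPO.

    RTO bounds how long the service may stay down after the failure.
    RPO bounds how much recent data may be lost, measured here as the
    gap between the last good backup and the failure.
    """
    recovery_time = restored_at - failure_at
    data_loss_window = failure_at - last_backup_at
    return recovery_time <= rto and data_loss_window <= rpo


if __name__ == "__main__":
    # Hypothetical 30-day measurement period with a 99.9% uptime guarantee.
    budget = allowed_downtime(99.9, timedelta(days=30))
    print(budget)  # 0:43:12, i.e. about 43 minutes of downtime in 30 days

    # Hypothetical incident: last backup 10 minutes before the failure,
    # service restored one hour after it.
    failure = datetime(2024, 1, 1, 12, 0)
    ok = meets_recovery_objectives(
        failure_at=failure,
        last_backup_at=failure - timedelta(minutes=10),
        restored_at=failure + timedelta(hours=1),
        rto=timedelta(hours=4),
        rpo=timedelta(minutes=15),
    )
    print(ok)  # True
```

Seen this way, a 99.9% guarantee still leaves room for roughly three quarters of an hour of downtime a month, which is why the SLA should state the measurement period alongside the percentage.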
However, the danger of working with an SLA constructed only in the “Business as Usual” mode is that the end user remains unprepared for disaster. Such an SLA does not answer the questions that loom large before a customer who is suddenly faced with a natural or man-made disaster that may wipe out all systems within the enterprise. An SLA prepared in the Disaster Mode provides for the action points that the vendor will execute in disaster situations. It details the crisis situations that may arise in the data center and how the vendor expects to tackle them. It also lays down a formal procedure for handling, escalating and controlling the disaster situation, along with the metrics for recovering IT systems for business continuity.
It is fairly evident that an SLA constructed exclusively in one mode or the other will be disastrous for the organization. The SLA must provide for both situations and spell out the action points and metrics for each.