I was following an interesting debate between Amazon and Terremark from the recent CloudExpo conference. Amazon's position is that if a server fails, you simply provision another. Terremark's position is that no enterprise user can run on a platform designed to fail; servers must be resilient. It's an interesting debate - both platforms were designed to different criteria.
Amazon is a follower of truly distributed computing, where, for example, local disks on compute servers provide VM storage, among other things. No doubt this comes from Werner Vogels' background - he basically wrote the book on distributed computing. Amazon also uses the Xen hypervisor, which has its roots in distributed systems.
Terremark, on the other hand, follows an approach where compute is distributed but storage is centralized: a SAN, in Terremark's case - or, if you're a distributed computing follower, a single point of failure. The SAN-based approach follows from VMware, the hypervisor architecture Terremark selected, which in turn makes operations like migration and evacuation much easier.
Of course, both organizations are taking positions based on the architectures they have already deployed, and each shapes the enterprise discussion to fit its own design.
So what are our thoughts? Easy … why choose? You should be able to define levels of service (policies) when you provision - kind of like having your cake and eating it too. That was our design criterion for InstantOn. In reality, if your enterprise application supports horizontal scaling, you can generally follow the Amazon approach; this provides a less expensive platform and in turn reduces virtual machine costs. If your application assumes more "old school" redundancy, then you should look for HA solutions built from resilient components. Today, the bias in enterprise software is certainly in the resilient-machine camp, but that is rapidly changing as ISVs adopt new blueprints to support the cloud.
We approached this problem by defining each resource in our cloud with a number of descriptors (tags) that are used at provision time. These descriptors are defined in YAML, making them easy to manage and process. As a virtual machine is provisioned, it can request a specific kind of resource; should that resource not be available, it moves to its second choice. A simple example: a particular workload might request a specialized capability such as local SSD storage for extremely high IOPS, or hardware acceleration for MPEG encoding. It's then a matter of locating a matching resource and dispatching the VM to it. A customer described this to me as playing Tetris - a stream of falling blocks (VMs) to be fitted into the available resources with as little wasted space as possible.
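To make the idea concrete, here is a minimal sketch of descriptor-based placement with fallback. This is illustrative only, not InstantOn's actual implementation: the resource names, tags, and the `place_vm` helper are all hypothetical, and plain Python dicts stand in for the YAML descriptors.

```python
# Each resource (host) carries capability tags assigned at provision time.
# In practice these descriptors would be loaded from YAML; hard-coded
# hypothetical examples are used here to keep the sketch self-contained.
RESOURCES = [
    {"name": "host-a", "tags": {"local-ssd", "high-iops"}},
    {"name": "host-b", "tags": {"san-storage"}},
    {"name": "host-c", "tags": {"mpeg-accel", "san-storage"}},
]

def place_vm(preferences):
    """Return the name of the first resource that satisfies the
    highest-ranked preference; if none matches, fall back through
    the remaining choices in order."""
    for wanted in preferences:            # first choice, second choice, ...
        for resource in RESOURCES:
            if wanted <= resource["tags"]:  # all requested tags present?
                return resource["name"]
    return None                             # no resource could satisfy the VM

# A workload that prefers local SSD (for high IOPS) but can
# tolerate SAN-backed storage as its second choice:
placement = place_vm([{"local-ssd", "high-iops"}, {"san-storage"}])
print(placement)  # → "host-a"
```

The fallback list is what gives the "Tetris" behavior: each VM declares what it ideally wants, and the scheduler slots it into the best remaining resource rather than failing outright.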
In our current release of InstantOn, the alignment of resources to workloads is delivered by Carpathia as part of our managed service. In the near future, we will open this up to customers. Just imagine being able to select your VM “size”, OS type and additional features.
The common message we’re hearing from our enterprise customers is this: the cloud platform should support – not define – these requirements.