[caption id="attachment_3630" align="aligncenter" width="550"] Whether programming with Microsoft's platform (seen here) for Amazon's, various issues lurk, including the possibility of vendor lock-in.[/caption] How often do programmers get to choose the technology underlying their work? Oftentimes, an executive or committee higher-up does the selecting, and we’re forced to live with it. One such example is cloud hosting. In a large company, an executive committee might choose from any number of offerings, including Amazon Web Services or Microsoft Azure. With a small startup, the ultimate decision might involve fewer people, but the programmer still isn’t the only voice in the room. In the end, though, it’s the programmers who ultimately have to deal with the consequences of those technology choices. What will the next year be like if we end up developing software destined for Amazon Web Services, as opposed to Microsoft Azure? From a pure programming standpoint, will the choice of platform even make a difference? I’ve always stayed as far from Azure as possible, because I didn’t want the cloud choice to have an impact on my programming. I write the software, and I deploy it to the host. Which host? It shouldn’t matter. I’d heard there were things I’d need to do in my programming before it could run on Azure. But was I even correct? I finally decided to find out for sure. My goal was to determine two things. First, how much would vendor choice impact my programming? Second, how difficult would my life become if my employer decided to stop using that particular cloud vendor in midstream and switch to a different one altogether? One thing I’m intentionally avoiding in this comparison is the question of pricing. For this study, price isn’t an issue. I’m also going to use whatever features are available in the respective Amazon and Microsoft plans. In the cloud world, there are two primary levels where features operate: Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). Generally speaking, IaaS refers to a lower level wherein you can manually allocate servers and install the operating systems, but it’s up to you to install different software such as the Java runtime. The hosting provider takes care of really low-level stuff such as the physical allocation of the hardware, as well as the installation of the operating system; but when it’s time to install MySQL or Oracle or the Java Runtime—well, that’s your job. PaaS offers the same services as IaaS, plus features such as tools for application design and deployment. The general rule of thumb has been that AWS operates more at an IaaS level, whereas Azure operates at the PaaS level. And for the most part that’s true—but the lines are blurred. For example, with AWS you can start with an image that already has a database and Java Virtual Machine installed. Furthermore, Amazon has a couple of data stores that you can use but don’t have to manage. Fine, whatever. But with AWS, there’s one notable exception that I’ll take advantage of in this article: a service called Elastic Beanstalk, which is a way of quickly and easily deploying your application to AWS by simply uploading it. AWS then takes care of the rest, including load balancing and auto-scaling. That’s definitely beyond IaaS and well into the world of PaaS. 
Before we get started, I want to point out one more expectation in my tests: I’m assuming that we—the programmers—know how to develop an application that scales to the cloud, including Web-based applications that are session-less. Because we’re developing an application that scales, I’m going to try to take advantage of all that the cloud offers, even from within my program. In the case of both Amazon and Microsoft, that’s going to mean taking advantage of any and all APIs on offer. Will that force vendor lock-in? Let’s find out.

Java and Amazon EC2

For the first test, I'm building an application that will run on Amazon EC2. I'm using Eclipse with Amazon's own AWS Toolkit for Eclipse, a plugin that lets you create an EC2 instance right from within Eclipse, which I did. However, in weighing the impact on my programming work, I'm not going to factor that in, since allocating instances is in the realm of IT as opposed to coding. In a large shop, it's usually the production team that deploys the apps, not the programmers. Instead, the programmers would be working on a local installation. The AWS Toolkit for Eclipse includes a sample application called Travel Log. Buried inside this application are functions like this:
public static InputStream loadOriginalPhoto(Photo photo) throws IOException {
    S3StorageManager mgr = new S3StorageManager();
    TravelLogStorageObject obj = new TravelLogStorageObject();
    obj.setBucketName(uniqueBucketName);
    obj.setStoragePath(photo.getId() + FULLSIZE_SUFFIX);
    return mgr.loadInputStream(obj);
}
This is a function for loading an image from Amazon's Simple Storage Service, or S3 for short. Notice the S3StorageManager object. That's actually part of the sample code, not part of the AWS API. The code for that class, in turn, calls AWS-specific code that lives in various packages under com.amazonaws.services.s3 (I've sketched what such a wrapper looks like just after the list below). In other words, the example has AWS-specific code in it. That's not a problem, of course, but it does tell me that the example is locking us into AWS. In turn, that forces us to decide:
  • Do we use AWS-specific code and take advantage of various AWS features but get locked into AWS?
  • Or do we avoid lock-in, but also not be able to take advantage of AWS features?
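To make the lock-in concrete, here's a rough sketch of the kind of thing a wrapper like S3StorageManager does under the hood. This is not the actual Travel Log code; the class and method names are my own invention, but note how quickly the com.amazonaws packages end up woven into your own source tree:

import java.io.IOException;
import java.io.InputStream;

import com.amazonaws.auth.PropertiesCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.S3Object;

// Hypothetical wrapper, loosely in the spirit of the sample's S3StorageManager.
// Every import above ties this class to Amazon's SDK.
public class SimpleS3Storage {

    private final AmazonS3 s3;

    public SimpleS3Storage() throws IOException {
        // Credentials come from a properties file on the classpath,
        // just as in Amazon's own samples.
        this.s3 = new AmazonS3Client(new PropertiesCredentials(
                SimpleS3Storage.class.getResourceAsStream("AwsCredentials.properties")));
    }

    // Hand back an object's contents as a plain InputStream; callers never
    // see an AWS type, but this class is welded to Amazon's API.
    public InputStream load(String bucketName, String storagePath) {
        S3Object object = s3.getObject(bucketName, storagePath);
        return object.getObjectContent();
    }
}

The callers stay clean, which is exactly why the sample's wrapper layer is appealing; the vendor dependency simply moves down one level rather than disappearing.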
This might seem like a trivial issue right now, but later on down the road it could come back to bite us. Suppose your company's executives (who have no technical knowledge) suddenly announce that they're tired of the high costs of AWS, and want to move to another cloud host. They didn't ask your programming team or your manager whether that was a good idea from a programming standpoint. After all, from their perspective, it's just a program and you can just upload it to another server, right? That's when your immediate superior makes a mad dash into the executive suite to explain to anyone who'll listen about the huge problems involved in switching to a different hosting provider. Tempers flare, different people get blamed, somebody loses a job, and then you get stuck spending the next month working overtime ripping out the AWS code. Yet the AWS code really did help.

A Brief Pause for Some Debriefing

As you can see, we're barely into these tests and already there's an issue. But let's put it into perspective for a moment. Several companies out there provide vendor-agnostic tools for managing your clouds. One of their claims to fame is that you can use their management console to launch instances on different platforms (Amazon, Rackspace, and others) without having to make adjustments for the particular platform. Some of these companies even have their own images, which are actually made up of a list of vendor-specific images. You launch one image and pick whichever vendor you want, and their system works behind the scenes to pick the real image specific to that vendor. That's a big sell to the business managers, because it gives them a warm squishy feeling that they're avoiding vendor lock-in. They even read articles (some written by yours truly) that explain how the IT staff can use a standardized API shared across vendors to create their scripts. However, in that situation we're not talking about software and Web applications: we're talking about management scripts for deploying instances. And that's a huge difference.

But there's hope: new cloud standards also include developer-oriented APIs. For example, OpenStack (which was created in part by Rackspace) includes APIs for things like uploading objects (such as images) into containers. Even so, that's of little help to us for our current project on Amazon.

Back to our scenario: the VP has been sold on this whole notion that everything is portable, while we programmers were left out of the decision. We use the tools available to us (including the AWS SDK), and suddenly find ourselves in a bind.

Quick, Back to Our Code Before It Gets Rewritten

Clearly, Amazon's API forces us into a vendor lock-in situation. But if we recognize that fact going in, and if we're okay with it (which you might not be), then the APIs are available. Let's review some of the code in order to get a better sense of its complexity. I'll warn you here that my goal isn't to teach you how to use the API; instead, I want to show you what it entails, so you can decide whether or not it makes your life easier. In essence, it comes down to this: do you like using third-party classes that fully encapsulate a RESTful API? Amazon has built its API using a REST interface (although some people have criticized it for abusing the REST verbs, arguing that it doesn't technically qualify as "RESTful"). Instead of having you call the API yourself, Amazon has built a rather large and cumbersome set of classes around these calls. Using the classes isn't very difficult, but their presence does add some overhead.

What's interesting, though, is that the sample included in the toolkit, called Travel Log, features a set of wrapper classes that sits atop the existing Java API. I mentioned these wrapper calls earlier in the article. But if you want to use the Java API itself, I encourage you to look at the samples that come with the SDK, separate from the AWS Toolkit for Eclipse. These demonstrate the basic set of Java classes Amazon provides for us. For example, there's a sample for uploading a file to an S3 account that uses only the API classes. When you cut out all the comments and exception handling, and get down to just the code that actually does the work, you see this:
AmazonS3 s3 = new AmazonS3Client(new PropertiesCredentials(
        S3Sample.class.getResourceAsStream("AwsCredentials.properties")));

String bucketName = "bucketname";
String key = "MyObjectKey";

s3.createBucket(bucketName);
s3.putObject(new PutObjectRequest(bucketName, key, createSampleFile()));
The first statement reads your credentials from a properties file and creates an AWS client object. The final two lines do the work of creating a bucket to put the object in, and then uploading the object itself. That's really not so bad, which makes me wonder why the sample app has such a huge number of classes written on top of these calls, when they aren't awful to call directly. Ultimately, all these calls work together to piece together a REST call to the Amazon servers. You can see info on the REST call for the createBucket function here. Inside those docs, you can see a simple PUT verb taking place, along with credentials. Posted along with the call is an XML file containing information about the bucket. That XML is rather important here, because inside it is an element called CreateBucketConfiguration.

That brings me to the notion of standard APIs. Some people consider Amazon to have essentially built the standard for clouds. The question, then, is whether any other vendor even supports an API call for creating a container of sorts by including XML that has a tag called CreateBucketConfiguration. If so, then it's possible your code will port to those other cloud vendors. If not, then you—and your code—are stuck with Amazon, regardless of what the executives want. In that case, hope you're already brewing some coffee for that code-rewriting marathon.

As it happens, all might not be lost. There are organizations trying to embrace standards—including some standards based on Amazon's model. Look at this page for Google's cloud storage. If you search through the page, you'll see that they support the same XML and the same call. In fact, Google is working to implement the same API. Now I'm going to be completely honest here: I haven't tried porting an Amazon Web Services app to Google's cloud. Frankly, I have no idea if it would port, but I'm skeptical (and would love to hear if any of you have had any such luck). Rackspace has its own API, for example, which it's tried to unleash as a standard called OpenStack. And Eucalyptus claims its API features good compatibility with Amazon's API (there's even one little item called CreateBucketConfiguration that appears in some of its sample code).

While a handful of companies are trying to standardize on Amazon's API, there's a big question of whether they'll succeed—there are many cloud vendors out there, and many competing standards. In other words, the chances of switching to a fully compatible cloud don't seem that great. While you might not be completely locked into Amazon, you are locked into a small subset of cloud vendors.
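Since Google's service advertises support for the same XML and the same calls, one way to picture what porting might actually look like is simply pointing Amazon's own SDK at a different endpoint. To be clear, this is a hypothetical sketch, not something I've tested: the endpoint URL and the interoperability-style access keys are assumptions, and it only stands a chance of working with a provider that genuinely mimics Amazon's S3 REST API.

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

public class PortabilityExperiment {
    public static void main(String[] args) {
        // Hypothetical: access keys issued by the non-Amazon provider.
        AmazonS3 client = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

        // Point Amazon's client library at a non-Amazon endpoint that claims to
        // speak the same S3-style REST API (the URL here is an assumption).
        client.setEndpoint("https://storage.googleapis.com");

        // If the provider really does honor the same calls (including the
        // CreateBucketConfiguration XML behind createBucket), this just works.
        // If it doesn't, the call fails here and the lock-in becomes very concrete.
        client.createBucket("my-portability-test-bucket");
    }
}

If an experiment like this succeeds, your application code barely changes; if it fails, you're back to rewriting every storage call.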

Score So Far

We haven’t even gotten to Microsoft’s Azure yet, but Amazon’s API hasn’t proven too difficult to use. On top of that, there’s a chance you might not end up fully locked into Amazon, thanks to Google.

Taming the Microsoft Beast

Now that we've dug down into Amazon's API and determined some of the implications, let's head over to Microsoft. When you sign up with Azure, you get an interface with functionality similar to Amazon's platform. You can allocate new instances of servers, and you can even choose from some Linux varieties to run on those servers. If you look at the Azure API docs for different languages, you'll see many mentions of messaging (say that five times fast), whereby your different server instances can communicate with each other. The general idea is that you might share computing resources among different servers so that you can accomplish, for example, massive parallel computations that would otherwise be difficult (if not impossible) to do on a network in your own organization. The messages are implemented in what Microsoft calls a Queue service. Here's a quick tutorial if you're interested in trying it out. Also, so we don't go comparing apples to oranges, I'll point out that Amazon also has a Queue service (you can read about it here), as well as a similar service called Simple Notification Service, which you can read about here. Note that Amazon's messaging services all use the same RESTful interface I already described.

But Microsoft does offer a storage API, one that—like Amazon's—is RESTful at heart. The company calls it Blob storage. Here's the page that describes the REST API for adding an item to storage on Azure. A sample call looks like this:
http://myaccount.blob.core.windows.net/mycontainer/myblob
This is, of course, different from Amazon’s API. From a programming perspective, however, you probably won’t be making the REST calls directly. Like Amazon, Microsoft offers a whole set of tools for writing your cloud code. These tools cover several languages, including Java. The concept is similar to Amazon’s in that you can create containers and put files (called blobs) into the containers. When you include the provided Java libraries, you can easily make calls just as you can with Amazon. Here, for example, are two lines of code for creating a container, which I copied directly from the docs found here.
CloudBlobContainer container = blobClient.getContainerReference("mycontainer");
container.createIfNotExist();
This is a bit of an odd approach: first you request a reference to the container, then call a createIfNotExist function, which creates the container only if it doesn't already exist. In other words, the first line returns a valid object even if the container doesn't yet exist on the server. The docs also include a couple of lines for uploading a blob, which is simply a matter of requesting a reference to a new blob and then calling an upload function.
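Here's a minimal sketch of that whole flow, based on the classes Microsoft's Java client library exposes (CloudStorageAccount, CloudBlobClient, CloudBlockBlob). The connection string and file name are placeholders, and the package names have shifted between SDK versions, so treat the imports as an assumption:

import java.io.File;
import java.io.FileInputStream;

// Package names match the early Azure SDK for Java; they vary by SDK version.
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.blob.client.CloudBlobClient;
import com.microsoft.windowsazure.services.blob.client.CloudBlobContainer;
import com.microsoft.windowsazure.services.blob.client.CloudBlockBlob;

public class BlobUploadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; in practice it carries your account name and key.
        String connectionString =
                "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=mykey";

        CloudStorageAccount account = CloudStorageAccount.parse(connectionString);
        CloudBlobClient blobClient = account.createCloudBlobClient();

        // The two steps quoted above: get a reference, then create the container if needed.
        CloudBlobContainer container = blobClient.getContainerReference("mycontainer");
        container.createIfNotExist();

        // Uploading follows the same pattern: get a reference to a blob, then push the bytes.
        File source = new File("photo.jpg"); // placeholder file name
        CloudBlockBlob blob = container.getBlockBlobReference("myblob");
        blob.upload(new FileInputStream(source), source.length());
    }
}

Squint a little and this is the same shape as the Amazon code: a client object built from credentials, a container, and an object pushed into it. Only the class names (and the REST calls underneath) differ.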

Scoring Microsoft

In general, the API for using Azure from Java isn't too complicated. Like Amazon, Microsoft offers a set of tools that help you write Java code for Azure; you can find out about them here and here. The APIs for working with Azure aren't difficult to use. But what about standards? Can you port your code? Underneath those Java calls are REST calls to the Azure servers. Those REST calls are completely different from Amazon's and, for that matter, anyone else's. Just like Amazon, Microsoft has created its own API. The difference is that in Amazon's case, other vendors have attempted to implement the same API, treating it as a de facto standard. So right now it's sounding like the score is tied: from a strict programming perspective, both companies have their own RESTful API, and their own libraries for using that API. The moment you start using either, you're locked in for the most part.

Too Much Complexity

But those APIs are for extras like cloud storage and messaging. Some of us might not need them. For example, you might develop a Java web application that runs on Tomcat and works just fine without all the "bonus" features such as Simple Storage or message queues. And that's totally fine; if you go that route, in some ways your programming life will be a lot easier, because you can run your app on virtually any server. Both Amazon and Microsoft give you ways to easily upload a Tomcat-based Java app and launch and run it. If you're not using any vendor-specific APIs, then it's safe to say the experience you get on either Amazon or Microsoft will be roughly the same.

But that means you're also not developing an app that necessarily takes advantage of all possible cloud capabilities—not just add-ons, but scalability. Your app might need to expand and grow as your user base grows. You could just launch a second server and set up the load-balancing features. But for many apps, you might sooner or later find you need a place to store (and easily access) heavily used data items. You might need your second server (and third and fourth…) to communicate with the others. And then you'll find you really do need a vendor-specific API.

That's where we find ourselves with this article. Many of today's modern web applications really do need the services offered by such cloud vendors as Amazon and Microsoft. The moment you do need them, there's little way around it: you're forced into using a proprietary API. Amazon's API is a bit more standard, so if you think you might move to another cloud, you might have a few options. But not many.

At this point you might be asking: What the hell do we do? Are we going to end up locked in with pretty much any vendor? The answer is a "sort of" yes. At best, you'll find yourself only semi-locked-in if you select a vendor with an API based either on Amazon's or on one of the new standards, such as OpenStack. But there are a couple of other possibilities. Some open-source wrapper APIs under construction claim to support a whole set of clouds. In fact, here's one on Sourceforge; I haven't personally tried it and can't vouch for it in any way, except to say that it's just one example of a handful out there (do a Google search). I suspect we're going to see more of these open-source APIs that target multiple vendors. With one of those, you can potentially use any number of clouds.

As for the APIs themselves—if you're not worried about vendor lock-in, then both Amazon's and Microsoft's platforms are actually pretty easy to use and are in many ways quite similar. I personally rank them both about equal in programmability.

Image: Microsoft