Are companies and Big Data professionals who embrace Hadoop putting themselves at risk by being at the mercy of Google?
In March 2013, after about four years, the U.S. Patent and Trademark Office granted Google 10 foundational patents related to MapReduce, a programming model for processing large datasets with a parallel, distributed algorithm on a cluster of computers. MapReduce is also the basis for the Hadoop framework. Rather than transfer, assign, or license these foundational patents to the Apache Software Foundation, which is the body that licenses the Hadoop software, Google decided to offer what it calls an Open Patent Non-Assertion (OPN) Pledge. What does this mean? Google is saying that it will not take legal action against users or developers who use these 10 MapReduce patents. But Google attaches several caveats, and a pledge by Google is not exactly binding in law.
What does that mean for those who use and develop Hadoop for Big Data? One result is that many people are starting to look at alternatives where the licensing is clearer and they don’t have to rely upon a non-binding pledge from Google that it won’t sue them. Are they being alarmist? Should we rely on Google’s good faith? The pledge has about the same weight and value as other pledges made by Google over the years. In other words, Google will continue to do what Google thinks is in its self-interest, not necessarily worrying about the interests of the Open Source community. At any time, Google could flex its muscles and take legal action against any company that uses Hadoop and the MapReduce patents. A company that comes to rely upon Hadoop to process Big Data could therefore be putting its business at risk.
Currently, Hadoop is the most popular framework for Big Data processing, and the one generating most of today’s buzz. But it’s not the only solution out there. Other solutions, both proprietary and open source, don’t use MapReduce for Big Data processing. Here are a few examples that are gaining popularity:
- SAP HANA: An in-memory data platform for performing real-time analytics, and developing and deploying real-time applications.
- Storm: A distributed and fault-tolerant real-time computation system. Just as Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for real-time computation.
- Spark: From UC Berkeley’s AMPLab, Spark is an in-memory parallel processing framework that’s comparable to Hadoop MapReduce; its developers report that it can run up to 100 times faster on some in-memory workloads.
While these alternatives are at different levels of software maturity (and cost), none come with the specter of changes in Google’s approach to its MapReduce patents. For that reason, many companies that are serious about putting Big Data at the core of their business are starting to look at them. So, is Hadoop THE Answer to Big Data processing? Apparently not.