Extracting Gold from the Big Data Ecosystem
[Image caption: This isn't a tablet. It's a shovel for helping dig insights from a big pile of data.]

The hype over Big Data has reached a fever pitch, and yet misconceptions about it remain widespread. Most organizations view Big Data as a lump of petabytes plus a couple of powerful tools that can be used to generate meaningful business insights. Not so. The Big Data ecosystem is a robust and thriving space composed of complex, interconnected layers.

Rather than a silver bullet that cures your business insight deficiencies, Big Data solutions are best compared to gold mining, where it is necessary to sift through a lot of worthless sediment before a substantial amount of gold is unearthed. Those raw materials must then be processed and refined by the trained hand of a skilled jeweler before the gold has monetary value. Big Data is similar: enterprises must work through several steps and processes before extracting the real value, the business insights that Big Data evangelists have promised.

Step 1: Go Big Data, Young Man, Go Big Data

The key to Big Data success is to clearly define the goals of the initiative, as this will help determine what types of data are needed to feed the analytics models. Raw data may come from text, SMS, video, images, HTML pages and PDFs. But who cares about raw data? The most important question to answer is, “What do you want to do with the data?” The Big Data dig officially begins once a goal has been set.

Step 2: Choose Your Locations

Once the mission has been declared, an organization must locate sources of reliable information to analyze in pursuit of its Big Data objective. Many make the mistake of relying solely on internal data from email, databases or other internal documents. Business insights relying solely on internal data tell only one side of the story, though.
Web data, with its multitude of sources and information, creates the context for business insights, and internal information supplements it. For instance, let’s pretend that Company A’s Big Data goal is to improve its pricing against its main competitor, Company B. Internal data can show the historical impact of Company A’s prices on its sales, while external Web data can be used to create a snapshot of the competitor’s strategy: for example, when Company A raises prices by $3, Company B raises its prices by $2.

Step 3: Start Digging

This is where the “Big” in Big Data comes into play. Every day, society creates 2.5 quintillion bytes of data, so much that 90 percent of the data in the world today has been created in the last two years alone (source: IBM). This is why it is so important to identify informational sources at the outset of a Big Data initiative, so your organization isn’t attempting to boil the ocean by sifting through Google or Twitter search results day in and day out. It is therefore in an organization’s best interest to implement accurate and reliable data extraction and monitoring processes as part of its Big Data strategy and infrastructure.

It is important to understand the difference between data monitoring and data aggregation as they relate to Big Data. Monitoring refers to an enterprise’s ability to identify new information or detect changes on established online sources of valuable data, such as ReadWriteWeb.com, Auto Finder and the New York Stock Exchange. Accurate change detection reduces the amount of data to process and analyze, enabling companies to react immediately, and only to the updates that matter.

Data aggregation is a way to efficiently collect the desired data from multiple sources. As with monitoring, it is important to continue the sifting process in this step: the goal is not to drown in all of the possible data but to gather as much of the needed data as possible.
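To make the monitoring idea in Step 3 concrete, here is a minimal sketch of change detection in Python. It assumes each source's content has already been fetched as text, and it uses content hashing, which is one common way to spot changes and not necessarily how any particular product does it. The function names (`fingerprint`, `detect_changes`) and the source labels are hypothetical, chosen only for illustration.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a SHA-256 digest of the content with whitespace collapsed.

    Collapsing whitespace is an illustrative choice so that purely
    cosmetic spacing changes are not reported as updates.
    """
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous, snapshot):
    """Compare freshly fetched content against stored fingerprints.

    previous: dict mapping source name -> fingerprint from the last run
    snapshot: dict mapping source name -> newly fetched content
    Returns (updated fingerprints, list of new or changed source names).
    """
    updated = dict(previous)
    changed = []
    for source, content in snapshot.items():
        digest = fingerprint(content)
        if previous.get(source) != digest:
            changed.append(source)
        updated[source] = digest
    return updated, changed


# First run: every source is new, so both are reported.
state, changed = detect_changes(
    {}, {"competitor-pricing": "Widget: $10", "news": "No updates"}
)

# Second run: only the price changed; the extra spaces in "news" are ignored.
state, changed = detect_changes(
    state, {"competitor-pricing": "Widget:  $12", "news": "No   updates"}
)
```

After the second run, `changed` holds only the pricing source, so downstream analysis can skip everything that stayed the same. A real deployment would also need to persist the fingerprints between runs and decide which parts of a page count as meaningful content.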
Step 4: Polishing the Stones

You’re not done with the data just yet. Aggregated information must undergo a few basic processes before it can be analyzed. Some of the most common operations include:

- Normalization
- De-duplication
- Validation
- Comparison

Step 5: Delivery to the Big Data Jeweler

Newly refined data can be fed directly into what I describe as “Data Products.” These are highly technical tools for text, semantic or statistical analysis, visualization and data interpretation. This is the step where the value of data becomes visible. The diversity and sophistication of these tools create a heaven for Data Scientists. This step is often the final destination for enterprise Big Data initiatives such as market research, competitive intelligence and financial market analysis.

What if you are not a data geek or do not have time to analyze data? What if you are “Joe the Consumer” and want the value of data extracted and delivered to you on a silver platter? Fortunately, there are thousands of consumer applications making that happen.

Step 6: Presentation to the Customer

“Data-driven applications” can reside on a PC, iPhone or other Web-enabled device to deliver Big Data value directly to the customer. All of us are familiar with applications that support comparison shopping, generate alerts for price changes, help find that perfect house or analyze hundreds of product reviews. In fact, most consumers do not even realize that a “simple” application may have to extract and analyze millions of pieces of data to present a single golden nugget. Many of these applications embed third-party “Data Products” to analyze and visualize data. The rapid growth of consumer applications relying on Big Data benefits end users as well as all players in the Big Data space.

The path to Big Data glory is paved with these five (or six) steps. Stick to this framework and prepare to discover gold in all of the information that surrounds you.
Isai Shenker sets and directs execution of the product strategy for Connotate. Most recently, he was SVP of Product Management at CONTEXTWEB, an integrated digital media services company. Isai’s broad technology and business knowledge is a result of over 20 years of experience in software product development, product management and business development.