It’s one thing to store and analyze massive amounts of corporate data; it’s quite another to ensure that data remains secure. Data-security firm Dataguise is offering a solution to that conundrum with DG for Hadoop v4.3, which can mask and encrypt sensitive data within major Hadoop distributions. The software also offers features such as the ability to search out sensitive data within unstructured files, as well as detailed audit reporting (always important when trying to stay in compliance).

Apache Hadoop is an open-source framework for crunching gargantuan amounts of data stored on large hardware clusters. Its popularity has led a variety of tech companies to build related products: firms ranging from multinational corporations to tiny startups have embraced it as an ideal way to handle their unstructured data. Last week, for example, MapR Technologies and Canonical announced a partnership to bring Hadoop to Ubuntu, the popular Linux-based operating system.

“With Hadoop deployments projected to grow in an upward direction for the foreseeable future, the threat to organizations that do not adopt a comprehensive approach to securing this data remains high,” Dataguise CEO Manmeet Singh wrote in a statement.

DG for Hadoop uses symmetric key-based data encryption, and encrypts the data keys themselves as well. For data discovery, the platform employs what Dataguise describes as a “neural-like” network approach to hunting for sensitive data, rather than a rules-based approach. “As a result, information surrounding a given string is correlated and complex inferences are made to determine whether that string is relevant to the search,” is how the company described it in a statement. In addition, the platform can mask data across either single or multiple Hadoop clusters, something the company claims will preserve the “analytical value of information” for later analysis.

As Hadoop’s popularity increases, so does the need for boosted security.
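The key-wrapping pattern described above (encrypting data with a data key, then encrypting the data key itself with a master key) can be sketched as follows. This is a toy illustration, not Dataguise’s implementation; the XOR keystream below is a stand-in for a real symmetric cipher such as AES and must not be used for actual security:

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data against a SHA-256-derived keystream.

    Placeholder for a real symmetric cipher; applying it twice with the
    same key restores the original input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes):
    # A fresh random data key encrypts the payload...
    data_key = secrets.token_bytes(32)
    ciphertext = _keystream_xor(data_key, plaintext)
    # ...and the data key itself is encrypted ("wrapped") with the master
    # key, so only the wrapped key is stored alongside the ciphertext.
    wrapped_key = _keystream_xor(master_key, data_key)
    return wrapped_key, ciphertext


def envelope_decrypt(master_key: bytes, wrapped_key: bytes,
                     ciphertext: bytes) -> bytes:
    data_key = _keystream_xor(master_key, wrapped_key)
    return _keystream_xor(data_key, ciphertext)
```

Because only the wrapped data key is stored, rotating the master key means re-wrapping small keys rather than re-encrypting entire datasets.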
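To contrast a rules-only scanner with the context-weighing approach Dataguise describes, here is a deliberately simple sketch: a regex finds candidate strings, but a hit is only reported when the surrounding text scores above a threshold. The pattern, cue words, and weights are all invented for illustration; the company’s actual “neural-like” method is proprietary:

```python
import re

# Candidate pattern for US Social Security numbers (illustrative only).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Nearby words that raise confidence the match is really sensitive data.
CONTEXT_CUES = {"ssn": 2.0, "social": 2.0, "security": 1.0,
                "patient": 1.5, "employee": 1.5}
THRESHOLD = 1.5


def find_sensitive(text: str, window: int = 40) -> list:
    """Return regex hits whose surrounding context scores above THRESHOLD."""
    hits = []
    for m in SSN_RE.finditer(text):
        context = text[max(0, m.start() - window): m.end() + window].lower()
        score = sum(w for cue, w in CONTEXT_CUES.items() if cue in context)
        if score >= THRESHOLD:
            hits.append(m.group(0))
    return hits
```

A pure rules-based scanner would flag every string matching the pattern; here, a part number that happens to look like an SSN is skipped because nothing around it suggests sensitive data.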
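Masking that preserves a record’s format, and with it some analytical value, can be illustrated with a toy routine that hides all but the last four digits of a card number while keeping separators intact. Again, the pattern and approach are assumptions for illustration, not Dataguise’s algorithm:

```python
import re

# Loose card-number pattern: four groups of four digits (illustrative only).
CARD_RE = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def _mask_card(match: re.Match) -> str:
    """Replace all but the last four digits with 'X', keeping separators."""
    text = match.group(0)
    total_digits = sum(c.isdigit() for c in text)
    keep_from = total_digits - 4
    out, seen = [], 0
    for c in text:
        if c.isdigit():
            out.append(c if seen >= keep_from else "X")
            seen += 1
        else:
            out.append(c)
    return "".join(out)


def mask_record(line: str) -> str:
    """Mask every card-like number in a line of text."""
    return CARD_RE.sub(_mask_card, line)
```

Because the masked value keeps its length, separators, and trailing digits, downstream jobs can still group, join, and validate records without ever seeing the full number.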
Despite that need, many companies simply don’t list “security” as a pressing challenge when it comes to actually deploying Hadoop. When asked about the biggest obstacles to Hadoop analytics, respondents in a recent survey by Dimensional Research (sponsored by RainStor) most often cited the fact that Hadoop isn’t real-time (37 percent), followed by concerns over the time needed to put a Hadoop platform into production (26 percent). Almost a quarter cited manual coding as a substantial challenge, while 18 percent saw the cost of training and services as a big obstacle. Security was on companies’ minds, but crowded in with dozens of other potential issues.

If Dataguise’s new software is indicative of a trend toward further locking down Big Data, it could give those companies a few fewer things to worry about.

Image: Maksim Kabakou/