For most businesses, data analytics presents an opportunity. But for DARPA, the military agency responsible for developing new technology, so-called “Big Data” could represent a big threat.

DARPA is apparently looking to fund researchers who can “investigate the national security threat posed by public data available either for purchase or through open sources.” (Hat tip to Foreign Policy for the link.) That means developing tools that can evaluate whether a particular public dataset will have a significant impact on national security, as well as blunt the force of that impact if necessary.

“The threat of active data spills and breaches of corporate and government information systems are being addressed by many private, commercial, and government organizations,” reads DARPA’s posting on the matter. “The purpose of this research is to investigate data sources that are readily available for any individual to purchase, mine, and exploit.”

The agency seems particularly freaked out about corporations and governments accidentally leaking data that can be leveraged for identity theft, citing Netflix’s 2009 contest to improve its movie-selection algorithms. “An unintended consequence of the Netflix Challenge was the discovery that it was possible to de-anonymize the entire contest data set with very little additional data,” the posting adds.
De-anonymizing the contest data led to a lawsuit and to Netflix canceling future challenges, but in a different context, that sort of data manipulation could have far worse consequences for privacy and security: “The purpose of this topic is to understand the national level vulnerabilities that may be exploited through the use of public data available in the open or for purchase.”

In Phase I of the proposed project, researchers will investigate public data across several domains, such as websites and social networks, and develop “a set of risk factors for vulnerability.” The end result will be a tool capable of measuring the “risk inherent in data.”

Researchers who prove the feasibility of their Phase I work will move on to Phase II: the development of a “proof-of-concept system that can automatically sample data from numerous sources, characterize the data, and provide automatic feedback on the measurable risk.”

In Phase III, researchers will deploy a tool “into a near-real-time environment” that monitors open-source data, evaluates vulnerabilities, and “provides defensive countermeasures.” This part of the project will also involve developing capabilities “to defend against threats due to the proliferation of purchasable or public data sets.”

As Foreign Policy points out, there’s a certain amount of irony in the government soliciting ways to reduce its vulnerability to data exploitation, considering recent revelations about the extent of the NSA’s surveillance programs. Those programs reportedly involve the NSA vacuuming up enormous amounts of phone metadata, emails, and other communications, and running it all through sophisticated analytics tools capable of drawing connections between disparate data points.
“At the time government officials are assuring Americans they have nothing to fear from the National Security Agency poring through their personal records,” Foreign Policy wrote, “the military is worried that Russia or al Qaeda is going to wreak nationwide havoc after combing through people’s personal records.”

For any data researcher interested in helping the government through its time of trouble, DARPA’s solicitations for the project close Sept. 25.

Image: Maksim Kabakou/