It’s one thing for marketers to talk about “the flood of data.” It’s another thing for tech professionals such as data scientists to face what’s better described as a “deluge.” Whether in production, manufacturing, marketing, HR applications, or customer service, it’s clear that the systems that manage and analyze information are bursting with all kinds of structured and unstructured data.

And that data isn’t just growing in volume, but also in complexity. In the context of HR, not only does each employee or contingent worker bring their own profile information—such as salary, benefits, year hired and certifications—to the table, but they also continuously generate new data, including hours worked, projects completed, and performance-review results.

And beyond volume and complexity, there’s also the deceptively simple question of format. Not all data is simply binary: More and more, systems must be able to scan and parse free-text information where terminology may vary from user to user.
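To make that concrete, here is a minimal sketch of the kind of normalization such systems perform on free-text fields. The job titles and the synonym table are invented for illustration; real systems typically rely on far larger dictionaries or trained models.

```python
# Hypothetical sketch: collapsing variant free-text job titles to one
# canonical form. The synonym table below is illustrative only.
CANONICAL = {
    "sr": "senior",
    "sr.": "senior",
    "engr": "engineer",
    "eng": "engineer",
}

def normalize_title(raw: str) -> str:
    """Lowercase, split on whitespace, and expand known abbreviations."""
    words = raw.lower().replace(",", " ").split()
    return " ".join(CANONICAL.get(w, w) for w in words)

# Three users' entries for the same role collapse to a single key:
titles = ["Senior Engineer", "Sr. Engineer", "sr engineer"]
print({normalize_title(t) for t in titles})  # → {'senior engineer'}
```

Even this toy version shows why free text is harder than numeric fields: the system has to decide that three different strings mean the same thing before any counting or analysis can begin.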

“We went from an era where we had no insights to today, where we’re pushing out a massive number of insights,” said Don Weinstein, corporate vice president of global product and technology at ADP. “It’s like taking a starving person and bringing them to a buffet. People can get frozen, and not know where to start.”

Wider and Deeper

The volume of complex data allows organizations to capture more information from more sources, and to analyze it in more ways for more purposes. Consider a company with 5,000 employees, suggests John Sumser, principal consultant of the Bay Area advisory firm HRExaminer. Today, it may apply 20 models to each individual, and another several hundred to each department. But as system capabilities expand, the impact on data science and technology teams will become “extraordinary,” he said.

For one thing, Sumser continued, large organizations will run their models on what he calls “data model farms” that hold all of their operating data. As old technology reaches capacity and new platforms arrive with boosted capabilities, technical teams will have to develop new data models while continuing to run existing ones. He likens the challenge to changing a tire while the car’s still moving. “You have to be assessing the health and configuration of all those things at the same time,” he said.

To get a sense of the scale involved, consider ADP. According to Weinstein, the company today stores over 3 petabytes of workforce data, and that dataset is continually growing. The company processes payroll for 26 million customer employees in the U.S., usually every other week—or 25 times a year. “You’re talking about 650 million paychecks and each paycheck itself is a treasure trove of data,” Weinstein observed. “And that’s just the starting point.”
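The article’s own figures make the back-of-envelope math easy to verify:

```python
# Scale check using the figures quoted above: 26 million customer
# employees paid 25 times a year, per the article.
employees = 26_000_000
pay_runs_per_year = 25
paychecks = employees * pay_runs_per_year
print(f"{paychecks:,}")  # → 650,000,000
```

And since each of those 650 million paychecks carries dozens of fields—gross pay, deductions, hours, location—the petabyte-scale dataset Weinstein describes follows directly.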

Data Nuts and Bolts

Managing the artificial intelligence behind such systems will be a major aspect of coping with this deluge, Sumser believes. However, he anticipates technologists and data professionals will run into a number of operational pitfalls at the same time. For example, providing recruiting experts with the best technology will be especially challenging when the types of data and systems they’ll need are moving targets.

And, inevitably, business conditions will change, which in turn will affect spending. One possible scenario: A company’s sales drop, so it must stop spending on a predictive analytics system its operations have come to depend on. When something like that happens, “you’ll have set up a situation in which the company is vulnerable,” Sumser said. The data staff will have created a certain level of expectations that they can’t afford to sustain, and made promises they can’t keep. Then “there’s going to be hell to pay, and that’s going to happen to a lot of people,” Sumser observed.

Meanwhile, data is only becoming more complicated. For example, while the amount of payroll data ADP works with is huge, most of it is “countable” and “binary,” Weinstein said. However, much of the information used in workforce management isn’t so straightforward: “Performance ratings, goal attainment and things of that nature are messy.” 

Messy data can be dangerous. Unwittingly or not, without the right expertise in place, it can be misused. “Just because I put a survey out there and grab a whole bunch of responses to the survey, it doesn’t actually mean anything,” Weinstein said. In fact, he believes that many false claims are made “because you have inexperienced people using data instruments in an improper fashion.”

That leads to another challenge in using workforce data effectively, according to Weinstein: separating signal from noise. “There’s just so much noise and there’s very little true, true signal,” he said. “You have all of these different surveys that are being pumped out, being thrown at you, and it’s just a bunch of qualitative stuff masquerading as quantitative, verifiable data that has no real backing or basis in science or scientific discovery. That’s not good.”