The R programming language is fast becoming the lingua franca of data analysis

By Doug Bartholomew | February 2009 (TV April 2009)


If you have a crystal ball to see into the near future of computing, there's a good chance you see a whole lot of R. As the use of open source software becomes more prevalent in business, government and academia, the R programming language is fast becoming the platform of choice for statisticians, data analysts and research scientists. The adoption of R has snowballed to the point where the estimated number of users ranges from 1 million to 2 million in business and academia.

"People use R to analyze and manipulate data," says Colin Magee, vice president of sales and marketing at Revolution Computing, a software firm in New Haven, Conn. "They use it to analyze chemical compounds and perform risk analytics and predictive analytics." Companies using R include many of America's top firms, among them Google, Pfizer, Bank of America, Merck, Shell, and the InterContinental Hotels Group.

Why is R suddenly so de rigueur for the data-crunching crowd? For one thing, it's easy to use. You don't have to be a programming whiz to work with it. And R, which is open-source, is free: the underlying software code can be downloaded at no cost. Finally, R contains "all the underlying modeling techniques you use for predictive analytics," Magee explains. "And it's easy to fold data into R, to analyze it, and to output it."

That doesn't mean everybody knows how to use R, though. In fact, college graduates as well as finance and life sciences professionals who can wield R handily may have a leg up on those who can't. "Absolutely, these institutions that use R will want people with R skills," says Magee, whose company offers support as well as its own high-performance version of the software called Revolution R Enterprise. "As commercial enterprises, particularly those in the financial industry and life sciences, establish more work in this language, they will need the talent with this knowledge."

Magee believes the use of R is likely to continue to expand rapidly in corporations, nonprofits, and other organizations that need extensive data analysis. "As college graduates emerge with R skills and enter industry, it's inevitable that the growth and use of this language will continue," he argues.

Interest is currently growing in other fields, too, notably high-tech biological applications, finance and economics, adds John M. Chambers, a former Bell Labs researcher and now a consulting professor of statistics at Stanford University. Chambers says it's standard practice in Stanford's statistics department to use R for class demos, and to have students use it to perform exercises. "I think a large fraction of students and recent graduates with statistics training worldwide will have experience with R," he says.

As is the case with other open-source software programs, R isn't totally free. The reason is that companies often need support to tweak the software to meet their specific needs. They may require expertise to install the software, customize it, and connect it with other systems. In other words, organizations that avail themselves of R typically need statisticians or data analysts handy with R to change the underlying code to adapt it to their business.

Regarding training, most people who use R professionally either learned it in statistics class in college or picked it up on the job. "I would guess that a large fraction of users just 'pick it up,' perhaps going on to a book or an online tutorial later," Chambers says.

Doug Bartholomew is a California-based business and technology writer.