Pentaho’s open-source business intelligence (B.I.) software offers businesses a range of functionality, from data mining and reporting to analytics services. SourceForge’s Rich Bowen recently sat down with James Dixon (Pentaho co-founder and CTO) and Doug Moran (co-founder and Big Data Product Manager) to discuss the project and the developer community. A podcast of the following chat is also available. Without further ado:

Bowen: Whenever I’ve worked with Pentaho in various businesses, the same question comes up: what’s with the name? Tell us where the name came from.

Moran: The hardest part of doing a startup is coming up with a decent name. That was actually even harder than the idea or the actual code. Believe it or not, we did spend a lot of time on it. The main thing we were looking for was something that you could Google and get absolutely nothing back. We went through a list of tons of things. Would it pigeonhole us into a specific product? Would it, over time, sound dated? So we decided we needed to come up with something that really didn’t mean a whole lot, but sounded like it meant something.

Dixon: And something where, no matter how you pronounced it, someone would know how to spell it.

Moran: There were five of us that started the company. Somewhere along the line we got locked into “Penta.” The “Penta” was for five, and “Ho,” well, that just sounded right. There’s a story on our FAQ that listeners may be interested in. We did come up with a completely different story to explain the origin of “Pentaho,” and actually had a few people fall into the trap. In fact, the first article in a major press outlet mentioned that we were named after a Florida Indian tribe, as our story alludes to. But if you actually read the complete story, you would realize that it may not, in fact, be the truth. But we’ve caught some people with that. It’s kind of fun.
One of the turning points early on for us was when you would Google “Pentaho,” and it stopped saying, “Did you mean pentagon?” Once there were enough links coming back to us that we were a legitimate name, we knew we’d gotten somewhere.

Bowen: Pentaho has been around for quite a while. I’ve used it at a number of jobs that I’ve had. So it’s probably familiar to many of our audience, but could you give us an overview of what the project does?

Dixon: Sure. Absolutely. Pentaho is actually a collection of projects. We’ve got Mondrian, which is a relationally based OLAP engine, so it’s an engine for slicing and dicing multi-dimensional data. We’ve got Pentaho Reporting, which provides web-based and desktop reporting. It outputs PDF, Excel, CSV, HTML, etc. We’ve got Kettle, or Pentaho Data Integration, which is a data transformation engine. We’ve got Weka, which is a machine learning engine for doing particular analytics and those kinds of things, such as sentiment analysis. And then, tying it all together, we’ve got a B.I. platform that integrates all the pieces, flows data between the different engines, provides web-based user interfaces, etc. So it’s a whole suite that’s all tied together by a platform.

Bowen: So, how does this work as far as the development community goes? Are they developed as separate projects, or is it one unified developer community?

Moran: Since the projects actually were developed at various times by different people, we merged them when we started Pentaho. So, Mondrian is its own separate SourceForge project. JFreeReport, which is the basis for Pentaho Reporting, is also its own SourceForge project, as is Weka. When we first started Pentaho, our idea was not necessarily to build the entire B.I. suite, but to look at what was available in Open Source, and then build a platform that would unify those projects.
So, over time, as we got to know the architects and the leads of the other projects, they would join Pentaho as employees, and we’d take over ownership of the project.

Bowen: I’ve interviewed a number of projects where the project came out of the company, either because they developed something in-house and then Open Sourced it, or because the entire organization was built around an idea that was Open Source. This is very much the other way around. I’d like to hear more about how this works. The people that work on these various projects: are they all employees of your company?

Moran: The chief architects of each of those projects are. They also have their own community members that are independent, as Open Source projects typically do. Take the projects themselves: Mondrian, for example, is an embeddable OLAP engine, and there are other people that use it and embed it. There’s no friction between its Open Source-ness and the fact that we use it for our embedded engine. The best part of that for us is that we have the architect, we can help shape the roadmap, and we all kind of work together on integration.

Bowen: Now, you have an enterprise edition of some of these products, is that right?

Dixon: Yes, that’s correct.

Bowen: What’s the relationship between the Open Source version, the enterprise version, your employees, and the community?

Dixon: The community edition is designed to be a platform that you can build business intelligence and business analytics projects on top of, whether it’s reporting, data integration, machine learning, slicing and dicing, whatever. So if you look at our usage worldwide, and Doug’s got figures he can give you, we’ve got people installing and using our software in 180 countries worldwide. And no business intelligence vendor (IBM, Oracle, Microsoft) has sales and services for business intelligence in that many countries. So our community edition is full-featured. People all over the world are using it to develop projects.
The enterprise edition adds features on top that large IT shops will be expecting out of a business intelligence suite. So we’ve got things in there for administration, for maintenance, and we’ve got features in there that lower the cost of ownership. There are some U.I. bells and whistles in the enterprise edition. We have a WYSIWYG, drag-and-drop slice and dice user interface. But our community edition has something analogous. In the enterprise edition there’s an ad-hoc interactive reporting user interface, but the community edition has something… it’s not as sexy, it’s not as fun to play with, but the basic functionality is also there in the community edition. Some people don’t like our business model. They say our community edition is cut down, it’s demoware. They’ve obviously never used it, because that’s not the way our business model actually works. We can’t be successful if our community edition isn’t something that you could actually use in practice.

Moran: The last… I think about the last six developers we hired, and every one of them was top-notch, came from the community. So these are guys that were contributing, working for business intelligence companies, or system integrators, or consultants, that just got really good with our stuff, and either they asked for a job or we offered one. So we picked up a lot of tremendous talent from the community base.

Bowen: That’s really a great thing to hear. I often encourage people to participate in Open Source, very self-servingly as “resume fodder,” but also because it expands your expertise so greatly in so many different areas.

Moran: Absolutely. And those guys typically have the best work ethic, and really love what they’re doing. But the bottom line for us is, it’s almost like an “America’s Got Talent” kind of deal. We need somebody, we look out there, see who’s really good.
When we came into it, we didn’t quite expect that to be another benefit of Open Source, beyond the adoption, contributions, plugins, and all that fun stuff.

Bowen: Your suite of products is extremely full-featured. Like I said, I’ve used it. Where do you go in the future? What do you have planned for upcoming versions?

Dixon: One of the things that we feel is important technology-wise about the platform is that it’s very pluggable. So, our data integration engine is pluggable. Our report designer is pluggable. Our machine-learning engine is pluggable. Our business intelligence server is pluggable. So, we’re developing a lot of new things as plugins. We’ve got a couple things that we’re doing with big data. We recently Open Sourced all of our big data componentry. So, working with Hadoop, Cassandra, MongoDB, HBase, etc., all of that is now in Open Source. We’ve got an effort underway to add more functionality around the big data stuff. In recent versions we’ve added a few more plugin points into our community edition. So we’ve got a new visualization API, so that you can create and plug in new visualizations. And there’s a client-side data access API, through which visualizations can get to many different data sources on the server. And that data access API is also pluggable, so if you have a custom data source that you want to expose to all of our client-side visualizations, you can create plugins in that area as well. We’re working on a big release for later this year where we’re moving to a CMS-based repository. We’re adding REST services everywhere. We’ve had web services before, but we’re adding many more REST web services into the product. So that’s a bit of a re-architecture for our next major release.
But as Doug said, we’ve got a very active community because of these plugin systems, so a lot of our code contributions… which is what most people talk about when they talk about Open Source contributions. People primarily tend to focus on code contributions, which I think, in terms of volume, is one of the smallest contribution areas for most Open Source projects. If you think about the number of people using Linux vs. the number of people contributing to the kernel, it’s a very, very skewed number. So we get contributions in localization, documentation, QA. And a lot of people are just downloading the product and trying it and using it, and maybe finding something in the install guide that isn’t quite right. That’s a contribution. So, we get these contributions in many different forms. I’ve actually got a paper called The Beekeeper Model, which describes our philosophy around the business model by describing it as a bee farm. And in there, we list out over a dozen different ways that you can contribute to Open Source projects in addition to the code. And to help foster and encourage this development of plugins, we’re rolling out a marketplace where you can discover plugins that are available, and get help downloading and getting support for plugins that you want to try. That aspect of it has been very encouraging. Our dashboard framework was contributed by one of our partners. Our connectivity to things like SAP was contributed by community members. The community includes… it’s not just people who are using the community edition. We get contributions from customers, we get contributions from partners. It’s a model where everyone can be acting purely selfishly, in a self-serving manner, and by doing that, they actually make everything better for everyone else as well. But it’s such a strong model because you can act purely selfishly, so… I’ve contributed to different Open Source projects.
I’ve contributed bug fixes to JBoss, but it was because I didn’t want to have to keep reapplying my fix. I completely selfishly gave JBoss my code fix, because I didn’t want to maintain it. It’s a really nice model that way. It’s a lot of fun, and it’s very productive compared with the proprietary world we were used to before.

Bowen: You said you’re installed in 180 countries. Tell me something about one of your customers, and what they’re doing with your product that’s exciting, that’s ignited your imagination.

Moran: One of our fun success stories is a company called Sheetz. They operate about 400 convenience store locations throughout the northeast. They had about a dozen different reporting products and didn’t have anything that really worked all together. They standardized on Pentaho. They rolled it out to their stores, and their people seemed to really like it. We get a lot of good press from those guys. Our platform is OEM-able and embeddable. There’s a company called Marketo. They’re a SaaS-based marketing provider. They embed our reporting and analytics right in that product. So that’s another kinda cool usage. And then, about a year and a half ago, we got into the big data space. There’s a company called ShareableInk. They’re also a Pentaho user and have written case studies about the stuff that they’ve done. So we’ve got a lot of different areas, not just the analytics, but embedded and SaaS and big data. Five of us started the company from zero back in 2004, with almost no code, and now we’ve got big corporations, banks and hospitals, and people that depend on our stuff, and on the stuff that we have acquired through the other projects. It’s kind of mind-blowing.

Dixon: And also, if you look on the community side, there are interesting projects that are going on worldwide with Pentaho. One of them is OpenMRS, which is a medical records system.
It’s an Open Source health care system that’s used primarily in eight hospitals in Asia and Africa. It’s a project that is stewarded out of the U.S. We’re working with that team to provide better reporting and analysis of the data that they’re collecting. It’s really interesting to see the different use cases and ideas that people have for using the software.

Moran: Another thing that I find satisfying about working in Open Source is that you get much more direct contact between the software developers working on the code and the people who have the use case and are actually trying to use the software. In a proprietary company, you’ve got layers of account managers and support people trying to ensure that the engineers don’t actually talk to the customer. Whereas in this model, there’s much more of a direct connection to find out what people are doing and why. I think that’s very satisfying about this business model and about working with Open Source.

Bowen: Thank you, James and Doug, for speaking with me.