I had an opportunity to interview Rick van der Lans, the chairman of the European BI and Data Warehousing Conference, best-selling author and top consultant in his field, about the future trends of BI/DW. He will once again come to Helsinki to run a seminar on September 9th.
Ari: Let’s first talk about databases. There are so many database products available today. How do you choose the right one?
Rick: Well, that is quite a challenge nowadays, especially now that many of the new database servers are not really generic database servers, but highly specialized products. They are very good at one or two things, but not at everything. Especially for systems that require some massive workload, it’s important that a product is selected that fits that particular workload. So, the bottom line is that if you have to choose one, you must have a very detailed view of your current and future requirements.
A: There are quite a few new analytics and data preparation tools on the market. Is the role of the data warehouse going to diminish?
R: It is correct to say that new tools for analytics and data preparation keep on coming. It’s almost like a continuous stream of tools. And some really extend the self-service capabilities for business analysts and data scientists. The challenge is how to support that new investigative, do-it-yourself style of working with a data warehouse that was initially developed for more classic and traditional forms of reporting. My feeling is that this will not diminish the role of the data warehouse; rather, it means that the architecture of our data warehouse environment has to change. And I think that the logical data warehouse has a lot of potential here.
A: In our last interview you told me that big data technologies are quite complex, which makes implementation slow. Is there any improvement in this area?
R: That statement is still very true. There is some improvement: for example, in the Hadoop space we have all these SQL-on-Hadoop engines, and for some of the NoSQL products, SQL interfaces are available as well. But still, these are quite technical technologies.
A: Hive and other Hadoop ecosystem products seem to get new features constantly. How mature is the Hadoop ecosystem in your opinion and what are the biggest challenges still left?
R: First of all, you can’t make general statements about the maturity of the entire Hadoop ecosystem. You can make such statements for each module separately. And if we do that, we can say that some of the older modules are now mature and can be used and are being used in mission-critical environments. But if we look at some of the new modules, you can’t call them mature yet. For example, the new file system called Kudu is very promising, very powerful, but it’s still young. It still has to prove itself somewhat, although my feeling is that it will definitely do that.
A: When talking with BI/DW people, many say there is still quite a lot to do with basic reporting, such as simply pulling business data from their existing IT systems. Why should you start investing in new technologies or big data if the basics are a mess? If I may use an analogy here: if you live in a house without electricity and running water, shouldn’t you first have this fixed before you implement your, let’s say, wifi or hifi systems? In other words, should you first fix the fundamentals before proceeding to fancier technologies?
R: That is always a good point. When your system is a mess, it doesn’t really make sense to replace the technology, unless that technology is causing the problem. But I think that is not the case in most companies that invest in big data technology, because they don’t see it as a replacement, but as technology that offers them new features and capabilities. For example, many new big data systems are developed to improve, deepen, and extend the analytical capabilities of the organization. Others use it to develop new operational systems that were unaffordable to develop with classic technology. To come back to your analogy, in most cases we invest in big data to develop another house.
A: From a business perspective, many people consider data warehousing “old school”: too time-consuming, expensive, and a never-ending exercise that doesn’t pay off. How would you “sell” the idea of data warehousing to this new breed of business leaders who want results at a faster pace?
R: We have to make a distinction here. There are reports that have to be developed for large groups of internal and external business users, and for external legislators. Such reports have to be formally tested, they have to be auditable, governed, and have to return reproducible results. And that takes development time. Most data warehouses are developed to support such reports, and they may look “old school”, but they are very important. In addition, we have reports that can be developed quickly, may have a short lifespan, and don’t have to be auditable and formally tested. This is the world of self-service BI, investigative analytics, and data science. And most organizations struggle with combining this new form of reporting with their classic data warehouse. So, you don’t “sell” the idea of data warehousing, but you sell the new analytical capabilities that may become available to a large group of users within the organization. Still, that is normally not a simple sell.
A: When I meet IT professionals, they might say they don’t need big data at the moment. But when you dig deeper and start asking about their data challenges, it turns out that Hadoop is actually something they would benefit from. What would be the right questions to ask to address that?
R: I think we should not sell big data or big data technology to organizations, but solutions. I think this is a general thing. We have to try to explain what it can mean to an organization. What are the business benefits? But this could mean that we have to think outside the box, and that is not always possible in a short one-hour meeting.
A: What are the most important skills for a data warehouse professional?
R: I would say, that’s the simplest of all the questions you have asked me. The most important skill for the data warehouse professional is to be able to communicate with business users. And to be able to do that you have to understand the business you are working for. If you work in a business intelligence/data warehouse environment, you are operating very close to the core business and management processes. In that case, being an expert in tools is not sufficient.
A: Last few questions: You have been here in Finland many times before and you have also done some in-house consulting for a few companies. What is your impression of Finnish IT-professionals?
R: In general, the Finnish IT-professionals are very up-to-date with their knowledge. Therefore, it’s always a pleasure to do sessions in Finland.
A: Apart from you being an inspirational speaker, what would be the top three reasons to sign up for the upcoming event here in Finland called Incorporating big data, Hadoop, Spark and NoSQL in Data Warehouse?
R: First of all, this market is moving so fast. If you last studied this new world of technology two years ago, you would be surprised to see how much has changed since. Second, organizations can really extend the features of their data warehouse/business intelligence environment by deploying some big data technologies. And third, you have to know what’s possible with all this new technology, but just as important is that you also know what’s not possible. What are the limitations and drawbacks of all this new technology?
More information and registration for Rick’s upcoming seminar Incorporating big data, Hadoop, Spark and NoSQL in Data Warehouse here. Make sure you don’t miss it, since there is a limited number of seats available.