Incorporating big data, Hadoop, Spark and NoSQL in Data Warehouse


Big data, Hadoop, in-memory analytics, Spark, self-service BI, analytical database servers, data virtualization, and NoSQL are just a few of the many new technologies and tools that have become available for developing BI systems. Most of them are very powerful and allow for the development of more flexible and scalable BI systems. But which ones should you pick?

Due to this flood of new developments, it’s becoming harder and harder for organizations to select the right tools. Which technologies are relevant? Are they mature? What are their use cases? These are all valid questions, but they are difficult to answer.

This seminar gives a clear and extensive overview of all the new developments and their inter-relationships. Technologies and techniques are explained, market overviews are presented, strengths and weaknesses are discussed, and guidelines and best practices are given.

The biggest revolution in BI is evidently big data. Therefore, considerable time in the seminar is reserved for this intriguing topic. Hadoop, Spark, MapReduce, Hive, NoSQL, and SQL-on-Hadoop are all explained. In addition, the relationship between big data and analytics is discussed extensively.

This seminar gives you a unique opportunity to see and learn about all the new BI developments. It’s the perfect update for those interested in knowing how to make BI systems ready for the coming ten years.


  1. The Changing World of Business Intelligence
  • Big data: hype or reality?
  • Operational intelligence: does it require online data warehouses?
  • Data warehouses in the cloud
  • Self-service BI
  • The business value of analytics
  2. Hadoop Explained
  • The relationship between big data and analytics
  • The Hadoop software stack explained, including HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Flume, and HBase
  • The balancing act: productivity versus scalability
  • Making big data available to a larger audience with SQL-on-Hadoop engines, such as Apache Drill, Apache Hive, Apache Phoenix, Cloudera Impala, IBM BigSQL, JethroData, Pivotal HawQ, SparkSQL, and Splice Machine
  3. Spark Explained
  • Spark as in-memory analytical processing
  • The interfaces: SQL, R, Scala, Python
  • Does Spark need Hadoop?
  • Use cases of Spark
  4. NoSQL Explained
  • Classification of NoSQL database servers: key-value stores, document stores, column-family stores, and graph data stores
  • Market overview: CouchDB, Cassandra, Cloudera, MongoDB, and Neo4j
  • Strong consistency or eventual consistency?
  • Why an aggregate data model?
  • How to analyze data stored in NoSQL databases
  5. Overview of Analytical SQL Database Servers
  • Are classic SQL database servers more suitable for data warehousing?
  • Important performance-improving features: column-oriented storage and in-database analytics
  • Market overview of analytical SQL database servers: Actian Matrix and Vector, Dell/EMC/Greenplum, Exasol, HP/Vertica, IBM/PureData System for Analytics, Kognitio, Microsoft, SAP HANA and Sybase IQ, SnowflakeDB, Teradata Appliance, and Teradata Aster Database
  6. Streaming Database Servers
  • What are streaming database servers, and how do they differ from messaging products such as Apache Kafka?
  • Streaming database servers support analytics at the speed of business
  • Different forms of operational BI: operational reporting, operational analytics, and embedded analytics
  • Market overview: Cisco ParStream, SQLStream Blaze, StreamBase
  7. NewSQL Database Servers
  • NewSQL stands for high-performance transactional SQL database servers
  • Simpler transaction mechanisms to implement scale-out
  • What does the term geo-compliancy mean?
  • Market overview: Clustrix, GenieDB, MariaDB, NuoDB, Splice Machine, and VoltDB
  8. Incorporating Big Data Technology in BI Systems
  • What are the use cases of Hadoop in classic data warehouse architectures?
  • Using streaming database servers for real-time analytics
  • What could be the role of NoSQL products?
  • Using Spark as a performance booster for data marts
  9. Closing Remarks

Learning Objectives

In this seminar, Rick van der Lans helps participants to:

  • Learn about the trends and the technological developments related to business intelligence, analytics, data warehousing, and big data.
  • Discover the value of big data and analytics for organizations.
  • Learn which products and technologies are winners and which ones are losers.
  • Learn how new and existing technologies, such as Hadoop, NoSQL and NewSQL, will help you create new opportunities in your organization.
  • Learn how more agile business intelligence systems can be designed.
  • Learn how to embed big data and analytics in existing business intelligence architectures.



2 days
Paasitorni, Paasivuorenkatu 5, Helsinki

This course currently has no scheduled start dates. If you are interested in the course, please get in touch.
