New Big Database Technologies; A Market Overview of Technologies and Products

Introduction

With the arrival of big data and cloud platforms, a tsunami of new technologies and products for data storage, processing, and analytics has been introduced. Hadoop, Spark, NoSQL, NewSQL, triplestores, and SQL-on-Hadoop are just a few of the countless technologies that have become available for developing big data systems. Many powerful new database engines have also entered the market, including Amazon Athena, Exasol, Google BigQuery, Microsoft Synapse, MongoDB, Neo4j, SingleStore, SnowflakeDB, Splice Machine, and Starburst.

Most organizations have many questions. How mature are all these new technologies? Are they worthy replacements for the more traditional SQL products? How should they be incorporated into existing data warehouse architectures? Should they be used to develop data lakes? Are they the perfect platforms for data science, or for operational BI?

This seminar gives a clear, extensive, and critical overview of all the new key technologies for storing, processing, and analyzing big data. Technologies are explained, market overviews are presented, strengths and weaknesses are discussed, and guidelines and best practices are given. It’s the perfect update for those interested in the new market of big data technology.

Subjects

1. Big Data: State of the Art

  • What exactly do we mean by big data?
  • The key application area of big data: business analytics
  • Differences between semi-structured, poly-structured, multi-structured, and unstructured data

2. Analytical SQL Database Servers

  • Classification of analytical SQL database servers; can they compete with NoSQL products?
  • The advantages and disadvantages of column-based database servers
  • How important is in-database analytics?
  • Is loading databases into internal memory the solution? Is it feasible?
  • Market overview, including Amazon Athena, Exasol, Google BigQuery, HP/Vertica, Microsoft Synapse, SnowflakeDB, Splice Machine, and Starburst

3. The World of Hadoop and Spark

  • The Hadoop stack explained: HDFS, MapReduce, Spark, Hive, HBase, YARN, ZooKeeper, Pig, HCatalog, and so on
  • Characteristics and consequences of HDFS and file formats
  • Alternative implementations by MapR, Amazon, and ScaleOut (Hadoop in-memory)
  • Kafka for fast messaging

4. NoSQL Database Stores

  • Classification of NoSQL products: key-value stores, document stores, column-family stores, and graph data stores
  • It’s all about data scalability and performance
  • Why is schema-on-read more flexible than schema-on-write?
  • Are NoSQL products really database servers?
  • Market overview, including Apache HBase, CouchDB, Cassandra, Cloudera, DataStax, InfiniteGraph, MongoDB, and Neo4j

5. Exploring Data in Hadoop Using SQL

  • Making Hadoop data available for reporting and analysis through SQL-on-Hadoop engines
  • Examples of SQL-on-Hadoop engines, including Apache Drill, Apache Hive, Apache Phoenix, Cloudera Impala, HP Vertica, Pivotal HawQ, SingleStore, Spark SQL, and Splice Machine
  • Data virtualization for unleashing the information hidden in NoSQL and SQL systems

6. NewSQL Database Servers for Transaction Workloads

  • NewSQL database servers are designed for high-performance transactional systems
  • Simpler transaction mechanisms
  • The challenge of multi-table joins
  • Market overview, including CitusDB, Clustrix, MariaDB, NuoDB, and VoltDB

7. Concluding Remarks

What You Will Learn:

  • Why traditional database technology is not “big” enough
  • How different Hadoop and NoSQL are from traditional technology
  • How new and existing technologies such as Hadoop, NoSQL, and NewSQL can help develop BI and big data systems
  • How to embed Hadoop technologies in existing BI systems
  • How Spark can boost performance for analytics
  • How to distinguish between three NoSQL subcategories: key-value, document, and column-family stores
  • Why graph databases are very different from all other systems
  • When to use NewSQL or NoSQL for developing transactional systems
  • How to simplify data access through SQL-on-Hadoop engines
  • When to use which new data storage technology and the pros and cons of each solution
  • Which products and technologies are winners and which are losers

Geared to: IT architects; database specialists; big data specialists; BI specialists; data warehouse designers; technology planners; technical architects; enterprise architects; IT consultants; IT strategists; systems analysts; database developers; database administrators; solutions architects; data architects.

 


Instructor:

RICK VAN DER LANS

Rick van der Lans is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specializing in data architectures, data warehousing, business intelligence, big data, and database technology. In 2018 he was selected as the sixth most influential BI analyst worldwide by onalytica.com.

He has presented countless seminars, webinars, and keynotes at industry-leading conferences. He also helps clients worldwide to design their data warehouse, big data, and business intelligence architectures and solutions and assists them with selecting the right products.


Theme:
Data management
Instructor:
RICK VAN DER LANS
Language:
English
Duration:
1 day
Location:
Remote training
Start dates:
Contact us

This training program / course currently has no active start dates. If you are interested in the course, please contact us.
