Trainer: Mike Ferguson
Enterprise DataOps - Curating Trusted Data as a Service from Data Lake to Data Marketplace
- 25.03.2019 - 26.03.2019 Helsinki 1 900 € + VAT
Most organisations today are dealing with multiple silos of information. These include cloud-based and on-premises transaction processing systems, multiple data warehouses, data marts, reference data management (RDM) systems, master data management (MDM) systems, enterprise content management (ECM) systems and, more recently, Big Data platforms such as Hadoop and NoSQL databases. In addition, the number of data sources is increasing dramatically, especially from outside the enterprise. Given this situation, it is not surprising that many companies have ended up managing information in silos, with different tools being used to prepare and manage data across these systems with varying degrees of governance. Nor is it only IT that is now integrating data: business users are also getting involved with new self-service data preparation tools. The question is, is this the only way to manage data? Is there another level that we can reach to allow us to more easily manage and govern data across an increasingly complex data landscape consisting of multiple data stores?
This 2-day seminar looks at the challenges faced by companies trying to deal with an exploding number of data sources, data collected in multiple data stores (cloud and on-premises) and multiple analytical systems. It also looks at the requirements for defining, governing, managing and sharing trusted, high-quality information in a distributed and hybrid computing environment. It explores a new approach in which IT data architects, business users and IT developers collaborate in building and managing a logical data lake to get control of your data. This includes data ingestion, automated data discovery, data profiling and tagging, and publishing data in an information catalog. It also involves refining raw data to produce enterprise data services that can be published in a catalog available for consumption across your company. We also introduce multiple data lake configurations, including a centralised data lake and a ‘logical’ distributed data lake, as well as execution of jobs and governance across multiple data stores. The seminar emphasises the need for a common collaborative approach to governing and managing data of all types.
Attendees will learn:
- How to define a strategy for producing trusted data as-a-service in a distributed environment of multiple data stores and data sources.
- How to organise data in a centralised or distributed data environment to overcome complexity and chaos.
- How to design, build, manage and operate a logical or centralised data lake within their organisation.
- The critical importance of an information catalog in understanding what data is available as a service.
- How data standardisation and business glossaries can help make sure data is understood.
- An operating model for effective distributed information governance.
- What technologies and implementation methodologies they need to get their data under control.
- How to apply methodologies to get master and reference data, big data, data warehouse data and unstructured data under control, whether it is on-premises or in the cloud.
STRATEGY & PLANNING
This session introduces the data lake together with the need for a data strategy and looks at the reasons why companies need it. It looks at what should be in your data strategy, the operating model needed to implement it, the types of data you have to manage and the scope of implementation. It also looks at the policies and processes needed to bring your data under control.
INFORMATION PRODUCTION METHODOLOGIES
Having understood strategy, this session looks at why information producers need to make use of multiple methodologies in a data lake information supply chain to produce trusted structured and multi-structured data that information consumers can use to drive business value.
DATA STANDARDISATION, THE BUSINESS GLOSSARY AND THE INFORMATION CATALOG
This session looks at the need for data standardisation of structured data and of new insights from processing unstructured data. The key to making this happen is to create common data names and definitions for your data to establish a shared business vocabulary (SBV). The SBV should be defined and stored in a business glossary, and it is important in helping information consumers understand published data in a data lake. The session also looks at the emergence of more powerful information catalog software and how business glossaries have become part of what a catalog offers.
ORGANISING AND OPERATING THE DATA LAKE
This session looks at how to organise data so that it remains manageable in a complex data landscape. It looks at zoning, versioning, the need for collaboration between business and IT and the use of an information catalog in managing the data.
THE DATA REFINERY PROCESS
This session looks at the process of refining data to produce trusted information.
REFINING BIG DATA & DATA FOR DATA WAREHOUSES
This session looks at how the data refining processes can be applied to managing, governing and provisioning data in a Big Data analytical ecosystem and in traditional data warehouses. How do you deal with very large data volumes and different varieties of data? How do you load and process data in Hadoop? How should low-latency data be handled?
INFORMATION AUDIT & PROTECTION – THE FORGOTTEN SIDE OF DATA GOVERNANCE
Over recent years we have seen many major brands suffer embarrassing publicity due to data security breaches that have damaged their brand and reduced customer confidence. With data now highly distributed and so many technologies in place that offer audit and security, many organisations end up with a piecemeal approach to information audit and protection. Policies are everywhere, with no single view of the policies associated with securing data across the enterprise. The number of administrators involved is often difficult to determine, and regulatory compliance now demands that data is protected and that organisations can prove this to their auditors. So how are organisations dealing with this problem? Are the same data privacy policies enforced everywhere? How is data access security co-ordinated across portals, processes, applications and data? Is anyone auditing privileged user activity? This session defines this problem, looks at the requirements for Enterprise Data Audit and Protection and then looks at what technologies are available to help you integrate this into your data strategy.
This seminar is intended for business data analysts doing self-service data integration, data architects, chief data officers, master data management professionals, content management professionals, database administrators, big data professionals, data integration developers, and compliance managers who are responsible for data management. This includes metadata management, data integration, data quality, master data management and enterprise content management. The seminar is not only for ‘Fortune 500 scale companies’ but for any organisation that has to deal with Big Data, small data, multiple data stores and multiple data sources. It assumes that you have an understanding of basic data management principles as well as a high level of understanding of the concepts of data migration, data replication, metadata, data warehousing, data modelling, data cleansing, etc.