WA3219
Fundamentals of DataOps Training
This course introduces you to DataOps basics, including its origins, components, real-life applications, and ways to implement it.
Course Details
Duration
1 day
Prerequisites
General knowledge of programming and data processing.
Target Audience
- Data and Business Analysts
- Information Architects
- Technical Managers
Skills Gained
- Understand the fundamentals of DataOps and how people collaborate to deliver data for specific purposes
- Learn the three standard DataOps pipelines (Production, Development, and Environment) and how to orchestrate the necessary teams, tools, and processes
- Test, measure, and iteratively improve DataOps production pipelines
- Structure the development pipeline to fit the development lifecycle and use it to achieve fast deployments
- Build environment pipelines, manage components, and adapt the environment to different use cases
- Apply Lean DataOps to improve your organization’s data operations
Course Outline
- DataOps Introduction
- Data Analytics On the Run
- Impediments to the Data Analytics Cycle Time
- Finding a Solution ...
- What is DataOps?
- Agile Development ...
- DevOps
- The DataOps Technology and Methodology Stack
- The DataOps and Data Science Relationship
- DataOps Relationships with Other Data Management Disciplines and Concerns
- Standing Up a DataOps Practice
- The Lean Manufacturing Methodology
- Statistical Process Control
- What is Six Sigma?
- DataOps Enterprise Data Technologies
- The DataOps Manifesto
- Problems that DataOps Solves
- DataOps Leadership Principles
- The DataOps Problem Domain
- Connecting to the Digital Realm ...
- Data is King
- Actionable Insights
- Snowflake Environments
- Data Observability
- Cloud Resource Monitoring Dashboards
- Fragmented Data Sources
- Data Formats
- Interoperable Data
- The Data-Related Roles
- What is Data Engineering?
- The Typical Data Analytics (Machine Learning) Pipeline
- IT Systems' Woes
- Types of Architecture
- How to Lead with Data (the "Fidelity Way")
- How to Lead with Data: Ownership
- How to Lead with Data: Shared Environment Security Controls
- How to Lead with Data: the Current Trends
- DataOps Functional Architecture
- Key Components of a DataOps Platform
- Automation
- Maintenance
- DataOps Data Pipelines
- Building Pipelines: Aggregating System DAGs
- Distributed Data Flow Challenges
- Promoting Teamwork
- The Tragedy of the (Unmanaged) Commons
- Tests in Data Analytics
- Test Types
- The Netflix Simian Army Test Suite
- Input Data "Irregularities"
- Dealing with Missing Data in Python
- DataOps Technology and Tools
- Data Storage System Types
- The CAP Theorem
- The CAP Triangle - Which Storage System to Choose
- Mechanisms to Guarantee a Single CAP Property
- Data Physics (a.k.a. Distributed Data Economics)
- Hadoop: Example of Collocating Data and Computation
- An Example of Hive DDL
- Efficient Storage with Columnar Formats
- Example: AWS Athena Storage and Processing Cost Savings
- Example: Converting the CSV Data Format into Parquet Using a HiveQL CTAS Statement
- The Cloud: Value Proposition
- Lessons from the Field
- Design for System Resiliency
- How eBay Preempts Possible Database Corruption
- Cloud Data Services
- The Cloud Strategy
- Virtualization
- Virtualization Benefits
- What is Docker?
- What is Kubernetes?
- Computing Services in the Cloud
- Get Educated ...
- "Good/Not so Good" Use Cases for the Cloud
- Infrastructure as Code (IaC)
- Example of Provisioning and Running a PostgreSQL Database in Docker
- IaC Systems and Tools
- Workflow (Pipeline) Orchestration Systems
- Example of a Workflow Orchestration System: Apache NiFi
- NiFi Processor Types
- Building a Simple Data Flow in the NiFi Designer
- An Annotated Example of Using the scikit-learn Machine Learning (ML) Pipeline Class in Python
- Version Control Systems
- Branching and Merging Visually
- Some Popular Version Control Systems
- Overview of DataOps Tools and Services
- IT Governance
- Data Governance
- Controlling the Decision-Making Process
- Enterprise IT Governance Models
- Key Artifacts
- Agile IT
- Types of System Requirements
- Scoping Requirements
- Requirements Gathering ...
- Data Governance Overview
- Data Governance Roles and Responsibilities
- Roles and Responsibilities in DataOps
- Example of Assigning Responsibilities (AWS Shared Responsibility Model)
- Example of a Governance-Enabling Service
- Governance Best Practices
- Governance Gotchas
- The Goldilocks Principle
Lab Exercises
- Lab 1. Data Availability and Consistency