Talks: Testing Data Pipelines

Saturday - May 18th, 2024 3:15 p.m.-3:45 p.m. in Hall C

Presented by:

Description

Hello 👋!

In this talk, I'll review a few effective ways to test data pipelines. The primary goal of these techniques is to keep data flowing smoothly through the pipelines by identifying and fixing problems quickly. While the talk uses Airflow as the base, the techniques presented are tool-agnostic.

Testing pipelines is similar to testing software applications. It includes unit tests for each pipeline component, integration tests for the entire pipeline, and end-to-end tests to verify that the data output is accurate. I'll also discuss methods specific to data work, such as data snapshot testing and online and offline data quality checks.
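
To give a rough flavor of these test types, here is a minimal sketch in plain Python. It is not taken from the talk and does not use Airflow. The transform `clean_orders`, the helpers `check_quality` and `snapshot`, and the sample data are all hypothetical names invented for illustration. It shows a unit test of one pipeline step, a snapshot comparison against a stored expected output, and a simple offline data quality check.

```python
import json


def clean_orders(rows):
    """Hypothetical pipeline step: drop rows without an id, cast amounts to float."""
    cleaned = []
    for row in rows:
        if row.get("order_id") is None:
            continue  # rows without a key cannot be loaded downstream
        cleaned.append({"order_id": row["order_id"], "amount": float(row["amount"])})
    return cleaned


def check_quality(rows):
    """Offline data quality check: return a list of violations (empty means pass)."""
    problems = []
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate order_id")
    if any(r["amount"] < 0 for r in rows):
        problems.append("negative amount")
    return problems


def snapshot(rows):
    """Serialize output deterministically so it can be diffed against a stored snapshot."""
    return json.dumps(sorted(rows, key=lambda r: r["order_id"]), sort_keys=True)


# Assumed sample input; a real suite would keep fixtures like this in files.
SAMPLE = [
    {"order_id": 2, "amount": "10.50"},
    {"order_id": None, "amount": "3"},
    {"order_id": 1, "amount": 4},
]

# Assumed stored snapshot; normally read from a file committed to the repository.
EXPECTED_SNAPSHOT = '[{"amount": 4.0, "order_id": 1}, {"amount": 10.5, "order_id": 2}]'


def test_clean_orders_drops_rows_without_id():
    # Unit test: exercises a single pipeline component in isolation.
    assert [r["order_id"] for r in clean_orders(SAMPLE)] == [2, 1]


def test_output_matches_snapshot():
    # Snapshot test: any change in the output shape or values fails the comparison.
    assert snapshot(clean_orders(SAMPLE)) == EXPECTED_SNAPSHOT


def test_output_passes_quality_checks():
    # Data quality check run offline against the step's output.
    assert check_quality(clean_orders(SAMPLE)) == []
```

The test functions follow the `test_` naming convention, so a runner such as pytest would collect them; in an Airflow setup, the same functions could be called on the callable behind a task rather than on the task itself.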