Handling Real-Time Insights of Delta Live Tables with Change Data Capture in Databricks

Matthew Salminen
4 min read · Oct 2, 2023
Image Source: https://blog.atawiz.fr/article/pipeline-databricks-deltalake-tables/

Last week I took a deep dive into Delta Live Tables (DLTs) in Databricks to see how they handle real-time data ingestion. Without going too far in depth, I want to explain a useful way of monitoring data changes as your DLTs run continuously and receive new updates to your records. We can monitor these data changes using Change Data Capture.

What is Change Data Capture? Change Data Capture (CDC) is the process of identifying the changes made to your source data and moving those changes to a target. It tracks and records the specific operations applied to your data as it flows through your DLTs, primarily inserts, updates, and deletes.
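To make the idea concrete, here is a minimal sketch in plain Python (not the Databricks API) of what applying a CDC feed to a target table means. The `ChangeEvent` record, its fields, and `apply_change_feed` are hypothetical names for illustration: each event carries a key, an operation, and a sequence number used for ordering. Only the latest state of each key is kept, which is usually called SCD type 1. In a real DLT pipeline this step is handled by the APPLY CHANGES API (`dlt.apply_changes()` in Python), so treat this only as an illustration of the semantics.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One row of a hypothetical CDC feed."""
    key: int                                  # primary key of the affected record
    operation: str                            # "INSERT", "UPDATE", or "DELETE"
    sequence: int                             # ordering column, e.g. a commit timestamp
    values: Optional[Dict[str, Any]] = None   # new column values; None for deletes


def apply_change_feed(target: Dict[int, Dict[str, Any]],
                      feed: List[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Apply one batch of CDC events to a keyed table, keeping only the latest state (SCD type 1)."""
    result = {k: dict(v) for k, v in target.items()}  # copy so the input table is not mutated
    # Process events in sequence order so out-of-order arrivals are applied correctly.
    for event in sorted(feed, key=lambda e: e.sequence):
        if event.operation == "DELETE":
            result.pop(event.key, None)
        elif event.operation in ("INSERT", "UPDATE"):
            # Upsert: both inserts and updates replace the row stored for this key.
            result[event.key] = dict(event.values or {})
        else:
            raise ValueError(f"unknown operation: {event.operation}")
    return result


feed = [
    ChangeEvent(1, "UPDATE", 3, {"name": "Ana", "city": "Denver"}),  # arrives before its INSERT
    ChangeEvent(1, "INSERT", 1, {"name": "Ana", "city": "Irvine"}),
    ChangeEvent(2, "INSERT", 2, {"name": "Ben", "city": "Austin"}),
    ChangeEvent(2, "DELETE", 4),
]
print(apply_change_feed({}, feed))  # {1: {'name': 'Ana', 'city': 'Denver'}}
```

The UPDATE for key 1 arrives before its INSERT, but sorting by `sequence` still leaves Denver as the final value. This sketch only handles a single batch; a production pipeline also has to cope with events that arrive after newer changes for the same key were already applied.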

CDC is useful because it captures changes to your data as it moves between sources and DLTs, and it keeps a history of those changes over time. CDC works alongside Delta Lake and your table metadata to ensure every change is tracked.
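Keeping history is the other half. A common approach is a type 2 slowly changing dimension (SCD type 2): instead of overwriting a row, the old version is closed and a new one is opened. The sketch below is plain Python with hypothetical names (`build_history`, `valid_from`, `valid_to`) and a made-up event shape of (key, operation, sequence, values). As far as I know, DLT's APPLY CHANGES with `stored_as_scd_type=2` tracks the same idea in `__START_AT` and `__END_AT` columns.

```python
from typing import Any, Dict, List, Optional, Tuple

# A hypothetical CDC event: (key, operation, sequence, values); values is None for deletes.
Event = Tuple[int, str, int, Optional[Dict[str, Any]]]


def build_history(feed: List[Event]) -> List[Dict[str, Any]]:
    """Turn a CDC feed into SCD type 2 rows: every version of a key gets a validity interval."""
    history: List[Dict[str, Any]] = []
    open_rows: Dict[int, Dict[str, Any]] = {}  # key -> the version that is currently valid
    for key, operation, sequence, values in sorted(feed, key=lambda e: e[2]):
        current = open_rows.pop(key, None)
        if current is not None:
            # Close the previous version instead of overwriting it, so its history survives.
            current["valid_to"] = sequence
        if operation in ("INSERT", "UPDATE"):
            row = {"key": key, **(values or {}), "valid_from": sequence, "valid_to": None}
            history.append(row)
            open_rows[key] = row
        elif operation != "DELETE":
            raise ValueError(f"unknown operation: {operation}")
        # A DELETE only closes the open version; no new version is opened.
    return history


feed = [
    (1, "INSERT", 1, {"city": "Irvine"}),
    (1, "UPDATE", 2, {"city": "Denver"}),
    (1, "DELETE", 3, None),
]
for row in build_history(feed):
    print(row)
# {'key': 1, 'city': 'Irvine', 'valid_from': 1, 'valid_to': 2}
# {'key': 1, 'city': 'Denver', 'valid_from': 2, 'valid_to': 3}
```

Because closed versions are kept rather than replaced, you can still answer "what did this record look like at sequence 1?" after it has been updated and deleted.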

Advantages of using CDC with your DLTs:

  1. Real-Time Insights: CDC tracks changes to your data as your DLTs process them, allowing you to draw insights whenever and wherever those changes occur.
  2. Auditing and Compliance: To meet your company's compliance requirements, CDC keeps a record history of all data changes, giving…
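For auditing, the value comes from keeping the raw change log rather than only the final table. Here is a small plain-Python sketch of the kind of questions an auditor might ask of that log; `audit_trail` and `operation_counts` are hypothetical helpers, and the events use a made-up (key, operation, sequence, values) shape. On Delta tables, the Change Data Feed (enabled with the `delta.enableChangeDataFeed` table property) exposes a comparable log with a `_change_type` column.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

# A hypothetical CDC event: (key, operation, sequence, values); values is None for deletes.
Event = Tuple[int, str, int, Optional[Dict[str, Any]]]


def audit_trail(feed: List[Event], key: int) -> List[Tuple[int, str]]:
    """Return every change recorded for one key, in sequence order, as (sequence, operation)."""
    return [(seq, op) for k, op, seq, _ in sorted(feed, key=lambda e: e[2]) if k == key]


def operation_counts(feed: List[Event]) -> Counter:
    """Count how many inserts, updates, and deletes the log contains."""
    return Counter(op for _, op, _, _ in feed)


feed = [
    (1, "INSERT", 1, {"city": "Irvine"}),
    (2, "INSERT", 2, {"city": "Austin"}),
    (1, "UPDATE", 3, {"city": "Denver"}),
    (2, "DELETE", 4, None),
]
print(audit_trail(feed, 1))            # [(1, 'INSERT'), (3, 'UPDATE')]
print(dict(operation_counts(feed)))    # {'INSERT': 2, 'UPDATE': 1, 'DELETE': 1}
```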
