The Power Duo: Databricks Auto Loader and Delta Live Tables

In my last two posts, I explained the benefits of using Auto Loader for your data pipelines with Databricks. It is an efficient way to handle incoming batches of data in your ingestion process. The power of your data pipeline doesn’t stop there. Auto Loader simplifies ingesting large volumes of data into Databricks Delta Lake, and we can go a step further by moving from batch processing to real-time data streams. This is where Databricks Delta Live Tables comes in.
What are Delta Live Tables or DLTs?
Delta Live Tables, or DLTs for short, allow you to create data pipelines that are managed in real time. This makes ETL development easier to build and maintain when your data arrives at high volume and velocity. It is important to understand the difference between Delta Tables and Delta “Live” Tables before you proceed.
Delta Tables are a way to store your data as tables in Databricks, while Delta Live Tables are a way to define how data flows between these tables. Whereas Delta Tables provide a table architecture, DLTs are a data pipeline framework that you can use when building your ETL pipeline.
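To make the distinction concrete, here is a minimal conceptual sketch in plain Python. It is not the Databricks or DLT API: the `table` decorator, the `run` function, and the sample `raw_orders` rows are all hypothetical names made up for illustration. The idea it tries to show is that a table is data at rest, while a pipeline is a set of declarations describing how each table is derived from the ones upstream of it.

```python
# Conceptual sketch only: plain Python, NOT the Databricks or DLT API.
# All names here (raw_orders, table, run) are hypothetical.
from typing import Callable, Dict, List

Row = Dict[str, object]

# A "table" is simply stored data: rows at rest.
raw_orders: List[Row] = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},  # incomplete record
    {"order_id": 3, "amount": 35.5},
]

# A "pipeline" is a registry of declarations: each entry says how one
# table is computed from the tables that were built before it.
pipeline: Dict[str, Callable[[Dict[str, List[Row]]], List[Row]]] = {}


def table(name: str):
    """Register a function as the definition of the table `name`."""
    def register(fn):
        pipeline[name] = fn
        return fn
    return register


@table("bronze")
def bronze(tables):
    # Raw data, ingested as-is.
    return list(raw_orders)


@table("silver")
def silver(tables):
    # Cleaned data: drop rows with a missing amount.
    return [r for r in tables["bronze"] if r["amount"] is not None]


@table("gold")
def gold(tables):
    # Aggregated data ready for reporting.
    total = sum(r["amount"] for r in tables["silver"])
    return [{"total_amount": total}]


def run(defs):
    """Build every table in declaration order (dicts keep insertion order)."""
    tables: Dict[str, List[Row]] = {}
    for name, fn in defs.items():
        tables[name] = fn(tables)
    return tables


result = run(pipeline)
print(result["gold"])
```

In this toy version the tables are built in the order they were declared. A framework like DLT works out the dependency order between tables for you, which is a large part of what makes it a pipeline framework and not just a storage format.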
How can we incorporate Delta Live Tables into our data pipelines?
Creating Delta Live Tables involves working with the medallion, or multi-hop, architecture that I explained in my previous post, Databricks Autoloader and Medallion Architecture… Pt 2. Below is an example that incorporates the bronze, silver, and gold layers of this architecture within your Databricks notebooks. SQL is used in this example, although the same can be accomplished with PySpark/Python.
- Create Bronze Layer Tables: Here we create a streaming live table called bronze_table and define its schema/data source. You can also add comments for reference as you create each live table.
-- Bronze Layer Table
CREATE OR REFRESH STREAMING LIVE TABLE bronze_table
COMMENT "This is a sample table"
AS SELECT * FROM cloud_files("${file_path}/table_name_1", "delta"…