Optimizing Apache Spark File Compression with LZ4 or Snappy

Matthew Salminen
6 min read · Dec 17, 2023
Image Source: https://www.vectorlogo.zone/logos/apache_spark/index.html

One challenge you may face when working with Apache Spark is that writing data to a final destination, such as S3 or another cloud service, takes longer than anticipated, both in latency and in total processing time. Often you are working with large datasets or source tables that require long processing times once you have finished all your table transformations. This is where…

Matthew Salminen

Marathoner | Trail Runner | Data Engineer | living in Irvine, CA