Pinned · Matthew Salminen · Choosing the Right Dataframe: Pandas, Spark, Polars Compared · Knowing which dataframe to use when working with small to large datasets is crucial to performance and the output of your data… · 5 min read · Jan 15, 2024
Pinned · Matthew Salminen · Navigating Data: My First Year as a Data Engineer · In 2023, I found myself embarking on a journey into the field of data engineering. Here are three main mindsets to have for growth in 2024. · 3 min read · Jan 1, 2024
Pinned · Matthew Salminen · Training Differences and Similarities Between Marathon and Ultra Trail Running · In 2024 I tackle new challenges while taking on old ones. UTMB Mont Blanc and Two Marathon Majors. · 4 min read · Dec 27, 2023
Pinned · Matthew Salminen · Spark Shuffle Partitions: Optimizing Your Data Processing · Shuffling in Apache Spark can be optimized to your requirements. Here is a brief outline and background on how you can do that. · 3 min read · Dec 22, 2023
Pinned · Matthew Salminen · Optimizing Apache Spark File Compression with LZ4 or Snappy · Comparing performance between two popular file compression algorithms: LZ4 and Snappy · 6 min read · Dec 17, 2023
Matthew Salminen · Conquering the UTMB Ultra Trail Mont Blanc: My Journey to the OCC in August 2024 · 4 min read · Dec 9, 2023
Matthew Salminen · Streamlining ELT Processes: Optimizing your Delta Tables with Autoloader and SQL in Databricks · Efficient ELT (Extract, Load, Transform) processes are the backbone of data engineering, ensuring that data is reliably and promptly… · 5 min read · Nov 25, 2023
Matthew Salminen · Improving Databricks Performance: Leveraging Partitioning, Delta Lake Transaction Logging, and… · Here, I provide 3 proven techniques to help speed up your performance in Databricks… · 4 min read · Nov 19, 2023
Matthew Salminen · Linear Regression using Apache Spark ML vs Sci-Kit Learn · In my last article, I did the best I could in predicting when a two-hour marathon would be broken using machine learning. I am… · 7 min read · Nov 12, 2023
Matthew Salminen · I tried to predict when the 2 hour marathon will be broken using a machine learning model… Part 2 · 8 min read · Oct 22, 2023