My Journey to the UTMB OCC 55k: Fitness, the Course, and Mental Prep (Matthew Salminen, Aug 24). After over a decade of running marathons around the world and logging countless miles on the road, I've decided it's time to take on a new…
Navigating Data: My First Year as a Data Engineer (Matthew Salminen, Jan 1). In 2023, I embarked on a journey into the field of data engineering. Here are three mindsets for growth in 2024.
Training Differences and Similarities Between Marathon and Ultra Trail Running (Matthew Salminen, Dec 27, 2023). In 2024, I tackle new challenges while revisiting old ones: UTMB Mont Blanc and two Marathon Majors.
Spark Shuffle Partitions: Optimizing Your Data Processing (Matthew Salminen, Dec 22, 2023). Shuffling in Apache Spark can be tuned to your requirements. Here is a brief background and outline of how to do that.
Optimizing Apache Spark File Compression with LZ4 or Snappy (Matthew Salminen, Dec 17, 2023). Comparing the performance of two popular file compression algorithms: LZ4 and Snappy.
Choosing the Right Dataframe: Pandas, Spark, Polars Compared (Matthew Salminen, Jan 15). Knowing which dataframe to use when working with small to large datasets is crucial to performance and the output of your data…
Conquering the UTMB Ultra Trail Mont Blanc: My Journey to the OCC in August 2024 (Matthew Salminen, Dec 9, 2023).
Streamlining ELT Processes: Optimizing your Delta Tables with Autoloader and SQL in Databricks (Matthew Salminen, Nov 25, 2023). Efficient ELT (Extract, Load, Transform) processes are the backbone of data engineering, ensuring that data is reliably and promptly…
Improving Databricks Performance: Leveraging Partitioning, Delta Lake Transaction Logging, and… (Matthew Salminen, Nov 19, 2023). Here are three proven techniques to help speed up performance in Databricks…
Linear Regression using Apache Spark ML vs scikit-learn (Matthew Salminen, Nov 12, 2023). In my last article, I did my best to predict, using machine learning, when the two-hour marathon barrier would be broken. I am…