Pinned · My Journey to the UTMB OCC 55k: Fitness, the Course, and Mental Prep · After over a decade of running marathons around the world and logging countless miles on the road, I’ve decided it’s time to take on a new… · Aug 24
Pinned · Navigating Data: My First Year as a Data Engineer · In 2023, I found myself embarking on a journey into the field of data engineering. Here are three main mindsets to have for growth in 2024. · Jan 1
Pinned · Training Differences and Similarities Between Marathon and Ultra Trail Running · In 2024 I tackle new challenges while taking on old ones. UTMB Mont Blanc and Two Marathon Majors. · Dec 27, 2023
Pinned · Spark Shuffle Partitions: Optimizing Your Data Processing · Shuffling in Apache Spark can be optimized to your requirements. Here is a brief outline and background on how you can do that. · Dec 22, 2023
Pinned · Optimizing Apache Spark File Compression with LZ4 or Snappy · Comparing performance between two popular file compression algorithms: LZ4 and Snappy · Dec 17, 2023
Choosing the Right Dataframe: Pandas, Spark, Polars Compared · Knowing which dataframe to use when working with small to large datasets is crucial to performance and the output of your data… · Jan 15
Conquering the UTMB Ultra Trail Mont Blanc: My Journey to the OCC in August 2024 · Dec 9, 2023
Streamlining ELT Processes: Optimizing your Delta Tables with Autoloader and SQL in Databricks · Efficient ELT (Extract, Load, Transform) processes are the backbone of data engineering, ensuring that data is reliably and promptly… · Nov 25, 2023
Improving Databricks Performance: Leveraging Partitioning, Delta Lake Transaction Logging, and… · Here, I provide 3 proven techniques to help speed up your performance in Databricks… · Nov 19, 2023
Linear Regression using Apache Spark ML vs Sci-Kit Learn · In my last article I was able to do the best I could in predicting when a two-hour marathon would be broken using machine learning. I am… · Nov 12, 2023