Apache Spark - The Pragmatic Data Engineer's Playbook

Optimizing Iceberg MERGE Statements

How can you eliminate shuffling and sorting, and push down filters, to optimize Apache Iceberg MERGE statements?

Jan 13, 2025

11 min read

Reasons and Solutions to Avoid Performance Degradation Due to Excessive Use of `.withColumn()` in Apache Spark

Dec 8, 2024

8 min read

A deep dive into shuffle-less joins (Storage Partitioned Joins) in Apache Spark, which improve join performance when using V2 data sources.

Nov 28, 2024

10 min read