Copy-on-Write or Merge-on-Read? What, When, and How?
Copy-on-Write or Merge-on-Read? Optimizing Row-level updates in Apache Iceberg Table by understanding both the approaches and deciding when to use which approach and its impact on the Read and Write speed of the table.
How to identify these using Iceberg Metadata tables on AWS?
Optimizing reads and writes from Apache Iceberg Tables is a crucial skill that a data engineer needs to know.
To improve Apache Iceberg's read and write performance, one important aspect is how row-level updates are handled. COW (Copy-on-Write) and MOR (Merge-on-Read) are the two approaches that can be configured in an Iceberg table to deal with row-level updates. Knowing how these approaches work internally equips you with the ability to define these in the initial phases of your Iceberg table designs that will keep your tables performant in the long run.
By the end of this blog post, you will be fully equipped with all the knowledge and details that will help you decide, which approach to use for your use case in real-world projects.
This blog post will cover (of course with code samples using PySpark):
What are COW and MOR? How to configure these?
What are Delete Files in the data layer and what do they contain?
When to use which approach?
How do these help in optimizing reads/writes from the Iceberg Table?
Let's understand the approaches now before we dive into code.
Copy-on-Write
In this approach, if even a single row in a data file is updated or deleted, the associated data file is rewritten with the updated or deleted records.
The new snapshot of the table created because of these operations will point to this newer version of the data file. This is the default approach.
Before we dive into MOR, it's important to understand the Delete Files and what information these files have.
Delete Files
Delete files track which records in the dataset have been logically deleted and need to be ignored when a query engine tries to read the data from an Iceberg Table.
Delete files are created within each partition depending on the data file from where the record is logically deleted or updated.
There are 2 types of delete files based on how these delete files store delete records information.
Positional Delete Files
Positional Delete files store the exact position of the deleted records in the dataset. It keeps track of the file path of the data file along with the position of the deleted records in that file.
Equality Delete Files
Equality Delete Files stores the value of one or more columns of the deleted records. These column values are stored based on the condition used while deleting these records.
Delete Files are only supported by Iceberg v2 tables.
When I started understanding Delete files, the most curious question that I got was "How can I choose or configure that my table should use Positional or Euality Delete file while handling row-level updates?"
Turned out, there is no way as of now that lets you choose as a user for a specific Delete File type. It depends on how the DeleteFile Interface is implemented in the engine that is being used.
Alrighty, finally as now we know about delete files, let's take a look at the next approach and then off to the understanding it in code.
Merge-on-Read
In this approach, update or delete operations on the Iceberg Table, the existing data files are not rewritten. Instead, a delete file is generated that keeps track of which records need to be ignored.
In case of deleting records, the record entries are listed in a Delete File.
In case of updating records:
The records to be updated are listed in a delete file.
A new data file is created that contains only the updated records.
Configuring COW and MOR
Configuring COW or MOR for particular transactions is as simple as mentioning these in the table properties.
Keep in mind that these properties are the specifications and whether these work as expected or not depends on whether a query engine that is being used honors these or not. If not, you might end up in unexpected results.
Properties for configuring
write.delete.mode : Approach to use for delete transactions
write.update.mode: Approach to use for update transactions
write.merge.mode: Approach to use for merge transactions
These table properties can be defined during the creation of a table or can be added to a table using ALTER statement.
When to use which approach?
Before we get into when to use which approach, it's important to understand the pros and cons of using both COW and MOR and how these impact the read and write speed of the table.
Here's an idea to develop a quick intuition about read and write performance:
It's faster to read from a table if the query engine doesn't have to read lots of file and doesn't have to reconcile or do any additional work while reading the data. It's faster to write into a table if query engine has to write less data.
Now based on this, let's see:
COW: In case of row-level deletes/updates, it rewrites the entire file even if there is a single record is impacted. More data needs to be written that causes slower row-level Updates/Deletes. Data is read without reconciling. It writes any deleted or updated files, resulting in faster reads.
MOR: In case of row-level deletes/updates, it avoids rewriting the entire data file. It writes only the Delete File along with the updated data file in case of Updates i.e. basically writing less data and hence faster writes. Data is read along with reconciling any deleted or updated files, resulting in slower reads.
Here's the summarization of everything we have discussed in this
As for the performance of both of these approaches, here are some numbers provided by AWS on a blog, where they have run different row-level updates on 47.4GB of data multiple times.
That's it for all the concepts that you need to know.
The next section is about how you see these updates/deletes in the Iceberg metadata table in the case of both COW and MOR approaches. We will also see how to check how many delete or data files have been added/removed after both the DELETE and UPDATE operation along with how the data is stored in Delete Files in case of MOR.
COW/MOR Code Example
If you have decided to read this section, I would strongly recommend that you be familiar with Apache Iceberg Table Architecture.
If you are not familiar with or even need a quick refresher on Iceberg table architecture. You can give it a quick read here.
Setting up the environment
I will be using a Jupyter Notebook on AWS EMR 6.15 here, which uses AWS Glue Catalog as metadata store for both Hive Table and Spark Table metadata.
If you are interested in understanding the configuration for Iceberg, you can check this blog post.
💡
All the code in this section can be found on my GitHub here. If you want to follow along in your local workspace, you can find a notebook also on GitHub here.
Creating 2 Iceberg Tables with the same data
I am using the NYC Yellow Taxi Trips data for Sep and Oct 2023. This can be found here.
Setting up COW and MOR table properties for tables
Delete Operation on Iceberg Table
Let's perform the delete operations and look into the metadata tables for cow_table and mor_table
As per what we have learned about COW and MOR, COW rewrites the data file and MOR writes a Delete File that records the data file path and the exact position of the deleted record in the table. Let's look into the Iceberg metadata tables to check these.
Analyzing Metadata tables after the DELETE operation
If you are new to Iceberg Metadata Tables, here is a short detail of the tables that we will be using in this section.
Every write operation (insert/delete/update/merge) on an Iceberg Table creates a new snapshot for the table.
The snapshot lineage and what operation has added a particular snapshot can be seen in snapshots metadata table.
All the details of the file added and deleted within a particular snapshot can be seen in manifests table.
All the currently used data files for a table can be seen in files table.
COW Table Analysis
The interesting fields to look in the above output are contentadded_data_files_count , deleted_data_files_count , and added_delete_files_count .
This shows that the data files are rewritten when using the COW approach.
MOR Table Analysis
Now if you look at the content column in the output, there are 2 possible values 0 and 1 .
In the manifest table, content = 0 represents the manifest files that track data files, and content = 1 represents the manifest files that track the delete files.
If you look at the values of added_data_files_count and added_delete_files_count it's 0 and 1 respectively.
This shows the latest snapshot that was created after delete operation has just added Delete Files and no data files are rewritten.
Another interesting thing to look for MOR is the data present in files table.
In the files table, content = 0 represents a Data File, content = 1 represents a Positional Delete File, content = 2 represents an Equality Delete File.
As the content column in the table output has 1, this shows that Spark has created a Positional Delete Files. Also if you look into the file_path it shows that both the delete files created are in different partitions.
Let's take a look into one of these Positional Delete Files content.
As we saw previously in the Delete File section, Positional Delete files store the data file path and position of the deleted record in the corresponding data file. It can be seen in the output above.
Update Operation On Iceberg Table
As in case of the COW approach, it just rewrites the data file in both the Delete/Update Operation, nothing interesting to look into.
We will perform an update operation on the table with MOR properties enabled.
Analyzing Metadata tables after the UPDATE operation
If you look at added_data_files_count and added_delete_files_count , there are 2 data and delete files have been added. The Delete Files in this case include the file path and position of the updated records and the data files include the records with updated value.
To easily identify these files in files table, let's see first how many records have been impacted.
Let's look in files table.
Row #1 and Row#2 in the above output show the data files added, and Row #10 and #11 show the new delete files added. Now if we look at the record_count and partition column across these rows, we can see the number of records in data and delete files are the same across each partition.
That's it for this one folks..!!! 😊 Now you know how to identify changes in case of COW and MOR approaches in the tables via metadata table also.
See you in the next one..! 🚀
If it has added any value to you and want to read more content like this, subscribe to the newsletter, it's free of cost and I will make sure every post is worth your time and you get to learn something new.
Selecting between Double and Decimal Data Type To Avoid Unexpected Results
How to choose between Double and Decimal data types for your tables/datasets, why does it matter, and when to choose which one?
4 min read
Nov
28
Shuffle-less Join, a.k.a Storage Partition Join in Apache Spark - Why, How and Where?
A Deep Dive into Shuffle-less joins (Storage Partitioned Joins) in Apache Spark to improve Join performance when using V2 Data Sources.
10 min read
Oct
12
Enhancing Spark Job Performance with Multithreading
It covers a Spark Job Optimization technique to enhance the performance of independent running queries using Multithreading in Pyspark.
7 min read
Feb
13
EMRFS S3 Optimized Committer and Committer Protocol for Improving Spark Write Performance - Why and How?
What are EMRFS S3 Optimized Committer and EMRFS S3 Optimized Committer Protocol and how to use and identify if these are working for your Spark Jobs to improve write performance?
30 min read
Jan
17
Apache Iceberg - Architecture Demystified
A detailed explanation of Apache Iceberg Architecture and how it evolves when data is inserted into the table.
Member discussion