Over time, Iceberg tables can slow down and require periodic data compaction to stay healthy.
iomete provides a built-in job that runs data compaction for each table. This job triggers the following Iceberg maintenance processes:
- Expire Snapshots - see Maintenance - Expire Snapshots
- Delete Orphan Files - see Maintenance - Delete Orphan Files
- Rewrite Data Files - see Maintenance - Rewrite Data Files
- Rewrite Manifests - see Maintenance - Rewrite Manifests
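For reference, the four maintenance processes above correspond to Iceberg's Spark stored procedures (`expire_snapshots`, `remove_orphan_files`, `rewrite_data_files`, `rewrite_manifests`). The sketch below only builds the `CALL` statements as strings to show their shape; the catalog name `spark_catalog` and table `db.sample` are placeholder assumptions, and default arguments are used throughout (the built-in job may pass additional options).

```python
# Illustrative sketch: the Spark SQL CALL statements behind the four
# Iceberg maintenance processes. Catalog and table names are placeholders.

def compaction_statements(catalog: str, table: str) -> list[str]:
    """Build the Iceberg maintenance CALL statements for one table."""
    return [
        # Remove old snapshots (and the files only they reference)
        f"CALL {catalog}.system.expire_snapshots(table => '{table}')",
        # Delete files no longer referenced by any table metadata
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        # Compact small data files into fewer, larger files
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Rewrite manifests to speed up scan planning
        f"CALL {catalog}.system.rewrite_manifests('{table}')",
    ]

for stmt in compaction_statements("spark_catalog", "db.sample"):
    print(stmt)
```

Running any one of these through `spark.sql(...)` on a Spark session with an Iceberg catalog configured would execute the corresponding maintenance action.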
To enable the data compaction Spark job, follow these steps:
- In the left sidebar menu, choose
- Fill in the form with the values below:
Schedule (the example runs the job every Sunday at 12:00; feel free to change the value)
Main application file
Instance Size (ICU) (feel free to increase)
See example screenshot below
We've created an initial data compaction job that will be sufficient in most cases. Feel free to fork it and build a new data compaction image based on your company's requirements.
View in GitHub