When z-order optimizing, keep partition in only one row_group (if possible) #2769

deanm0000 · 2024-08-13T19:18:53Z

Description

I'd like z-order optimization to be extended to the row_group level: a row_group should end when the next partition can't fit in it, rather than splitting that partition across two row_groups. As it is now, if I query the table for a partition value that was split, the reader has to read 2 row_groups instead of just 1.

I don't have an MRE for the data setup but here's a demo with my own data behind it.

from deltalake import DeltaTable, WriterProperties
import pyarrow.parquet as pq
import fsspec

dt_path = ...
abfs = fsspec.filesystem(...)  # filesystem arguments elided
dt = DeltaTable(dt_path)
dt.optimize.z_order(
    ["node_id"],
    writer_properties=WriterProperties(compression="ZSTD"),
)

# Read the Parquet footer of the first file in the table
dtfile = dt.files()[0]
with abfs.open(f"{dt_path}/{dtfile}", "rb") as ff:
    md = pq.ParquetFile(ff).metadata

# Collect per-row-group statistics for column 3 (the node column in my data)
stats = []
for rg in range(md.num_row_groups):
    col_stats = md.row_group(rg).column(3).statistics
    stats.append({
        "rg": rg,
        "min_node": col_stats.min,
        "max_node": col_stats.max,
        "num_values": col_stats.num_values,
    })
stats[0:5]  # Just first 5 for brevity
[{'rg': 0, 'min_node': 1, 'max_node': 49202, 'num_values': 1048576},
 {'rg': 1, 'min_node': 49202, 'max_node': 49636, 'num_values': 1048576},
 {'rg': 2, 'min_node': 49636, 'max_node': 50496, 'num_values': 1048576},
 {'rg': 3, 'min_node': 50496, 'max_node': 52458, 'num_values': 1048576},
 {'rg': 4, 'min_node': 52458, 'max_node': 1048072, 'num_values': 1048576}]

Notice how the max_node of each row_group is the min_node of the next row_group. That means the rows for that node span two row_groups, so a query for that node has to download 2 row_groups instead of just 1.
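To make the overlap easy to check on other files, here is a minimal sketch that scans a `stats` list like the one above and reports the node values shared by adjacent row_groups. The helper name `straddling_keys` is hypothetical, and the sketch assumes the rows in the file are sorted by the node column, so a value can only straddle neighbouring row_groups.

```python
def straddling_keys(stats):
    """Return values that are the max of one row group and the min of the next."""
    shared = []
    for prev, cur in zip(stats, stats[1:]):
        # Equal boundary statistics mean the same value sits in both row groups
        if prev["max_node"] == cur["min_node"]:
            shared.append(cur["min_node"])
    return shared


# Statistics copied from the output above
stats = [
    {"rg": 0, "min_node": 1, "max_node": 49202},
    {"rg": 1, "min_node": 49202, "max_node": 49636},
    {"rg": 2, "min_node": 49636, "max_node": 50496},
    {"rg": 3, "min_node": 50496, "max_node": 52458},
    {"rg": 4, "min_node": 52458, "max_node": 1048072},
]

print(straddling_keys(stats))  # [49202, 49636, 50496, 52458]
```

Every boundary in the first five row_groups is shared, so each of those four node values needs two row_group reads.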

It'd be better if the first row group stopped at 49201 (or whatever came before 49202) and then 49202 was solely in the second rg.
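The requested behaviour can be sketched as a greedy packing step. This is only an illustration of the idea, not how deltalake or the Parquet writer currently works. The function name `plan_row_groups` and its inputs are hypothetical: `key_counts` is a list of `(key, n_rows)` pairs already sorted by key, and `max_rows` is the row_group size limit (1048576 in the output above).

```python
def plan_row_groups(key_counts, max_rows):
    """Greedily pack sorted (key, n_rows) runs into row groups of at most max_rows.

    Returns a list of row groups, each a list of (key, n_rows) chunks.
    A key is split only when it alone exceeds max_rows.
    """
    groups, current, used = [], [], 0
    for key, n in key_counts:
        if n > max_rows:
            # Oversized key: it cannot fit in one row group, so close the
            # current group and give the key its own run of row groups.
            if current:
                groups.append(current)
                current, used = [], 0
            while n > 0:
                take = min(n, max_rows)
                groups.append([(key, take)])
                n -= take
            continue
        if used + n > max_rows:
            # The next key does not fit: end the row group early instead of splitting
            groups.append(current)
            current, used = [], 0
        current.append((key, n))
        used += n
    if current:
        groups.append(current)
    return groups


plan = plan_row_groups([(1, 4), (2, 3), (3, 5), (4, 12), (5, 2)], max_rows=8)
print(plan)
# [[(1, 4), (2, 3)], [(3, 5)], [(4, 8)], [(4, 4)], [(5, 2)]]
```

The trade-off is that some row_groups end up smaller than `max_rows`, and a key with more rows than `max_rows` still has to be split, which is the "if possible" part of the request.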

Use Case
Faster, more efficient queries of nodes that would otherwise be straddling 2 row_groups.

Related Issue(s)
unknown, maybe page index?

deanm0000 · 2024-09-17T17:07:31Z

I made this, which does the above with pyarrow, although just one partition at a time. It's on PyPI.

deanm0000 added the enhancement New feature or request label Aug 13, 2024

rtyler added the binding/python Issues for the Python package label Aug 13, 2024
