Fix post on PyTables+Blosc2 NDim slicing #9

Merged
merged 7 commits into Blosc:master from pytables-b2nd-slicing-update on Nov 23, 2023

Conversation

ivilata (Collaborator) commented Nov 23, 2023

The benchmark shown in posts/pytables-b2nd-slicing.rst had some issues (see PyTables/PyTables:a0124f5). This fix includes updated benchmark results and related fixes to the text and images.

ivilata self-assigned this on Nov 23, 2023

Conclusions and future work
---------------------------

The benchmarks above show how optimized Blosc2 NDim's two-level partitioning combined with direct HDF5 chunk access can yield considerable performance increases when slicing multi-dimensional Blosc2-compressed arrays under PyTables. However, the usual advice holds to invest some effort into fine-tuning some of the parameters used for compression and chunking for better results. We hope that this article also helps readers find those parameters.

-It is worth noting that these techniques still have some limitations: they only work with contiguous slices (that is, with step 1 on every dimension), and on datasets with the same byte ordering as the host machine. Also, although results are good indeed, there may still be room for implementation improvement: for instance, the case of PyTables flat slicing via HDF5 filters (no b2nd) still looks strangely slow in comparison with the equivalent h5py's access; these future enhancements might as well carry over to the b2nd case for even better results.
+It is worth noting that these techniques still have some limitations: they only work with contiguous slices (that is, with step 1 on every dimension), and on datasets with the same byte ordering as the host machine. Also, although results are good indeed, there may still be room for implementation improvement, for instance with extra code profiling and parameter adjustments.

suggestion: "for instance with extra code profiling and parameter adjustments" -> "but that will require extra code profiling and parameter adjustments"

FrancescAlted (Member) left a comment

Other than a small suggestion, LGTM

ivilata merged commit fdfd705 into Blosc:master on Nov 23, 2023
1 check passed
ivilata deleted the pytables-b2nd-slicing-update branch on November 23, 2023 12:16