Fix post on PyTables+Blosc2 NDim slicing #9
And slightly rejuggle the sentence for readability.
They were a result of a bug in the benchmark and differing versions of Blosc2 HDF5 support software.
To hint about them using equivalent versions of Blosc2 HDF5 filter code.
posts/pytables-b2nd-slicing.rst
Conclusions and future work
---------------------------

The benchmarks above show how optimized Blosc2 NDim's two-level partitioning combined with direct HDF5 chunk access can yield considerable performance increases when slicing multi-dimensional Blosc2-compressed arrays under PyTables. However, the usual advice holds to invest some effort into fine-tuning some of the parameters used for compression and chunking for better results. We hope that this article also helps readers find those parameters.

- It is worth noting that these techniques still have some limitations: they only work with contiguous slices (that is, with step 1 on every dimension), and on datasets with the same byte ordering as the host machine. Also, although results are good indeed, there may still be room for implementation improvement: for instance, the case of PyTables flat slicing via HDF5 filters (no b2nd) still looks strangely slow in comparison with the equivalent h5py's access; these future enhancements might as well carry over to the b2nd case for even better results.
+ It is worth noting that these techniques still have some limitations: they only work with contiguous slices (that is, with step 1 on every dimension), and on datasets with the same byte ordering as the host machine. Also, although results are good indeed, there may still be room for implementation improvement, for instance with extra code profiling and parameter adjustments.
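As a rough illustration of the two limitations named in the paragraph above (step 1 on every dimension, byte ordering matching the host), the following Python sketch checks whether a slicing request would meet both conditions. The helper name `is_fast_path_eligible` is hypothetical and is not part of the PyTables API. It relies only on NumPy's `dtype.isnative`, and it conservatively treats any non-slice index (integers, lists, masks) as ineligible, which is an assumption rather than a description of PyTables' actual behavior.

```python
import numpy as np


def is_fast_path_eligible(key, dtype):
    """Hypothetical helper: check the two conditions stated in the post.

    Returns True only if every index in `key` is a slice with step 1
    (or None, which means the default step 1) and `dtype` uses the
    host machine's native byte order.
    """
    # Normalize a single index into a 1-tuple, as NumPy-style indexing does.
    if not isinstance(key, tuple):
        key = (key,)
    for k in key:
        # Assumption: anything other than a slice (int, list, boolean mask)
        # is conservatively treated as not eligible.
        if not isinstance(k, slice):
            return False
        # Contiguous slices only: any step other than 1 disqualifies the key.
        if k.step not in (None, 1):
            return False
    # dtype.isnative is True for native byte order and also for types
    # without a byte order (e.g. uint8).
    return np.dtype(dtype).isnative


# Usage: a 2-D contiguous slice of native float64 data qualifies,
# while a strided slice does not.
print(is_fast_path_eligible((slice(0, 100), slice(10, 20)), "f8"))
print(is_fast_path_eligible((slice(0, 100, 2), slice(None)), "f8"))
```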
suggestion: "for instance with extra code profiling and parameter adjustments" -> "but that will require extra code profiling and parameter adjustments"
Other than a small suggestion, LGTM
The benchmark shown in posts/pytables-b2nd-slicing.rst had some issues (see PyTables/PyTables:a0124f5); this fix includes updated benchmark results and related fixes to the text and images.