Fix post on PyTables+Blosc2 NDim slicing #9

ivilata · 2023-11-23T12:04:23Z

The benchmark shown in posts/pytables-b2nd-slicing.rst had some issues (see PyTables/PyTables:a0124f5). This fix includes updated benchmark results and related fixes to the text and images.


FrancescAlted · 2023-11-23T12:10:57Z

posts/pytables-b2nd-slicing.rst


 Conclusions and future work
 ---------------------------

 The benchmarks above show how optimized Blosc2 NDim's two-level partitioning combined with direct HDF5 chunk access can yield considerable performance increases when slicing multi-dimensional Blosc2-compressed arrays under PyTables. However, the usual advice holds to invest some effort into fine-tuning some of the parameters used for compression and chunking for better results. We hope that this article also helps readers find those parameters.

-It is worth noting that these techniques still have some limitations: they only work with contiguous slices (that is, with step 1 on every dimension), and on datasets with the same byte ordering as the host machine. Also, although results are good indeed, there may still be room for implementation improvement: for instance, the case of PyTables flat slicing via HDF5 filters (no b2nd) still looks strangely slow in comparison with the equivalent h5py's access; these future enhancements might as well carry over to the b2nd case for even better results.
+It is worth noting that these techniques still have some limitations: they only work with contiguous slices (that is, with step 1 on every dimension), and on datasets with the same byte ordering as the host machine. Also, although results are good indeed, there may still be room for implementation improvement, for instance with extra code profiling and parameter adjustments.


suggestion: "for instance with extra code profiling and parameter adjustments" -> "but that will require extra code profiling and parameter adjustments"
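For readers of this thread, the two limitations named in the changed paragraph (step 1 on every dimension, and the same byte ordering as the host) can be expressed as a small eligibility check. This is only an illustrative sketch and not the code PyTables uses: the helper names `is_contiguous_slice`, `is_native_byteorder` and `can_use_optimized_slicing` are hypothetical, the sketch accepts only `slice` objects, and treating the value `'irrelevant'` as compatible with any host is an assumption.

```python
import sys


def is_contiguous_slice(key):
    """Return True if every element of `key` is a slice with step 1.

    Assumption of this sketch: only `slice` objects are accepted;
    integers, ellipsis and fancy indexing are rejected.
    """
    if not isinstance(key, tuple):
        key = (key,)
    return all(isinstance(k, slice) and k.step in (None, 1) for k in key)


def is_native_byteorder(byteorder):
    """Return True if `byteorder` matches the host machine.

    `byteorder` is expected to be 'little' or 'big'; 'irrelevant'
    (assumed here to mean single-byte data) is accepted as well.
    """
    return byteorder in (sys.byteorder, "irrelevant")


def can_use_optimized_slicing(key, byteorder):
    """Hypothetical check combining both limitations from the post."""
    return is_contiguous_slice(key) and is_native_byteorder(byteorder)


# A contiguous 2-D slice on a dataset stored in the host byte order qualifies.
print(can_use_optimized_slicing((slice(0, 10), slice(None)), sys.byteorder))
# A strided slice does not.
print(can_use_optimized_slicing((slice(0, 10, 2), slice(None)), sys.byteorder))
```

A slice that fails such a check would presumably still be readable through the regular HDF5 filter pipeline, only without the speedup discussed in the post.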

FrancescAlted

Other than a small suggestion, LGTM


ivilata added 6 commits November 23, 2023 12:42

Short note introducing the benchmark update after the intro

d7c9516

Update the graphs with new benchmark results

743999f

Note that PyTables & h5py with HDF5 filters have similar performance

1d0a296

And slightly rejuggle the sentence for readability.

Update maximum speedup observed relative to HDF5 filtering

1a12cfd

Remove mention to oddly high values in h5py performance

cf99346

They were a result of a bug in the benchmark and differing versions of Blosc2 HDF5 support software.

Be specific about PyTables/h5py/hdf5plugin versions in benchmark

369ecde

To hint about them using equivalent versions of Blosc2 HDF5 filter code.

ivilata requested a review from FrancescAlted November 23, 2023 12:04

ivilata self-assigned this Nov 23, 2023

FrancescAlted reviewed Nov 23, 2023


FrancescAlted approved these changes Nov 23, 2023


Minor fix to pending implementation improvements

77b9425

Suggested by @FrancescAlted.

ivilata merged commit fdfd705 into Blosc:master Nov 23, 2023
1 check passed

ivilata deleted the pytables-b2nd-slicing-update branch November 23, 2023 12:16
