Skip to content

scikit-hep/cuda-histogram

Repository files navigation

cuda-histogram

Actions Status Documentation Status pre-commit.ci status codecov percentage GitHub Discussion

PyPI platforms PyPI version Conda latest release LICENSE Scikit-HEP

cuda-histogram is a histogram filling package for GPUs. The package tries to follow UHI and keeps its API similar to boost-histogram and hist.

Main features of cuda-histogram:

  • Implements a subset of the features of boost-histogram using CuPy (see API documentation for a complete list):
    • Axes
      • Regular and Variable axes
        • edges()
        • centers()
        • index(...)
        • ...
    • Histogram
      • fill(..., weight=...) (including Nan flow)
      • simple indexing with slicing (see example below)
      • values(flow=...)
      • variance(flow=...)
  • Allows users to detach the generated GPU histogram to CPU -
    • to_boost() - converts to boost-histogram.Histogram
    • to_hist() - converts to hist.Hist

Near future goals for the package -

  • Implement support for Categorical axes (exists internally but need refactoring to match boost-histogram's API)
  • Improve indexing (__getitem__) to exactly match boost-histogram's API

Installation

cuda-histogram is available on PyPI as well as on conda. The library can be installed using pip -

pip install cuda-histogram

or using conda -

conda install -c conda-forge cuda-histogram

Usage

Ideally, a user would want to create a cuda-histogram, fill values on GPU, and convert the filled histogram to boost-histogram/Hist object to access all the UHI functionalities.

Creating a histogram

import cuda_histogram; import cupy as cp

ax1 = cuda_histogram.axis.Regular(10, 0, 1)
ax2 = cuda_histogram.axis.Variable([0, 2, 3, 6])

h = cuda_histogram.Hist(ax1, ax2)

>>> ax1, ax2, h
(Regular(10, 0, 1), Variable([0. 2. 3. 6.]), Hist(Regular(10, 0, 1), Variable([0. 2. 3. 6.])))

Filling a histogram

Differences in API (from boost-histogram) -

  • Has an additional NaN flow
  • Accepts only CuPy arrays
h.fill(cp.random.normal(size=1_000_000), cp.random.normal(size=1_000_000))  # set weight=... for weighted fills

>>> h.values(), type(h.values())  # set flow=True for flow bins (underflow, overflow, nanflow)
(array([[28532.,  1238.,    64.],
       [29603.,  1399.,    61.],
       [30543.,  1341.,    78.],
       [31478.,  1420.,    98.],
       [32692.,  1477.,    92.],
       [32874.,  1441.,    96.],
       [33584.,  1515.,    88.],
       [34304.,  1490.,   114.],
       [34887.,  1598.,   116.],
       [35341.,  1472.,   103.]]), <class 'cupy.ndarray'>)

Indexing axes and histograms

Differences in API (from boost-histogram) -

  • underflow is indexed as 0 and not -1
  • ax[...] will return a cuda_histogram.Interval object
  • No interpolation is performed
  • Hist indices should be in the range of bin edges, instead of integers
>>> ax1.index(0.5)
array([6])

>>> ax1.index(-1)
array([0])

>>> ax1[0]
<Interval ((-inf, 0.0)) instance at 0x1c905208790>

>>> h[0, 0], type(h[0, 0])
(Hist(Regular(1, 0.0, 0.1), Variable([0. 2.])), <class 'cuda_histogram.hist.Hist'>)

>>> h[0, 0].values(), type(h[0, 0].values())
(array([[28532.]]), <class 'cupy.ndarray'>)

>>> h[0, :].values(), type(h[0, 0].values())
(array([[28532.,  1238.,    64.]]), <class 'cupy.ndarray'>)

>>> h[0.2, :].values(), type(h[0, 0].values()) # indices in range of bin edges
(array([[30543.,  1341.,    78.]]), <class 'cupy.ndarray'>)

>>> h[:, 1:2].values(), type(h[0, 0].values()) # no interpolation
C:\Users\Saransh\Saransh_softwares\OpenSource\Python\cuda-histogram\src\cuda_histogram\axis\__init__.py:580: RuntimeWarning: Reducing along axis Variable([0. 2. 3. 6.]): requested start 1 between bin boundaries, no interpolation is performed
  warnings.warn(
(array([[28532.],
       [29603.],
       [30543.],
       [31478.],
       [32692.],
       [32874.],
       [33584.],
       [34304.],
       [34887.],
       [35341.]]), <class 'cupy.ndarray'>)

Converting to CPU

All the existing functionalities of boost-histogram and Hist can be used on the converted histogram.

h.to_boost()

>>> h.to_boost().values(), type(h.to_boost().values())
(array([[28532.,  1238.,    64.],
       [29603.,  1399.,    61.],
       [30543.,  1341.,    78.],
       [31478.,  1420.,    98.],
       [32692.,  1477.,    92.],
       [32874.,  1441.,    96.],
       [33584.,  1515.,    88.],
       [34304.,  1490.,   114.],
       [34887.,  1598.,   116.],
       [35341.,  1472.,   103.]]), <class 'numpy.ndarray'>)

h.to_hist()

>>> h.to_hist().values(), type(h.to_hist().values())
(array([[28532.,  1238.,    64.],
       [29603.,  1399.,    61.],
       [30543.,  1341.,    78.],
       [31478.,  1420.,    98.],
       [32692.,  1477.,    92.],
       [32874.,  1441.,    96.],
       [33584.,  1515.,    88.],
       [34304.,  1490.,   114.],
       [34887.,  1598.,   116.],
       [35341.,  1472.,   103.]]), <class 'numpy.ndarray'>)

Getting help

  • cuda-histogram's code is hosted on GitHub.
  • If something is not working the way it should, or if you want to request a new feature, create a new issue on GitHub.
  • To discuss something related to cuda-histogram, use the discussions tab on GitHub.

Contributing

Contributions of any kind welcome! See CONTRIBUTING.md for information on setting up a development environment.

Acknowledgements

This library was primarily developed by Lindsey Gray, Saransh Chopra, and Jim Pivarski.

Support for this work was provided by the National Science Foundation cooperative agreement OAC-1836650 and PHY-2323298 (IRIS-HEP). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.