Configuring Parallelism Between Libraries #19

Open
thomasjpfan opened this issue Sep 5, 2024 · 0 comments

thomasjpfan commented Sep 5, 2024

Parallelism in Python has two semantics:

  1. "Spawners": Starts up multiple workers that let other code do work. For example:
    1. concurrent.futures.ProcessPoolExecutor(max_workers=4)
    2. concurrent.futures.ThreadPoolExecutor(max_workers=4)
    3. joblib.Parallel(n_jobs=4)
  2. "Computers": Actually use the CPU to do the work. For example:
    1. A @ B does matrix multiplication with BLAS
    2. scipy.fft.fft(..., workers=8)
    3. list(range(10)) (Pure Python that is single core)
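The two roles can be seen in a few lines of stdlib-only Python (a minimal sketch; the function names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def compute(n):
    # "Computer": pure-Python work that actually burns a core
    return sum(i * i for i in range(n))

# "Spawner": starts 4 workers and hands the work to the computer
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(compute, [10, 100, 1000]))
```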

Spawner Configuring the Computer

Scikit-learn uses both semantics and automatically configures parallelism to prevent oversubscription. With 24 CPU cores:

halving_search = HalvingRandomSearchCV(
	HistGradientBoostingRegressor(), ..., n_jobs=4,
)
halving_search.fit(...)

HalvingRandomSearchCV spawns 4 workers with multiprocessing and then uses threadpoolctl to configure each worker to use 6 CPU cores for OpenMP.

Failure mode

On a machine with 24 CPU cores, here is an example that oversubscribes and stalls:

def f(x):
    ...
    x @ A  # matmul, which uses all 24 cores by default
    ...

# Spawns 4 multiprocessing workers to run f:
# 4 workers x 24 BLAS threads = 96 threads on 24 cores
quad_vec(f, ..., workers=4)

The user is responsible for preventing oversubscription:

from threadpoolctl import threadpool_limits

def f(x):
    with threadpool_limits(limits=6, user_api='blas'):
        x @ A

Underlying Questions

As free-threaded Python becomes real, more users will run library code with multithreading and ultimately hit this problem. There are two questions:

  1. Should libraries with "spawners" be responsible for setting the number of cores for their workers?
  2. If so, how should this configuration be communicated between libraries?
    1. Currently, Python libraries generally have four ways to configure parallelism:
      1. Environment variable, export OMP_NUM_THREADS=8
      2. Set globally, torch.set_num_threads(8).
      3. Context manager, with threadpool_limits(limits=8)
      4. Function signature, fft(..., workers=8)
    2. A likely solution is a thread-local config using contextvars that is (somehow) shared between libraries.
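One possible shape for such a shared config, sketched with stdlib contextvars (all names here are hypothetical, not an existing API):

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared knob: the thread budget a "computer" may use.
MAX_THREADS = contextvars.ContextVar("max_threads", default=None)

def computer_fft(n):
    # A compute library consults the shared config instead of
    # requiring a workers= keyword at every call site.
    limit = MAX_THREADS.get()
    workers = limit if limit is not None else 24  # pretend 24 cores
    return f"fft({n}) with {workers} threads"

def spawner(tasks, n_jobs):
    # The spawner divides the machine among its workers and records
    # the per-worker budget in the context each task runs under.
    MAX_THREADS.set(24 // n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # A fresh context copy per task avoids re-entering one
        # Context object from several threads at once.
        futs = [
            pool.submit(contextvars.copy_context().run, computer_fft, t)
            for t in tasks
        ]
        return [f.result() for f in futs]

results = spawner([8, 16], n_jobs=4)
```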

Session Notes
