Scrublet failing on certain dataset #134

Open
anilthanki opened this issue May 31, 2024 · 0 comments

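Scanpy Scrublet (v1.9.3+galaxy0) fails on certain datasets with the error shown below. It runs successfully when the number of principal components (--n-pcs) is reduced. In the run below, the 30 requested components exceed min(n_samples, n_features)=21 for the matrix passed to PCA.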

Running Scrublet
filtered out 6821 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:01)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 4409 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 6393 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 6520 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 7453 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 8535 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
filtered out 18975 genes that are detected in less than 3 cells
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:00)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
normalizing counts per cell
    finished (0:00:00)
normalizing counts per cell
    finished (0:00:00)
Embedding transcriptomes using PCA...
Traceback (most recent call last):
  File "/usr/local/bin/scanpy-cli", line 10, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy_scripts/cmd_utils.py", line 49, in cmd
    func(adata, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy_scripts/lib/_scrublet.py", line 26, in scrublet
    sce.pp.scrublet(adata, adata_sim=adata_sim, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 239, in scrublet
    scrubbed = [
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 240, in <listcomp>
    _run_scrublet(
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 210, in _run_scrublet
    ad_obs = _scrublet_call_doublets(
  File "/usr/local/lib/python3.9/site-packages/scanpy/external/pp/_scrublet.py", line 439, in _scrublet_call_doublets
    sl.pipeline_pca(
  File "/usr/local/lib/python3.9/site-packages/scrublet/helper_functions.py", line 91, in pipeline_pca
    pca = PCA(n_components=n_prin_comps, random_state=random_state, svd_solver=svd_solver).fit(X_obs)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 435, in fit
    self._fit(X)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 514, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
  File "/usr/local/lib/python3.9/site-packages/sklearn/decomposition/_pca.py", line 587, in _fit_truncated
    raise ValueError(
ValueError: n_components=30 must be between 1 and min(n_samples, n_features)=21 with svd_solver='arpack'
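
The ValueError above comes from scikit-learn's bounds check on n_components. A minimal sketch of one possible workaround is to clamp the requested component count to what the matrix allows before PCA is called. The helper name clamp_n_pcs is hypothetical and is not part of scanpy, scanpy-scripts, or scrublet. The traceback reports only the minimum of the two dimensions (21), so the other dimension used below is a placeholder.

```python
def clamp_n_pcs(n_requested, n_samples, n_features, svd_solver="arpack"):
    """Return the largest usable component count not exceeding n_requested.

    Hypothetical helper for illustration only.
    """
    upper = min(n_samples, n_features)
    if svd_solver == "arpack":
        # scikit-learn's arpack solver requires n_components to be
        # strictly less than min(n_samples, n_features).
        upper -= 1
    if upper < 1:
        raise ValueError(
            "matrix too small for PCA: shape=(%d, %d)" % (n_samples, n_features)
        )
    return min(n_requested, upper)


# Values from the traceback: 30 components requested,
# min(n_samples, n_features) = 21. The 2000 is a placeholder,
# since the issue does not state the other dimension.
n_pcs = clamp_n_pcs(30, n_samples=21, n_features=2000)
print(n_pcs)  # 20
```

Silently lowering the component count changes the analysis, so a wrapper that did this would probably want to log a warning. Whether clamping or skipping very small inputs is the better behaviour is a design decision for the maintainers.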