Calculations of mito directly generate columns with dtypes that break qc calculations on subsequent filtering #116
Comments
The object that fails has the following dtypes:
The AnnData object where the gene metadata gets loaded (with mito) a priori (and which doesn't fail) looks like this:
So it seems that the following QC trigger accepts bool but not category (the code actually sets that column to category at https://github.com/ebi-gene-expression-group/scanpy-scripts/blob/develop/scanpy_scripts/lib/_filter.py#L40).
And this line then reproduces the error:
Now, the question is why we would explicitly set that var column to categorical. At least I can say that switching to bool there doesn't seem to break the SCXA main workflow downstream.
The change was introduced at https://github.com/ebi-gene-expression-group/scanpy-scripts/pull/70/files#diff-d4f03c482ed8ddbd6f6e9754d2e42001963362aa3958ee56918f9210747ef2f4R39 to allow negative filtering searches, as attempted in #69.
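A minimal sketch of what moving that column to bool could look like, using pandas only. The helper name `ensure_bool_column` and the example gene table are hypothetical and are not taken from scanpy-scripts; the sketch assumes the categories are either real booleans or the strings "True"/"False".

```python
import pandas as pd


def ensure_bool_column(var: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of `var` with `column` coerced to plain bool dtype.

    Hypothetical helper: assumes the column holds True/False values,
    either as booleans or as the strings "True"/"False".
    """
    out = var.copy()
    values = out[column]
    if isinstance(values.dtype, pd.CategoricalDtype):
        # Go through strings so both boolean and string categories are handled.
        values = values.astype(str).str.lower().map({"true": True, "false": False})
        if values.isna().any():
            # Anything other than true/false cannot be mapped safely.
            raise ValueError(f"column {column!r} has non-boolean categories")
    out[column] = values.astype(bool)
    return out


# Toy gene table standing in for adata.var (gene names are illustrative).
var = pd.DataFrame(
    {"mito": pd.Categorical([True, False, True])},
    index=["MT-CO1", "ACTB", "MT-ND1"],
)
fixed = ensure_bool_column(var, "mito")
print(var["mito"].dtype, "->", fixed["mito"].dtype)
```

Whether a coercion like this is safe for the negative filtering use case from #69 would still need checking against that code path.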
Running a first filter step (genes or cells) when no mito column is given as part of the cell metadata generates a mito column that pandas probably treats as logical (bool), whereas the column is possibly categorical when it is read from the metadata file. This leads to the following error:
Most likely, categorical columns (judged by their pandas dtype) get excluded from that qc_vars list while boolean/logical ones do not (or possibly the other way around).
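To make the dtype difference concrete, here is a small pandas-only sketch. It assumes mitochondrial genes are flagged by an "MT-" name prefix, which is an illustrative convention and not necessarily how scanpy-scripts computes the column; the gene names are made up as well.

```python
import pandas as pd

# Illustrative gene identifiers; the "MT-" prefix rule is an assumption.
gene_ids = pd.Index(["MT-CO1", "ACTB", "MT-ND1", "GAPDH"])

# A flag computed on the fly comes out as a plain boolean column.
computed = pd.Series(gene_ids.str.startswith("MT-"), index=gene_ids, name="mito")

# The same values after an explicit conversion to categorical.
as_category = computed.astype("category")

print(computed.dtype)     # bool
print(as_category.dtype)  # category

# Both columns carry the same information; only the dtype differs.
print((computed == as_category.astype(bool)).all())
```

Printing `adata.var.dtypes` for the failing and the non-failing object gives the same kind of comparison on real data.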