Move to scanpy 1.9.3 and other improvements (#117)
* Use mito as bool not changing to category

* Trial scanpy 1.9.1 for the mnn/numba issue.

* Fix 1.9.1 hvg not finding base in log1p

* Please black

* Try log fix regardless of option

* Check direct installation

* Set base explicitly to avoid it being dropped

* Please black

* Add igraph and reinstate leidenalg

* Also louvain is needed

* Pin back h5py

* Passing all tests locally

* Keep todo

* Try github actions with mamba

* Black formatting

* Python versions, better mamba

* Avoid treating versions as numbers

* Actions changes

* Black with no options

* Black fixes

* Check co structure

* Why do we get extra files?

* Black manually

* Make sure env is activated

* pytest fix

* Try with original extra dir for pytest

* Type

* missing extra dir

* Use importlib.metadata instead

* pip install before tests

* impose pin on scipy for mnnpy

* Avoid python 3.10

* Allow single group in one to one marker comp

* Rerun automatic tests

* Try pinning bbknn below 1.6.0

* Pin sklearn for bbknn

* Fix package name

* Pin numba for mnnpy

* Downgrade numba even more for mnnpy

* Pin numpy for mnnpy

* Further pin numpy

* Further pin numpy

* More stringent pinning based on 2022 scanpy-scripts latest container

* More pinning for mnnpy

* More mnnpy pinning

* Comment out `mnn_batch_correction` test as it fails with scanpy 1.9.1

* Add warning message for `mnn_correct`

* Reformat _mnn.py

* Reformat _mnn.py

---------

Co-authored-by: Iris Diana Yu <[email protected]>
Co-authored-by: Anil Thanki <[email protected]>
3 people authored Feb 9, 2024
1 parent 874d529 commit 2b926f1
Showing 13 changed files with 151 additions and 68 deletions.
60 changes: 27 additions & 33 deletions .github/workflows/python-package.yml
@@ -2,54 +2,48 @@ name: Python package

on: [pull_request]

defaults:
run:
# for conda env activation
shell: bash -l {0}

jobs:
build:

runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.7, 3.8]
python-version: ["3.8", "3.9"]

steps:

- uses: actions/checkout@v2
with:
path: scanpy-scripts

- uses: psf/black@stable
with:
options: '--check --verbose --include="\.pyi?$" .'

- uses: actions/checkout@v2
with:
repository: theislab/scanpy
path: scanpy
ref: 1.8.1

- name: Setup BATS
uses: mig4/setup-bats@v1
with:
bats-version: 1.2.1

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
- name: Setup mamba
uses: mamba-org/provision-with-micromamba@main
with:
python-version: ${{ matrix.python-version }}

- name: Install dependencies
environment-file: test-env.yaml
cache-downloads: true
channels: conda-forge, bioconda, defaults
extra-specs: |
python=${{ matrix.python-version }}
- name: Run black manually
run: |
pushd scanpy
patch -p1 < ../scanpy-scripts/scrublet.patch
popd
black --check --verbose ./
sudo apt-get install libhdf5-dev
pip install -U setuptools>=40.1 wheel 'cmake<3.20' pytest
pip install $(pwd)/scanpy-scripts
python -m pip install $(pwd)/scanpy --no-deps --ignore-installed -vv
# - name: Install dependencies
# run: |
# sudo apt-get install libhdf5-dev
# pip install -U setuptools>=40.1 wheel 'cmake<3.20' pytest
# pip install $(pwd)/scanpy-scripts
# # python -m pip install $(pwd)/scanpy --no-deps --ignore-installed -vv

- name: Run unit tests
run: pytest --doctest-modules -v ./scanpy-scripts
run: |
# needed for __version__ to be available
pip install . --no-deps --ignore-installed
pytest --doctest-modules -v ./
- name: Test with bats
run: |
./scanpy-scripts/scanpy-scripts-tests.bats
./scanpy-scripts-tests.bats
3 changes: 3 additions & 0 deletions .gitignore
@@ -7,3 +7,6 @@
*.pyc
/.*history
/.*swp
data
compressed
uncompressed
16 changes: 13 additions & 3 deletions README.md
@@ -4,12 +4,22 @@ A command-line interface for functions of the Scanpy suite, to facilitate flexib

## Install

The recommended way of installing scanpy-scripts is via conda:

```bash
conda install scanpy-scripts
# or
pip3 install scanpy-scripts
```

Installation via pip is also possible; however, the pip version of mnnpy is not patched as it is in the conda version, so the `integrate` command will not work.

```bash
pip install scanpy-scripts
```

For a development installation, we suggest following the steps in the GitHub Actions workflow file `.github/workflows/python-package.yml`.

Currently, tests run on Python 3.8 and 3.9, so these are the recommended versions if not installing via conda. BBKNN does not currently install on Python 3.10 due to a skip in Bioconda.

## Test installation

There is an example script included:
@@ -22,7 +32,7 @@ This requires the [bats](https://github.com/sstephenson/bats) testing framework

## Commands

Available commands are described below. Each has usage instructions available via --help, consult function documentation in scanpy for further details.
Available commands are described below. Each has usage instructions available via `--help`; consult the function documentation in scanpy for further details.

```
Usage: scanpy-cli [OPTIONS] COMMAND [ARGS]...
23 changes: 12 additions & 11 deletions scanpy-scripts-tests.bats
@@ -653,17 +653,18 @@ setup() {
}

# Do MNN batch correction, using clustering as batch (just for test purposes)

@test "Run MNN batch integration using clustering as batch" {
if [ "$resume" = 'true' ] && [ -f "$mnn_obj" ]; then
skip "$mnn_obj exists and resume is set to 'true'"
fi

run rm -f $mnn_obj && eval "$scanpy integrate mnn $mnn_opt $louvain_obj $mnn_obj"

[ "$status" -eq 0 ]
[ -f "$mnn_obj" ]
}
# Commented as it fails with scanpy 1.9.1
#
# @test "Run MNN batch integration using clustering as batch" {
# if [ "$resume" = 'true' ] && [ -f "$mnn_obj" ]; then
# skip "$mnn_obj exists and resume is set to 'true'"
# fi
#
# run rm -f $mnn_obj && eval "$scanpy integrate mnn $mnn_opt $louvain_obj $mnn_obj"
#
# [ "$status" -eq 0 ]
# [ -f "$mnn_obj" ]
#}

# Do ComBat batch correction, using clustering as batch (just for test purposes)

4 changes: 2 additions & 2 deletions scanpy_scripts/__init__.py
@@ -1,9 +1,9 @@
"""
Provides version, author and exports
"""
import pkg_resources
import importlib.metadata

__version__ = pkg_resources.get_distribution("scanpy-scripts").version
__version__ = importlib.metadata.version("scanpy-scripts")

__author__ = ", ".join(
[
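The `__init__.py` hunk swaps `pkg_resources` for `importlib.metadata`. `importlib.metadata.version()` raises `PackageNotFoundError` when the distribution is not installed, which is consistent with the workflow now running `pip install . --no-deps --ignore-installed` before `pytest` ("needed for __version__ to be available"). A minimal sketch of a more tolerant lookup follows; `get_version` and its `"unknown"` fallback are hypothetical and not part of scanpy-scripts, which calls `importlib.metadata.version("scanpy-scripts")` directly and therefore fails at import time in an uninstalled checkout.

```python
import importlib.metadata


def get_version(dist_name: str, fallback: str = "unknown") -> str:
    """Return the installed version of dist_name, or fallback if it is not installed.

    Hypothetical helper for illustration; scanpy-scripts itself does not fall back.
    """
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # No installed metadata, e.g. importing from a source checkout without `pip install .`
        return fallback


if __name__ == "__main__":
    print(get_version("scanpy-scripts"))
```

A silent fallback hides a broken install, so failing loudly, as the commit does, is a defensible choice when tests depend on installed metadata.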
4 changes: 3 additions & 1 deletion scanpy_scripts/cmd_utils.py
@@ -6,10 +6,11 @@
import pandas as pd
import scanpy as sc
import scanpy.external as sce

from .cmd_options import CMD_OPTIONS
from .lib._paga import plot_paga
from .obj_utils import _save_matrix
from .lib._scrublet import plot_scrublet
from .obj_utils import _save_matrix


def make_subcmd(cmd_name, func, cmd_desc, arg_desc, opt_set=None):
@@ -313,6 +314,7 @@ def plot_function(
showfig = True
if output_fig:
import os

import matplotlib.pyplot as plt

sc.settings.figdir = os.path.dirname(output_fig) or "."
37 changes: 34 additions & 3 deletions scanpy_scripts/lib/_diffexp.py
@@ -2,9 +2,11 @@
scanpy diffexp
"""

import logging
import math

import pandas as pd
import scanpy as sc
import logging


def diffexp(
@@ -22,6 +24,15 @@ def diffexp(
):
"""
Wrapper function for sc.tl.rank_genes_groups.
Test that we can load a single group.
>>> import os
>>> from pathlib import Path
>>> adata = sc.datasets.krumsiek11()
>>> tbl = diffexp(adata, groupby='cell_type', groups='Mo', reference='progenitor')
>>> # get the size of the data frame
>>> tbl.shape
(11, 8)
"""
if adata.raw is None:
use_raw = False
@@ -51,6 +62,11 @@ def diffexp(
"Singlet groups removed before passing to rank_genes_groups()"
)

# avoid an issue when click simplifies a single group to a plain string
# https://github.com/ebi-gene-expression-group/scanpy-scripts/issues/123
if groups != "all" and isinstance(groups, str):
groups = [groups]

sc.tl.rank_genes_groups(
adata,
use_raw=use_raw,
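The new guard in `diffexp()` wraps a lone group name in a list before it reaches `sc.tl.rank_genes_groups`, whose `groups` argument is documented as `'all'` or an iterable of group names; a bare string is itself an iterable of characters. A standalone sketch of the same normalization follows. `normalize_groups` is a hypothetical helper name, and unlike the commit it also converts tuples to lists.

```python
from typing import List, Sequence, Union


def normalize_groups(groups: Union[str, Sequence[str]]) -> Union[str, List[str]]:
    """Return 'all' unchanged, wrap a single group name in a list, and pass other sequences through as a list."""
    if groups == "all":
        return groups
    if isinstance(groups, str):
        # A bare string such as "Mo" would otherwise be iterated character by character.
        return [groups]
    return list(groups)


print(normalize_groups("Mo"), list("Mo"))
```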
@@ -64,17 +80,32 @@ def diffexp(
de_tbl = extract_de_table(adata.uns[diff_key])

if isinstance(filter_params, dict):
key_filtered = diff_key + "_filtered"
sc.tl.filter_rank_genes_groups(
adata,
key=diff_key,
key_added=diff_key + "_filtered",
key_added=key_filtered,
use_raw=use_raw,
**filter_params,
)

de_tbl = extract_de_table(adata.uns[diff_key + "_filtered"])
# there are non-string values in the recarray object at this point, in
# adata.uns['rank_genes_groups_filtered']['names']
# for instance:
# adata.uns['rank_genes_groups_filtered']['names'][0]
# (nan, nan, 'NKG7', nan, nan, 'PPBP')
# this now upsets h5py > 3.0
de_tbl = extract_de_table(adata.uns[key_filtered])
de_tbl = de_tbl.loc[de_tbl.genes.astype(str) != "nan", :]

# replace nan values with strings in adata.uns['rank_genes_groups_filtered']['names']
# TODO: on scanpy updates, check whether scanpy now does this itself so that we can remove this
for row in range(0, len(adata.uns[key_filtered]["names"])):
for col in range(0, len(adata.uns[key_filtered]["names"][row])):
element = adata.uns[key_filtered]["names"][row][col]
if isinstance(element, float) and math.isnan(element):
adata.uns[key_filtered]["names"][row][col] = "nan"

if save:
de_tbl.to_csv(save, sep="\t", header=True, index=False)

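The filtered branch rewrites float NaN entries in the `names` record array as the string `"nan"`, because (per the comment) h5py > 3.0 rejects the mixed values. A minimal numpy-only sketch of that cleanup, assuming an object-dtype structured array with one field per group, shaped like `adata.uns['rank_genes_groups_filtered']['names']`; `replace_nan_names` and the toy gene values are illustrative only. It walks fields column by column, whereas the commit loops over rows; both write through to the original array because numpy field access and structured-array scalars act as views.

```python
import math

import numpy as np


def replace_nan_names(names: np.ndarray) -> np.ndarray:
    """Replace float NaN entries of a structured array of gene names with "nan", in place."""
    for field in names.dtype.names:
        column = names[field]  # field access returns a view, so writes reach `names`
        for i, element in enumerate(column):
            if isinstance(element, float) and math.isnan(element):
                column[i] = "nan"
    return names


# Toy stand-in: two groups ("0", "1"), two ranked genes each, with NaN gaps.
names = np.array(
    [("NKG7", np.nan), (np.nan, "PPBP")],
    dtype=[("0", object), ("1", object)],
)
replace_nan_names(names)
print(names["0"].tolist(), names["1"].tolist())
```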
2 changes: 1 addition & 1 deletion scanpy_scripts/lib/_filter.py
@@ -37,7 +37,7 @@ def filter_anndata(
k_mito = gene_names.str.startswith("MT-")
if k_mito.sum() > 0:
adata.var["mito"] = k_mito
adata.var["mito"] = adata.var["mito"].astype("category")
# adata.var["mito"] = adata.var["mito"].astype("category")
else:
logging.warning(
"No MT genes found, skip calculating "
1 change: 1 addition & 0 deletions scanpy_scripts/lib/_louvain.py
@@ -3,6 +3,7 @@
"""

import scanpy as sc

from ..obj_utils import write_obs


9 changes: 7 additions & 2 deletions scanpy_scripts/lib/_mnn.py
@@ -2,9 +2,10 @@
scanpy external mnn
"""

import scanpy.external as sce
import numpy as np
import click
import numpy as np
import scanpy.external as sce
import logging

# Wrapper for mnn allowing use of non-standard slot

@@ -16,6 +17,10 @@ def mnn_correct(adata, key=None, key_added=None, var_subset=None, layer=None, **

# mnn will use .X, so we need to put other layers there for processing

logging.warning(
"Use mnn_correct at your own risk, environment installation seems faulty for this module."
)

if layer:
adata.layers["X_backup"] = adata.X
adata.X = adata.layers[layer]
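`mnn_correct()` works around mnn reading only `.X` by copying the requested layer into `.X` and stashing the original under `layers["X_backup"]`. The restore step falls outside the displayed hunk, so the sketch below is an assumption about the general pattern rather than the repository's code: a context manager that swaps a layer into `.X` and always restores the original, even if the correction raises. It keeps the original in a local variable instead of a backup layer, and uses a `SimpleNamespace` as a stand-in for an AnnData object; `layer_as_X` is a hypothetical name.

```python
from contextlib import contextmanager
from types import SimpleNamespace

import numpy as np


@contextmanager
def layer_as_X(adata, layer=None):
    """Temporarily expose adata.layers[layer] as adata.X; restore the original X on exit."""
    if not layer:
        # Nothing to swap: the caller works on .X directly.
        yield adata
        return
    original_X = adata.X
    adata.X = adata.layers[layer]
    try:
        yield adata
    finally:
        adata.X = original_X


adata = SimpleNamespace(X=np.zeros(3), layers={"counts": np.ones(3)})
with layer_as_X(adata, "counts") as ad:
    print(ad.X)  # the "counts" layer
print(adata.X)  # original X restored
```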
10 changes: 9 additions & 1 deletion scanpy_scripts/lib/_norm.py
@@ -3,6 +3,7 @@
"""

import scanpy as sc
import math


def normalize(adata, log_transform=True, **kwargs):
@@ -12,6 +13,13 @@ def normalize(adata, log_transform=True, **kwargs):
"""
sc.pp.normalize_total(adata, **kwargs)
if log_transform:
sc.pp.log1p(adata)
# The natural logarithm is scanpy's default if base is not set
base = math.e
sc.pp.log1p(adata, base=base)
# scanpy does not set base in uns['log1p'], but later code asks for it
if "log1p" in adata.uns_keys() and "base" not in adata.uns["log1p"]:
# Setting base to None does not fix other modules that check for base later on,
# as adata.uns["log1p"]["base"] = None is dropped on anndata write or read.
adata.uns["log1p"]["base"] = base

return adata
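The `normalize()` change passes `base=math.e` explicitly and back-fills `uns['log1p']['base']` when it is missing, because a later step (per the commit message, highly variable gene selection under scanpy 1.9.1) looks the key up. The sketch below mimics that bookkeeping on a plain dict standing in for `adata.uns` and shows the change-of-base identity behind a log1p in an arbitrary base; it does not call scanpy, and `log1p_with_base` and `ensure_log1p_base` are hypothetical names.

```python
import math

import numpy as np


def log1p_with_base(X: np.ndarray, base: float = math.e) -> np.ndarray:
    """Compute log_base(1 + X) via the change-of-base identity ln(1 + x) / ln(base)."""
    return np.log1p(X) / math.log(base)


def ensure_log1p_base(uns: dict, base: float = math.e) -> dict:
    """Back-fill uns["log1p"]["base"] when a log1p record exists without one.

    Mirrors the guard added to normalize(); a concrete number is stored because,
    per the commit's comment, a None base is dropped on AnnData write or read.
    """
    if "log1p" in uns and "base" not in uns["log1p"]:
        uns["log1p"]["base"] = base
    return uns


# Stand-in for adata.uns after a log1p call that recorded no base.
uns = {"log1p": {}}
values = log1p_with_base(np.array([0.0, 1.0, 3.0]), base=2.0)
ensure_log1p_base(uns, base=2.0)
print(values, uns)
```

With the default `base=math.e` the division is by ln(e) = 1, so passing the natural base explicitly leaves the values unchanged and only makes a concrete base available to record.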
23 changes: 12 additions & 11 deletions setup.py
@@ -1,11 +1,11 @@
from setuptools import setup, find_packages
from setuptools import find_packages, setup

with open("README.md", "r") as fh:
long_description = fh.read()

setup(
name="scanpy-scripts",
version="1.1.6",
version="1.1.9",
author="nh3",
author_email="[email protected]",
description="Scripts for using scanpy from the command line",
@@ -35,23 +35,24 @@
]
),
install_requires=[
"packaging",
"anndata",
"scipy",
"matplotlib",
"pandas",
"h5py<3.0.0",
"scanpy==1.8.1",
# "packaging",
# "anndata",
# "scipy",
# "matplotlib",
# "pandas",
# "h5py<3.0.0",
"scanpy==1.9.3",
"louvain",
"igraph",
"leidenalg",
"loompy",
"Click<8",
"umap-learn",
# "umap-learn",
"harmonypy>=0.0.5",
"bbknn>=1.5.0",
"mnnpy>=0.1.9.5",
"scrublet",
"scikit-misc",
# "scikit-misc",
"fa2",
],
)