Feature/scanpy_mtx_compression #259

Open: wants to merge 20 commits into base: develop
29 changes: 29 additions & 0 deletions tools/tertiary-analysis/scanpy/scanpy-filter-cells.xml
@@ -90,6 +90,35 @@ PYTHONIOENCODING=utf-8 scanpy-filter-cells
</repeat>
<output name="output_h5" file="filter_cells.h5" ftype="h5" compare="sim_size"/>
</test>
<test>
<param name="input_obj_file" value="read_10x.h5"/>
<param name="input_format" value="anndata"/>
<param name="output_format" value="anndata"/>
<param name="export_mtx" value="true"/>
<param name="mtx_compression" value="gzip"/>
<repeat name="parameters">
<param name="name" value="n_genes"/>
<param name="min" value="200"/>
<param name="max" value="20000"/>
</repeat>
<repeat name="parameters">
<param name="name" value="n_counts"/>
<param name="min" value="0"/>
<param name="max" value="1e9"/>
</repeat>
<output name="output_h5" file="filter_cells.h5" ftype="h5" compare="sim_size"/>
<output_collection name="mtx_gzip" type="list" count="3">
<element name="matrix_10x_gzip" ftype="gz">
<assert_contents><has_size value="3300000" delta="300000"/></assert_contents>
</element>
<element name="genes_10x_gzip" ftype="gz">
<assert_contents><has_size value="126000" delta="13000"/></assert_contents>
</element>
<element name="barcodes_10x_gzip" ftype="gz">
<assert_contents><has_size value="207" delta="21"/></assert_contents>
</element>
</output_collection>
</test>
</tests>

<help><![CDATA[
1 change: 1 addition & 0 deletions tools/tertiary-analysis/scanpy/scanpy-normalise-data.xml
@@ -48,6 +48,7 @@ PYTHONIOENCODING=utf-8 scanpy-normalise-data
<param name="max_fraction" argument="--max-fraction" type="float" value="0.05" min="0" max="1"
label="Consider cells as highly expressed that have more counts than max_fraction of the original total counts in at least one cell." />
</when>
<when value="false" />
</conditional>
<param name="layers" argument="--layers" type="text" optional="true"
label="Comma-separated list of layers to normalize. Set to 'all' to normalize all layers."/>
72 changes: 59 additions & 13 deletions tools/tertiary-analysis/scanpy/scanpy_macros2.xml
@@ -1,10 +1,12 @@
<macros>
<token name="@TOOL_VERSION@">1.8.1+3</token>
<token name="@TOOL_VERSION@">1.8.1+4</token>
Member

We need to move away from this pattern, as it confuses tool version sorting. Please remove the + here and simply bump the Galaxy build number for the tools.

Member

See #266
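
One way to read this suggestion, as a sketch only: keep the upstream scanpy version in `@TOOL_VERSION@` and carry the wrapper revision in the `+galaxyN` suffix of each tool's `version` attribute. The token name `@GALAXY_BUILD@` and the tool `id`/`name` below are hypothetical and are not taken from this PR.

```xml
<!-- scanpy_macros2.xml (sketch): upstream version only, without the "+4" -->
<macros>
    <token name="@TOOL_VERSION@">1.8.1</token>
    <!-- hypothetical token: wrapper build number, bumped on every wrapper change -->
    <token name="@GALAXY_BUILD@">4</token>
</macros>

<!-- in each tool file (id and name are placeholders) -->
<tool id="example_scanpy_tool" name="Example" version="@TOOL_VERSION@+galaxy@GALAXY_BUILD@" profile="@PROFILE@">
    <macros>
        <import>scanpy_macros2.xml</import>
    </macros>
    <!-- rest of the tool definition unchanged -->
</tool>
```

Under this layout the resolved version would be 1.8.1+galaxy4, and a later wrapper-only change would only touch the build number.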

<token name="@HELP@">More information can be found at https://scanpy.readthedocs.io</token>
<token name="@PROFILE@">18.01</token>
<token name="@VERSION_HISTORY@"><![CDATA[
**Version history**

1.8.1+4+galaxy0: Upate to scanpy-scripts 1.1.5 (running scanpy ==1.8.1), including an option to compress MTX outputs.

1.8.1+3+galaxy0: Upate to scanpy-scripts 1.1.3 (running scanpy ==1.8.1), including a fix to MTX output and a bugfix for the Scrublet wrapper.

1.8.1+2+galaxy0: Upate to scanpy-scripts 1.1.2 (running scanpy ==1.8.1), including improved boolean handling for mito etc.
Comment on lines +8 to 12
Member

Suggested change
-1.8.1+4+galaxy0: Upate to scanpy-scripts 1.1.5 (running scanpy ==1.8.1), including an option to compress MTX outputs.
-1.8.1+3+galaxy0: Upate to scanpy-scripts 1.1.3 (running scanpy ==1.8.1), including a fix to MTX output and a bugfix for the Scrublet wrapper.
-1.8.1+2+galaxy0: Upate to scanpy-scripts 1.1.2 (running scanpy ==1.8.1), including improved boolean handling for mito etc.
+1.8.5+galaxy0: Update to scanpy-scripts 1.1.5 (running scanpy ==1.8.1), including an option to compress MTX outputs.
+1.8.1+3+galaxy0: Update to scanpy-scripts 1.1.3 (running scanpy ==1.8.1), including a fix to MTX output and a bugfix for the Scrublet wrapper.
+1.8.1+2+galaxy0: Update to scanpy-scripts 1.1.2 (running scanpy ==1.8.1), including improved boolean handling for mito etc.

or perhaps just change the minor version.

@@ -63,11 +65,16 @@ EMBL-EBI https://www.ebi.ac.uk/ and Teichmann Lab at Wellcome Sanger Institute.
${fig_frame}
./output.png
</token>
<token name="@EXPORT_MTX_OPTS@">${export_mtx}</token>
<token name="@EXPORT_MTX_OPTS@">
#if $export_mtx_inputs.export_mtx:
--export-mtx '${export_mtx_inputs.export_mtx}'
--mtx-compression '${export_mtx_inputs.mtx_compression}'
#end if
</token>

<xml name="requirements">
<requirements>
<requirement type="package" version="1.1.3">scanpy-scripts</requirement>
<requirement type="package" version="1.1.5">scanpy-scripts</requirement>
<yield/>
</requirements>
</xml>
@@ -155,18 +162,57 @@ EMBL-EBI https://www.ebi.ac.uk/ and Teichmann Lab at Wellcome Sanger Institute.
</xml>

<xml name="export_mtx_params">
<param name="export_mtx" argument="--export-mtx" type="boolean" truevalue="--export-mtx ./" falsevalue="" checked="false" label="Save to 10x mtx format" help="If enabled, it will generate in addition to the main output in Loom or AnnData an export in 10x format."/>
<conditional name="export_mtx_inputs">
<param name="export_mtx" argument="--export-mtx" type="boolean" truevalue="./" falsevalue="" checked="false" label="Save to 10x mtx format" help="If enabled, it will generate in addition to the main output in Loom or AnnData an export in 10x format."/>
<when value="./">
<param name="mtx_compression" argument="--mtx-compression" type="select" label="" help="Compression type for MTX output.">
<option value="" selected="true">No compression</option>
<option value="zip">zip</option>
<option value="gzip">gzip</option>
<option value="bz2">bz2</option>
<option value="zstd">zstd</option>
Member

If zst is not supported in Galaxy, then we should remove it for now.
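
A minimal sketch of what removing it could look like, assuming the other four choices stay exactly as they are in this diff; only the zstd option is dropped and the remaining attributes are copied from the PR.

```xml
<param name="mtx_compression" argument="--mtx-compression" type="select" label="" help="Compression type for MTX output.">
    <option value="" selected="true">No compression</option>
    <option value="zip">zip</option>
    <option value="gzip">gzip</option>
    <option value="bz2">bz2</option>
    <!-- zstd option dropped until Galaxy has a matching datatype -->
</param>
```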

</param>
</when>
<when value=""/>
</conditional>
</xml>

<xml name="export_mtx_outputs">
<data name="matrix_10x" format="txt" from_work_dir="matrix.mtx" label="${tool.name} on ${on_string}: 10x matrix">
<filter>export_mtx</filter>
</data>
<data name="genes_10x" format="tsv" from_work_dir="genes.tsv" label="${tool.name} on ${on_string}: 10x genes">
<filter>export_mtx</filter>
</data>
<data name="barcodes_10x" format="tsv" from_work_dir="barcodes.tsv" label="${tool.name} on ${on_string}: 10x barcodes">
<filter>export_mtx</filter>
</data>
<collection name="mtx_raw" type="list">
<filter>export_mtx_inputs[export_mtx] == True and
export_mtx_inputs[mtx_compression] == ""</filter>
<data name="matrix_10x" label="${tool.name} on ${on_string}: 10x matrix" format="txt" from_work_dir="matrix.mtx"/>
<data name="genes_10x" label="${tool.name} on ${on_string}: 10x genes" format="tsv" from_work_dir="genes.tsv"/>
<data name="barcodes_10x" label="${tool.name} on ${on_string}: 10x barcodes" format="tsv" from_work_dir="barcodes.tsv"/>
</collection>
<collection name="mtx_zip" type="list" format="zip">
<filter>export_mtx_inputs[export_mtx] == True and
export_mtx_inputs[mtx_compression] == "zip"</filter>
<data name="matrix_10x_zip" label="${tool.name} on ${on_string}: 10x matrix (zip)" from_work_dir="matrix.mtx.zip"/>
<data name="genes_10x_zip" label="${tool.name} on ${on_string}: 10x genes (zip)" from_work_dir="genes.tsv.zip"/>
<data name="barcodes_10x_zip" label="${tool.name} on ${on_string}: 10x barcodes (zip)" from_work_dir="barcodes.tsv.zip"/>
</collection>
<collection name="mtx_gzip" type="list" format="gz">
<filter>export_mtx_inputs[export_mtx] == True and
export_mtx_inputs[mtx_compression] == "gzip"</filter>
<data name="matrix_10x_gzip" label="${tool.name} on ${on_string}: 10x matrix (gzip)" from_work_dir="matrix.mtx.gz"/>
<data name="genes_10x_gzip" label="${tool.name} on ${on_string}: 10x genes (gzip)" from_work_dir="genes.tsv.gz"/>
<data name="barcodes_10x_gzip" label="${tool.name} on ${on_string}: 10x barcodes (gzip)" from_work_dir="barcodes.tsv.gz"/>
</collection>
<collection name="mtx_bz2" type="list" format="bz2">
<filter>export_mtx_inputs[export_mtx] == True and
export_mtx_inputs[mtx_compression] == "bz2"</filter>
<data name="matrix_10x_bz2" label="${tool.name} on ${on_string}: 10x matrix (bz2)" from_work_dir="matrix.mtx.bz2"/>
<data name="genes_10x_bz2" label="${tool.name} on ${on_string}: 10x genes (bz2)" from_work_dir="genes.tsv.bz2"/>
<data name="barcodes_10x_bz2" label="${tool.name} on ${on_string}: 10x barcodes (bz2)" from_work_dir="barcodes.tsv.bz2"/>
</collection>
<collection name="mtx_zstd" type="list" format="zst">
Member

I cannot find zst support in Galaxy, so I would suggest that we comment out this part. If we really need it, we could add it as a format to the Galaxy codebase (it is called a datatype in Galaxy-speak).
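
As a sketch of the "comment out" option, the zstd collection could be wrapped in an XML comment until a datatype exists. The contents are copied from this diff; the note at the top of the comment is illustrative wording, not part of the PR.

```xml
<!-- Disabled: no zst datatype available in Galaxy yet.
<collection name="mtx_zstd" type="list" format="zst">
    <filter>export_mtx_inputs[export_mtx] == True and
        export_mtx_inputs[mtx_compression] == "zstd"</filter>
    <data name="matrix_10x_zstd" label="${tool.name} on ${on_string}: 10x matrix (zstd)" from_work_dir="matrix.mtx.zst"/>
    <data name="genes_10x_zstd" label="${tool.name} on ${on_string}: 10x genes (zstd)" from_work_dir="genes.tsv.zst"/>
    <data name="barcodes_10x_zstd" label="${tool.name} on ${on_string}: 10x barcodes (zstd)" from_work_dir="barcodes.tsv.zst"/>
</collection>
-->
```

If this route is taken, the matching zstd choice in the `mtx_compression` select would presumably need to go at the same time, so that users cannot pick a compression for which no output is collected.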

<filter>export_mtx_inputs[export_mtx] == True and
export_mtx_inputs[mtx_compression] == "zstd"</filter>
<data name="matrix_10x_zstd" label="${tool.name} on ${on_string}: 10x matrix (zstd)" from_work_dir="matrix.mtx.zst"/>
<data name="genes_10x_zstd" label="${tool.name} on ${on_string}: 10x genes (zstd)" from_work_dir="genes.tsv.zst"/>
<data name="barcodes_10x_zstd" label="${tool.name} on ${on_string}: 10x barcodes (zstd)" from_work_dir="barcodes.tsv.zst"/>
</collection>
</xml>

</macros>