cell_to_library lines don't seem to be written into the MANIFEST #13

Open
pcm32 opened this issue Jul 27, 2022 · 3 comments

Comments

@pcm32
Member

pcm32 commented Jul 27, 2022

It seems that the MANIFEST files are not getting the cell_to_library lines written, even though the studies do have cell_to_library.txt files within the bundle:

(miniconda3)[host results]$ ls -l */*/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   295035 Dec  3  2021 E-CURD-9/mus_musculus/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray  2020908 Dec  3  2021 E-ENAD-49/arabidopsis_thaliana/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   803332 Dec  3  2021 E-ENAD-51/zea_mays/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   225959 Dec  3  2021 E-ENAD-53/solanum_lycopersicum/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   124829 Dec  3  2021 E-GEOD-130148/homo_sapiens/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   135143 May 30 11:27 E-GEOD-137537/homo_sapiens/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray  1964387 Aug 19  2021 E-GEOD-141273/drosophila_melanogaster/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   424576 Oct 14  2021 E-GEOD-141730/arabidopsis_thaliana/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray  2112572 Dec  3  2021 E-GEOD-150728/homo_sapiens/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray  2088049 Aug 18  2021 E-HCAD-10/homo_sapiens/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray 22494236 Aug 12  2021 E-HCAD-1/homo_sapiens/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   766738 Dec  3  2021 E-HCAD-30/homo_sapiens/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray  2247484 Aug  4  2021 E-HCAD-32/homo_sapiens/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   327955 Dec  6  2021 E-HCAD-9/homo_sapiens/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   182649 Dec  3  2021 E-MTAB-6945/mus_musculus/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   392474 Dec  3  2021 E-MTAB-7142/mus_musculus/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   606703 Aug 19  2021 E-MTAB-8698/drosophila_melanogaster/bundle/filtered_normalised/cell_to_library.txt
-rw-r--r-- 1 fg_atlas_sc microarray   224737 Jul 22 15:35 E-MTAB-8848/mus_musculus/bundle/filtered_normalised/cell_to_library.txt
(miniconda3)[host results]$ grep cell_to */*/bundle/MANIFEST
(miniconda3)[host results]$ grep cell_to E-MTAB-7142/mus_musculus/bundle/MANIFEST
(miniconda3)[host results]$ grep cell_to E-MTAB-8848/mus_musculus/bundle/MANIFEST
(miniconda3)[host results]$

This is breaking the loading, as we get errors of this type:

Cell types file is present at */atlas-prod/sc_experiments_test/E-GEOD-141273/E-GEOD-141273.cells.txt, but no cell/ library maping is available at */atlas-prod/sc_experiments_test/E-GEOD-141273/cell_to_library.txt - this file is required to map cell metadata to libraries

I don't see any lines in this workflow that imply the entry is being written. Note that E-MTAB-8848 was generated quite recently and its MANIFEST doesn't include the line either.
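The manual `ls` and `grep` check above could be scripted roughly as follows. This is only a sketch: the function name `find_unlisted_cell_to_library` is hypothetical, and it assumes the `<experiment>/<species>/bundle` layout visible in the listing.

```shell
# Print every bundle that ships cell_to_library.txt but whose MANIFEST
# does not mention it. The function name is hypothetical; the directory
# layout is assumed from the listing in this issue.
find_unlisted_cell_to_library() {
    local results_dir="$1"
    local f bundle
    for f in "$results_dir"/*/*/bundle/filtered_normalised/cell_to_library.txt; do
        [ -e "$f" ] || continue   # the glob matched nothing
        bundle="${f%/filtered_normalised/cell_to_library.txt}"
        # -q: no output, -s: stay silent if MANIFEST is missing
        # (a bundle without a MANIFEST is therefore reported too)
        if ! grep -qs 'cell_to_library' "$bundle/MANIFEST"; then
            echo "$bundle"
        fi
    done
}
```

Calling `find_unlisted_cell_to_library .` from the results directory would then print one bundle path per affected study.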

@pcm32
Member Author

pcm32 commented Jul 28, 2022

There is more bundle-interacting code at https://github.com/ebi-gene-expression-group/scxa-control-workflow/blob/develop/main.nf#L1237 . While CELL_TO_LIBRARY is mentioned a lot there, I don't see where the cell_to_library lines would be written to the manifest there either.

@pcm32
Member Author

pcm32 commented Jul 28, 2022

I think the issue is that atlas-prod develop simply uses the file directly if it exists,

https://github.com/ebi-gene-expression-group/atlas-prod/blob/develop/exec/import_scxa_experiment.sh#L200

but the feature/anndata_import_tweaks branch asks for the cell_to_library entry in the MANIFEST (and fails):

https://github.com/ebi-gene-expression-group/atlas-prod/blob/feature/anndata_import_tweaks/exec/import_scxa_experiment.sh#L96

as that entry has never lived in the MANIFESTs. So the file doesn't get copied, and then the single-cell condensed SDRF step doesn't find it and fails at https://github.com/ebi-gene-expression-group/experiment_metadata/blob/1afc9fd63cb224f09ce74d48423b8fcdc0f1cb06/single_cell_condensed_sdrf.sh#L100 .
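One way to reconcile the two behaviours would be a lookup that prefers the MANIFEST entry and falls back to the file on disk. The sketch below is not the code from either branch: `resolve_cell_to_library` is a hypothetical name, and it assumes the MANIFEST is tab-separated with the entry name in column 1 and a bundle-relative path in column 2.

```shell
# Resolve the cell_to_library file for a bundle (hypothetical helper).
# ASSUMPTION: MANIFEST is tab-separated, column 1 = entry name,
# column 2 = path relative to the bundle directory.
resolve_cell_to_library() {
    local bundle="$1"
    local rel=""
    if [ -f "$bundle/MANIFEST" ]; then
        rel=$(awk -F'\t' '$1 == "cell_to_library" { print $2; exit }' "$bundle/MANIFEST")
    fi
    if [ -n "$rel" ] && [ -f "$bundle/$rel" ]; then
        # Preferred: the MANIFEST lists the file and it exists
        echo "$bundle/$rel"
    elif [ -f "$bundle/filtered_normalised/cell_to_library.txt" ]; then
        # Fallback: the file is in the bundle but the MANIFEST does not list it
        echo "$bundle/filtered_normalised/cell_to_library.txt"
    else
        return 1   # no cell/library mapping available
    fi
}
```

The fallback keeps older bundles loadable while still honouring a MANIFEST entry once one is written.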

@pcm32
Member Author

pcm32 commented Jul 29, 2022

I have partly alleviated this through https://github.com/ebi-gene-expression-group/atlas-prod/pull/246/commits/7edc3126c8b60a0c0bb0ef3cc0388e1497969a31 , but we should add a PR to this repo that makes sure the MANIFEST file also gets the cell_to_library line. Could you please take care of that, @irisdianauy? Thanks!
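For illustration, the requested change could look something like the sketch below. It is not taken from the workflow: the function name is hypothetical, and the line format (tab-separated entry name, bundle-relative path, empty third field) is an assumption that should be checked against how other MANIFEST entries are written.

```shell
# Append a cell_to_library entry to a bundle's MANIFEST if the file exists
# and no entry is present yet (hypothetical helper).
# ASSUMPTION: entries are "name<TAB>relative/path<TAB>parameterisation",
# with an empty third field for this file.
add_cell_to_library_to_manifest() {
    local bundle="$1"
    local rel="filtered_normalised/cell_to_library.txt"
    local manifest="$bundle/MANIFEST"
    local tab
    tab=$(printf '\t')
    [ -f "$bundle/$rel" ] || return 0   # nothing to register
    # Only append when no entry exists, so repeated runs do not duplicate it
    if ! grep -qs "^cell_to_library${tab}" "$manifest"; then
        printf 'cell_to_library\t%s\t\n' "$rel" >> "$manifest"
    fi
}
```

Making the step idempotent means it can be rerun over existing bundles to backfill the missing line.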
