Skip to content

ebi-gene-expression-group/atlas-ca-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Atlas analysis for controlled-access datasets

This repo initially is for the analysis of human RNA sequencing data coming from European Genome-phenome Archive (EGA), but it will be extended to other sources.

For GTEX RNA-seq data, see https://github.com/ebi-gene-expression-group/atlas-gtex-bulk.

Prerequisites

  • Snakemake >= 7.25.3
  • SLURM cluster management and job scheduling system
  • Two scripts located at the config private_script:
    • ega_bulk_env.sh
    • ega_bulk_init.sh
  • The irap human config file
    • homo_sapiens.conf

1. Analysis of EGA datasets

1.1 Data preparation

For EGA, download the data and and arrange for analysis as indicated here.

The data and metadata should be in the format:

data
    |- EGAD00001011134
      |- EGAF00008123877
        |- Sample-509_1.fastq.gz
        |- Sample-509_1.fastq.gz.md5
      |- ...
metadata
    |- EGAD00001011134.merged.csv
    |- EGAD00001011134.enaIds.txt

The file .enaIds.txt is provided by curators and contains two columns with the matches between EGA run and ENA run ids.

Then run the Snakefile-ega workflow:

snakemake --restart-times 1 --keep-going \\
  --profile slurm-profile \\
  --latency-wait 150 -p --cores 1 \\
  --config dataset_id=EGADxxxxxxxxxx \\
      input_path=/path-to-data/data \\
      metadata_path=/path-to-metadata/metadata \\
  -s Snakefile-ega

2.1 Data analysis

The workflow Snakefile-irap will validate fastqs, run Irap and prepare the results for aggregation:

snakemake --restart-times 1 --keep-going \\
  --profile slurm-profile --latency-wait 150 -p --use-conda \\
  --conda-frontend conda --conda-base-path /conda-base-path \\
  --conda-prefix /conda-prefix-path/conda \\
  --cores 1 \\
  --config dataset_id=EGADxxxxxxxxxx \\
    metadata_path=/path-to-metadata/metadata \\
    read_type=pe \\
    atlas_ca_root=/path-to-github-repo/atlas-ca-analysis \\
    private_script=/path-private_script/gitlab_scripts \\
    irap_config=/path-to-config/homo_sapiens.conf \\
  -s Snakefile-irap

2.3 Library aggregation

Finally collate irap_single_lib results of individual libraries running

scripts/aggregate_slurm.sh

About

Atlas analysis for controlled-access bulk datasets, such as EGA

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published