Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complexity curve #96

Open
bazyliszek opened this issue May 21, 2019 · 5 comments
Open

Complexity curve #96

bazyliszek opened this issue May 21, 2019 · 5 comments
Labels
question Further information is requested

Comments

@bazyliszek
Copy link

bazyliszek commented May 21, 2019

  1. Looking into complexity curve. Could it be that for PE reads we actually need to add the -P parameter in preseq? This was not detected in the pipeline automatically.
samtools sort 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.bam \
         -m 8589934592 \
         -@ 1 \
         -o 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam
     preseq lc_extrap -v -B 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam -o 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.ccurve.txt 
  1. Also, for very small amounts of the reads this program is not working, but I guess that is my problem with reads of total number 119,573. Error: max count before zero is less than min required cound (4), sample not sufficiently deep or duplicates removed.

original command:

BAM_INPUT
TOTAL READS     = 119573
DISTINCT READS  = 119456
DISTINCT COUNTS = 3
MAX COUNT       = 3
COUNTS OF 1     = 119342
MAX TERMS       = 2
OBSERVED COUNTS (4)
1       119342
2       111
3       3

ERROR:  max count before zero is les than min required count (4), sample not sufficiently deep or duplicates removed

and if I implement PE:

PAIRED_END_BAM_INPUT
paired = 119572
unpaired = 0
MERGED PAIRED END READS = 119572
MATES PROCESSED = 239144
TOTAL READS     = 119572
DISTINCT READS  = 119485
DISTINCT COUNTS = 2
MAX COUNT       = 2
COUNTS OF 1     = 119398
MAX TERMS       = 2
OBSERVED COUNTS (3)
1       119398
2       87

ERROR:  max count before zero is les than min required count (4), sample not sufficiently deep or duplicates removed
@ewels ewels added the question Further information is requested label May 22, 2019
@ewels
Copy link
Member

ewels commented May 22, 2019

Ooh, yes - this goes a long way back. Basically, running with -P makes preseq fail a lot. This became very frustrating so I just removed it and made it run in single-end mode all of the time. The shapes of the curves are still correct, but I agree that it's not very clear and should be improved.

The problem with it failing with low read counts is just down to preseq. Nothing that we can do about that from the pipeline I think sorry.

@bazyliszek
Copy link
Author

Ok, great! Thanks! I will not use -P for now.

@TomKellyGenetics
Copy link
Member

TomKellyGenetics commented Oct 27, 2020

I've run into a similar issue and managed to fix it. The problem is that preseq requires a BED file as input. See here in the bedtools on how to support paired-end files.

# sort BAM file
samtools sort -O BAM 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.bam > 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam
# index  BAM file
samtools index 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam
# convert to BED file with paired-ends (BEDPE format)
bamToBed -i 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted -bedpe >  6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bed
# run in paired-end mode with -P
preseq lc_extrap -v -P 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bed -o 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.ccurve.txt

Possibly addresses issue #161. Note that I'm using preseq version 3.0.2 installed from GitHub smithlabcode/preseq. It's possible that newer versions (since this repo uses version 2.0.3) address these problems.

@Rohit-Satyam
Copy link

Just want to add that this issue has not been resolved yet. I am using 3.1 version and still facing the same issue when trying to run Preseq on Iseq run. I know the number of reads sequenced on Iseq are low and so will be the mapping reads in bam file. But it should run without error irrespective of that or fail quietly without disrupting nextflow pipeline.

@Nitin123-4
Copy link

I am also facing the same issue even with bed file. Both bam and bed gives the same issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants