Complexity curve #96

bazyliszek · 2019-05-21T09:20:44Z (edited by ewels)

Looking into the complexity curve: could it be that for PE reads we actually need to add the -P parameter to preseq? The pipeline did not detect this automatically.

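```shell
# sort by coordinate; -m is memory per thread in bytes (8589934592 = 8 GiB), -@ adds one extra thread
samtools sort -m 8589934592 -@ 1 \
    -o 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam \
    6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.bam
# estimate library complexity; -B: input is BAM, -v: verbose
preseq lc_extrap -v -B 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam -o 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.ccurve.txt
```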
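On the question of whether the pipeline could detect paired-end input and pass -P itself, here is a minimal sketch, assuming bash with samtools on the PATH. The helper name `preseq_pe_flag` is hypothetical and not part of the pipeline. It counts alignments whose SAM FLAG has bit 0x1 (paired) set, using `samtools view -c -f 1`, and prints `-P` if there are any. It reads through the whole BAM, and as noted later in this thread, enabling -P can itself make preseq fail more often.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print "-P" if the BAM holds
# paired-end alignments, otherwise print nothing.
# `samtools view -c -f 1` counts records with SAM FLAG bit 0x1 (paired) set;
# this scans the entire file.
preseq_pe_flag() {
    local bam="$1"
    local n_paired
    n_paired=$(samtools view -c -f 1 "$bam") || return 1
    if [ "$n_paired" -gt 0 ]; then
        echo "-P"
    fi
}
```

It could be spliced into the command above as `preseq lc_extrap -v -B $(preseq_pe_flag in.sorted.bam) in.sorted.bam -o in.ccurve.txt` (placeholder file names). The substitution is deliberately left unquoted so that single-end input adds no empty argument.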

Also, preseq does not work for very small numbers of reads, but I guess that is my own problem, since this sample has only 119,573 reads in total. Error: max count before zero is less than min required count (4), sample not sufficiently deep or duplicates removed.

Output of the original command:

```
BAM_INPUT
TOTAL READS     = 119573
DISTINCT READS  = 119456
DISTINCT COUNTS = 3
MAX COUNT       = 3
COUNTS OF 1     = 119342
MAX TERMS       = 2
OBSERVED COUNTS (4)
1       119342
2       111
3       3

ERROR:  max count before zero is les than min required count (4), sample not sufficiently deep or duplicates removed
```

And with -P (paired-end mode):

```
PAIRED_END_BAM_INPUT
paired = 119572
unpaired = 0
MERGED PAIRED END READS = 119572
MATES PROCESSED = 239144
TOTAL READS     = 119572
DISTINCT READS  = 119485
DISTINCT COUNTS = 2
MAX COUNT       = 2
COUNTS OF 1     = 119398
MAX TERMS       = 2
OBSERVED COUNTS (3)
1       119398
2       87

ERROR:  max count before zero is les than min required count (4), sample not sufficiently deep or duplicates removed
```
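Both runs fail the same check. A minimal sketch of that check follows, assuming it works the way the reported MAX TERMS values suggest: take the largest count j such that every count from 1 to j was observed at least once, round it down to an even number, and require the result to be at least 4. The function `max_terms_from_hist` is hypothetical and uses awk on the count lines of the OBSERVED COUNTS histogram. In the single-end run no read was seen exactly 4 times, and in the paired-end run none was seen 3 times, so the run of non-zero counts stops early. Heavily deduplicated libraries fail for the same reason: almost every read has count 1.

```shell
#!/usr/bin/env bash
# Sketch (ASSUMED rule, inferred from the MAX TERMS values in the logs):
# find the highest count j such that counts 1..j are all observed, then round
# down to an even number. preseq reports an error when this is below 4.
# Input: lines of "<count> <number of distinct reads seen that many times>".
max_terms_from_hist() {
    awk '
        { hist[$1] = $2 }
        END {
            j = 0
            # walk upwards until the first count with no reads ("before zero")
            while (((j + 1) in hist) && hist[j + 1] > 0) j++
            # round down to an even number of terms
            if (j % 2 == 1) j--
            print j
        }
    '
}
```

For example, `printf '1 119342\n2 111\n3 3\n' | max_terms_from_hist` prints 2, matching MAX TERMS = 2 in the first log and falling short of the required 4.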


ewels · 2019-05-22T14:09:05Z

Ooh, yes - this goes a long way back. Basically, running with -P makes preseq fail a lot. This became very frustrating so I just removed it and made it run in single-end mode all of the time. The shapes of the curves are still correct, but I agree that it's not very clear and should be improved.

The failures with low read counts are down to preseq itself. I don't think there is anything we can do about that from the pipeline, sorry.

bazyliszek · 2019-05-28T07:27:19Z

Ok, great! Thanks! I will not use -P for now.

TomKellyGenetics · 2020-10-27T05:14:53Z (edited)

I've run into a similar issue and managed to fix it. The problem is that preseq requires a BED file as input. See the bedtools documentation on how to handle paired-end files.

```shell
# sort BAM file
samtools sort -O BAM 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.bam > 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam
# index BAM file
samtools index 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam
# convert to BED file with paired-ends (BEDPE format)
bamToBed -i 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bam -bedpe > 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bed
# run in paired-end mode with -P
preseq lc_extrap -v -P 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.bed -o 6-10A_S13_L001_R1_001_val_1_bismark_bt2_pe.sorted.ccurve.txt
```

This possibly addresses issue #161. Note that I'm using preseq 3.0.2, installed from GitHub (smithlabcode/preseq). This repo uses version 2.0.3, so it's possible that the newer version addresses these problems.

Rohit-Satyam · 2023-01-10T07:56:20Z

I just want to add that this issue has not been resolved yet. I am using version 3.1 and still hit the same error when running preseq on an iSeq run. I know the number of reads sequenced on an iSeq is low, and so is the number of mapped reads in the BAM file, but preseq should either run without error regardless, or fail quietly without disrupting the Nextflow pipeline.
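One way to get the "fail quietly" behaviour asked for here, sketched under assumptions: the bash function `run_or_placeholder` is hypothetical and not part of nf-core or preseq. It runs a command and, if the command exits non-zero, prints a warning to stderr, leaves an empty file at the expected output path, and returns success so the enclosing workflow step would not abort. Whether downstream steps (for example MultiQC) accept an empty complexity-curve file is not checked here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command; on failure, warn on stderr and leave an
# empty placeholder at the expected output path instead of failing the step.
# Usage: run_or_placeholder <output-file> <command> [args...]
run_or_placeholder() {
    local out="$1"
    shift
    if "$@"; then
        return 0
    fi
    # The command failed (e.g. preseq's "max count before zero" error):
    # keep going, but make the failure visible in the log.
    echo "WARNING: command failed: $*; writing empty placeholder to $out" >&2
    : > "$out"   # create or truncate the output file
    return 0
}
```

A call might look like `run_or_placeholder sample.ccurve.txt preseq lc_extrap -v -B sample.sorted.bam -o sample.ccurve.txt` (placeholder file names). Swallowing the exit status also hides genuinely broken runs, which is why the warning is kept on stderr rather than discarded.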

Nitin123-4 · 2023-03-03T22:46:16Z

I am also facing the same issue, even with a BED file: both BAM and BED input give the same error.

ewels added the "question" label (Further information is requested) May 22, 2019

bazyliszek closed this as completed May 28, 2019

TomKellyGenetics reopened this Oct 27, 2020

sruthipsuresh mentioned this issue Jan 5, 2021: Add tests for preseq (nf-core/modules#99, closed)

ewels mentioned this issue Nov 3, 2022: Preseq failing most of the time (#161, open)

bounlu mentioned this issue Apr 3, 2024: ERROR: max count before zero is less than min required count (4) duplicates removed (smithlabcode/preseq#71, open)
