UMI FASTQ file #703
base: modules
UMI FASTQ file composed of random 9bp synthetic oligos, all with uniform quality. Created synthetically to match the existing UMI FASTQ file(s).
Hi :)
Could you please describe this file in a bit more detail?
If this is the use case where UMIs are present in a third FASTQ, then the test dataset should include three files: forward and reverse reads (without UMIs in the sequence) and a UMI file.
Also, the UMI structure is needed in order to process the sequences.
Yep, no problem. The entire read is the UMI sequence, and it matches the existing FASTQs that are in the repository. Here are the existing FASTQ files:
and here is the new one:
As you can see, the UMI FASTQ file matches the existing FASTQ files, saving us some storage. I generated the FASTQs by:
I'll upload the script later today and update here. I've checked the method and it seems to work fine in our pipeline. The bases mask is |
I've just checked your development branch, and I think the syntax would be: |
This means it will have the same UMI sequences.
Slight change - I've extracted those first 12bp and put them in that FASTQ file. This now should have exactly the same UMI sequences as the existing FASTQ and should create almost identical consensus reads.
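For reference, the extraction step can be sketched roughly as below. This is only an illustration, not the script used for this PR: the function names `read_fastq` and `extract_umis` are hypothetical, and it assumes the UMI occupies the first 12bp of each read in an uncompressed, four-line-per-record FASTQ.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def extract_umis(reads: TextIO, umi_out: TextIO, umi_len: int = 12) -> int:
    """Write the first `umi_len` bases (and qualities) of each read to `umi_out`.

    Assumption: the UMI sits at the very start of each read.
    Returns the number of records written.
    """
    count = 0
    for header, seq, qual in read_fastq(reads):
        umi_out.write(f"{header}\n{seq[:umi_len]}\n+\n{qual[:umi_len]}\n")
        count += 1
    return count


# Small in-memory example; real use would pass opened FASTQ files instead.
example_in = io.StringIO("@read1\nACGTACGTACGTTTTTGGGG\n+\nIIIIIIIIIIIIIIIIIIII\n")
example_out = io.StringIO()
n_written = extract_umis(example_in, example_out)
```

Keeping the read headers unchanged means the UMI records stay in the same order and carry the same names as the reads they came from, which is what lets a downstream tool pair them up.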
@lescai I've checked your subworkflow in development and it already works nicely with three FASTQ files! We just have to add an additional test.
@lescai did you have a chance to check this?
UMI FASTQ file
composed of random 9bp synthetic oligos, all with uniform quality. Generated by stripping the UMI sequence from the existing FASTQ and turning it into a separate file. This will be a valid reference format for sequencing kits where the UMI is embedded in the index.