Users who find that their sequences may contain paralogs and/or heterozygous sequences (#1) may want to extract multiple sequences per gene for each sample. Recovering paralogous sequences is especially important for projects in which some genes may have an unknown duplication history. These users will want to build gene trees containing multiple paralogs, both to identify where duplications took place and to select orthologs for species-level phylogenetic analysis.
In HybPiper, this is accomplished by the `paralog_retriever.py` script: https://github.com/mossmatters/HybPiper/blob/master/paralog_retriever.py
However, HybPiper assembles multiple contigs per gene, and the retriever can extract a sequence from each of them. The overlap assembler, by contrast, produces only one consensus sequence per assembly.
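As a minimal illustration of the output a paralog-aware retriever produces (a sketch, not HybPiper's actual code), the function below collects every assembled contig for one gene into a single FASTA, so a sample can contribute several sequences. The function name `paralog_fasta` and the `sample.index` header scheme are hypothetical.

```python
from __future__ import annotations


def paralog_fasta(gene: str, contigs_by_sample: dict[str, list[str]]) -> str:
    """Return FASTA text holding every contig assembled for one gene.

    Hypothetical helper: each sample may contribute several sequences, and
    the contig index is appended to the sample name so the copies stay
    distinct when the file is used to build a gene tree.
    """
    records = []
    for sample in sorted(contigs_by_sample):
        for i, seq in enumerate(contigs_by_sample[sample]):
            records.append(f">{sample}.{i} {gene}\n{seq}\n")
    return "".join(records)
```

With the overlap assembler, each sample's list would hold only its single consensus sequence, so the gene FASTA would contain one entry per sample.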
One idea is to use a workflow similar to the "alleles_workflow" I used in a 2018 AJB paper to phase heterozygous sites. The workflow uses BWA (to map reads), Picard (to mark duplicate reads), GATK (to call variants within individuals), and WhatsHap (to phase SNPs using read data). I then use a script (haplonerate.py) to extract phased FASTA sequences, based on a user decision about how to handle sites outside the largest phased block.
https://github.com/mossmatters/phyloscripts/tree/master/alleles_workflow
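A minimal sketch of the last step, turning phased SNPs into two haplotype sequences, might look like the code below. It is not the haplonerate.py implementation. It assumes the WhatsHap VCF has already been parsed into hypothetical `HetSite` records (0-based position, phased alleles, and the `PS` phase-set ID that WhatsHap writes), handles SNPs only, and defines the "largest" phase block as the one spanning the most reference positions. Heterozygous sites outside that block are either collapsed to IUPAC ambiguity codes or left as the reference base, standing in for the user decision described above.

```python
from __future__ import annotations

from dataclasses import dataclass

# IUPAC ambiguity codes for two-base heterozygous SNPs.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


@dataclass
class HetSite:
    """Hypothetical record for one heterozygous SNP parsed from a phased VCF."""
    pos: int                  # 0-based position in the reference sequence
    alleles: tuple[str, str]  # (haplotype 1 allele, haplotype 2 allele)
    phase_set: int | None     # PS value; None if the site could not be phased


def largest_block(sites: list[HetSite]) -> int | None:
    """Return the phase-set ID whose SNPs span the most reference positions."""
    spans: dict[int, tuple[int, int]] = {}
    for s in sites:
        if s.phase_set is None:
            continue
        lo, hi = spans.get(s.phase_set, (s.pos, s.pos))
        spans[s.phase_set] = (min(lo, s.pos), max(hi, s.pos))
    if not spans:
        return None
    return max(spans, key=lambda ps: spans[ps][1] - spans[ps][0])


def phase_to_haplotypes(ref: str, sites: list[HetSite],
                        outside: str = "ambiguity") -> tuple[str, str]:
    """Build two haplotype sequences, phased only within the largest block."""
    if outside not in ("ambiguity", "reference"):
        raise ValueError("outside must be 'ambiguity' or 'reference'")
    block = largest_block(sites)
    hap1, hap2 = list(ref), list(ref)
    for s in sites:
        a1, a2 = s.alleles
        if block is not None and s.phase_set == block:
            # Inside the largest block: use the phased alleles as given.
            hap1[s.pos], hap2[s.pos] = a1, a2
        elif outside == "ambiguity":
            # Outside it: collapse both alleles into one IUPAC code (N if unknown).
            hap1[s.pos] = hap2[s.pos] = IUPAC.get(frozenset((a1, a2)), "N")
        # outside == "reference": keep the reference base on both haplotypes.
    return "".join(hap1), "".join(hap2)
```

Trimming both haplotypes to the largest block would be another reasonable way to handle the outside-block sites; which option suits a given project is a judgment call.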