Releases: MontgomeryLab/tinyRNA
v1.5.0
What's Changed
- The default Python version has been upgraded to 3.10, and likewise for dependencies. A new script has been added for developers which helps automate Conda environment dependency resolution, vulnerability scans among packages, and the Conda tricks we've leveraged for performing fast, consistent installations. #307
- Diagnostic alignment tables, which are produced by tiny-count when the --report-diags flag is provided, now include columns for the sequence's raw count, genomic hits, alignment mismatches vs. reference, and feature aliases. #309
- Pipeline auto-documentation has been significantly improved in backward-compatible fashion. #312
Full Changelog: v1.4.0...v1.5.0
v1.4.0
What's Changed
tiny-count
- A new selector has been added to the Features Sheet to allow for selecting reads based on their edit distance to the reference. It is evaluated during Stage 2 selection. #298
- BAM files are a supported input filetype. SAM and BAM file headers are checked during each run to ensure they report compatible ordering. Additionally, the Samples Sheet is more rigorously validated during standalone runs. #304
- Normalization by genomic hits can be disabled, independently or in tandem with normalization by feature hits. Regardless, the collected stats are validated for internal consistency after all samples have been counted. #301
- When using diagnostic options, intermediate alignment tables are written in CSV format and include new useful information #300
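As a rough illustration of the normalization options above, the sketch below assumes that a sequence's raw count is divided by its number of genomic hits and then by the number of features it was assigned to, with each division independently switchable. The function name and its parameters are hypothetical and are not taken from the tiny-count source.

```python
def normalized_count(raw_count, genomic_hits, feature_hits,
                     by_genomic_hits=True, by_feature_hits=True):
    """Return the count credited to each matching feature (illustrative only)."""
    count = float(raw_count)
    # Assumption: a sequence that aligns to N loci contributes 1/N per locus
    if by_genomic_hits and genomic_hits > 0:
        count /= genomic_hits
    # Assumption: an alignment assigned to M features contributes 1/M to each
    if by_feature_hits and feature_hits > 0:
        count /= feature_hits
    return count


if __name__ == "__main__":
    print(normalized_count(12, genomic_hits=3, feature_hits=2))
    print(normalized_count(12, genomic_hits=3, feature_hits=2, by_genomic_hits=False))
```

Disabling both switches leaves the raw count untouched, which is the case where a consistency check on the collected stats is most useful.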
tiny-plot
- Features are included in DGE scatter plots if they have a count of 0 in one of the two conditions being compared. They are shown at the very edge of the plot space as half circles. Previously they were omitted due to the log scale's singularity at 0. Features that have a count of 0 in both conditions are still omitted. In DGE class scatter plots, classes consisting entirely of 0 counts in both conditions are still omitted from the legend. #305
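The zero-count handling described above can be summarized with a small decision function. This is only a sketch of the stated behavior; the function name and return labels are hypothetical and do not reflect how tiny-plot is implemented.

```python
def classify_point(count_a, count_b):
    """Decide how a feature is drawn on a log-log scatter plot (illustrative only)."""
    if count_a == 0 and count_b == 0:
        # log(0) is undefined on both axes, so there is nowhere to place the point
        return "omitted"
    if count_a == 0 or count_b == 0:
        # Only one axis is undefined: pin the point to that edge as a half circle
        return "edge"
    # Both coordinates have a defined logarithm
    return "interior"


if __name__ == "__main__":
    for pair in [(0, 0), (0, 25), (40, 0), (40, 25)]:
        print(pair, classify_point(*pair))
```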
Misc.
- Some Run Config options have been removed due to redundancy and deprecation: Bowtie's trim5 and trim3 options, and tiny-count's counter_all_features option #305
Full Changelog: v1.3.0...v1.4.0
v1.3.0
What's Changed
❗ = changes that present issues with backward compatibility
tiny-count
- GFF files are no longer required, and if they aren't provided then sequence-based counting will be performed rather than feature-based counting. Stage 2 and 3 selection still takes place in this mode, and counts are still optionally subset by classifier. #279 #287
- Shift parameters can be provided with overlap selectors. These parameters shift the 5' and/or 3' ends of matching feature intervals by the specified number of bases. #280
- ❗ Anchored overlap selectors (all three) require that the non-anchored end of the alignment is nested within the feature's interval for a match #282
- Wildcard values can be provided for overlap selectors. This is functionally equivalent to specifying partial. The full selector has also been renamed to nested. #282
- A tutorial for tiny-count has been added. #272
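To make the shift parameters mentioned above more concrete, here is a minimal sketch of strand-aware interval shifting. It assumes that positive values move an end in the 3' direction and negative values move it in the 5' direction; that sign convention, the function name, and the coordinate representation are all assumptions for illustration rather than details confirmed by these notes.

```python
def shift_interval(start, end, strand, shift_5p=0, shift_3p=0):
    """Shift the 5' and/or 3' end of a feature interval (illustrative only).

    Assumed convention: positive shifts move toward the 3' direction,
    negative shifts move toward the 5' direction.
    """
    if strand == "+":
        # On the sense strand the 5' end is `start` and 3' points to higher coordinates
        new_start, new_end = start + shift_5p, end + shift_3p
    elif strand == "-":
        # On the antisense strand the 5' end is `end` and 3' points to lower coordinates
        new_start, new_end = start - shift_3p, end - shift_5p
    else:
        raise ValueError("strand must be '+' or '-'")
    if new_start >= new_end:
        raise ValueError("shift produced an empty or inverted interval")
    return new_start, new_end


if __name__ == "__main__":
    print(shift_interval(100, 200, "+", shift_5p=-5, shift_3p=10))
    print(shift_interval(100, 200, "-", shift_5p=-5, shift_3p=10))
```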
tiny-plot
- Class names in DGE scatter plots and class charts are sorted #286
- Custom min and/or max view limits for DGE scatter plots can be specified in the Run Config #273
- The "%" character has been removed from tick labels in class_charts and rule_charts #284
- Various bugfixes and reliability improvements #273
Misc
- Log files for workflow steps are placed in the logs subdirectory for each run, regardless of the run's success/failure. This makes troubleshooting significantly easier. #276
- A backward compatibility system has been introduced so that older Run Config files can be used if the user doesn't want to update them manually. The input file is left as-is but parameter additions/renames/deletions are automatically applied to the processed Run Config to bring it up to date. This is only supported for Run Configs from v1.2.0 and newer. #276
- A new bowtie option -m has been added to the Run Config for specifying the drop threshold for reads with too many alignments #284
- The ellipsis character, which Microsoft Excel autocorrects from three periods and saves using an obscure encoding format, no longer crashes the CSV reader. Empty rows are also skipped. #290
- (tiny|tiny-count) --get-templates will avoid configuration conflicts by only copying files if none of the relevant filenames are present in the CWD. #272
- The workflow runner shows longer descriptive argument flags in its output for tiny-* utilities #291
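The backward compatibility system described above (renames, additions, and deletions applied to a processed copy while the input file is left as-is) could look roughly like the following. This is a sketch under stated assumptions: the change table, the option names in it, and the function names are hypothetical placeholders, not the project's actual migration data or API.

```python
import copy

# Hypothetical change table; the keys below are placeholders, not real Run Config options
CHANGES = [
    {
        "version": "1.3.0",
        "rename": {"old_name": "new_name"},
        "add": {"new_option": None},
        "remove": ["obsolete_option"],
    },
]


def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def upgrade(config, changes=CHANGES):
    """Return an updated copy of `config`; the input dict is never modified."""
    updated = copy.deepcopy(config)
    for step in changes:
        # Skip change sets that the config already reflects
        if parse_version(step["version"]) <= parse_version(updated["version"]):
            continue
        for old, new in step.get("rename", {}).items():
            if old in updated:
                updated[new] = updated.pop(old)
        for key, default in step.get("add", {}).items():
            updated.setdefault(key, default)
        for key in step.get("remove", []):
            updated.pop(key, None)
        updated["version"] = step["version"]
    return updated


if __name__ == "__main__":
    original = {"version": "1.2.0", "old_name": 5, "obsolete_option": True}
    print(upgrade(original))
    print(original)  # unchanged
```

Working on a deep copy mirrors the note that the user's input file is left untouched while only the processed Run Config is brought up to date.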
v1.2.1
What's Changed
- Documentation has been added for installation of tiny-count via bioconda #257
- tiny-count --get-templates can be used to obtain template copies of configuration files relevant to tiny-count #259
- Inclusive or exclusive filters for classes in DGE scatter plots can be specified in the Run Config and by command line #264
- Scatter plot tick label placement is more flexible and reliable across a much wider range of plot view limits #269
- Control conditions that contain forbidden characters (per R) are properly handled in tiny-deseq.r #267
- Group names in the Samples Sheet are validated to ensure that their "syntactically correct" translations (per R) do not cause namespace collisions #268
- The help string for top-level tinyRNA commands is much more helpful #263
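The group name validation mentioned above guards against distinct names that R would translate to the same syntactic name. The sketch below approximates R's make.names() for ASCII input only (it ignores reserved words and locale-specific letters) and reports any collisions. Both function names are hypothetical and are not taken from the tinyRNA code.

```python
import re
from collections import defaultdict


def r_syntactic_name(name):
    """Approximate R's make.names() for ASCII input (illustrative only)."""
    # Characters other than letters, digits, '.' and '_' are replaced with '.'
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid name starts with a letter, or with a dot not followed by a digit
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", cleaned):
        cleaned = "X" + cleaned
    return cleaned


def find_collisions(names):
    """Map each translated name to the distinct original names that produce it."""
    groups = defaultdict(list)
    for name in names:
        translated = r_syntactic_name(name)
        if name not in groups[translated]:
            groups[translated].append(name)
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


if __name__ == "__main__":
    print(find_collisions(["wild-type", "wild type", "mutant"]))
```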
Full Changelog: v1.2.0_patch1...v1.2.1
v1.2.0
This release improves the utility and user experience of selection rules and adds new, useful selectors to Stage 1 selection.
What's Changed
❗ = changes that present issues with backward compatibility
Pipeline Changes
- ❗ Tagged counting has been repurposed as a classifier which can be used to subclassify features during Stage 1 selection #241
- ❗ GFF files and aliases are now defined in the Paths File. This leaves the Features Sheet in a more consistent state since these columns operated independently from the rest of each row's rule definition. #245
- ❗ GFF Source and Type Filters are now included in Stage 1 selection, complete with wildcard support, and can be specified in the Features Sheet on a per-rule basis (rather than the previous global definition in the Run Config) #246
- The GFF validator can now use gzipped reference genomes for chromosome identifier validation #251
- The Features Sheet in the START_HERE directory has been updated to utilize the new selection format detailed above #249
- Misc. v1.2 prep #253 :
- Bugfix: non-differentially expressed features are misrepresented in scatter_dge_class plots under certain conditions
- Bugfix: the size of an empty StepVector differs between the HTSeq StepVector and our Cython StepVector, meaning that tiny-count runs aren't handled properly when there are no Stage 1 matches (this should be an error)
- Bugfix: changes to the Paths File between recount/replot runs are not reflected
- Bugfix: cwltool issues a notice of duplicate parameter names in tiny-plot.cwl
- Add a version parameter to the Run Config so that processed run configs are automatically updated with the tinyRNA version that was used
- Add Matplotlib documentation link in the .mplstyle template stylesheet
- Add the keyword "any" to the list of wildcard keywords permitted in selection rules
- Add a diagram that demonstrates Stage 1-3 selection to the documentation
- Add documentation notes about Run Directory files that can be safely removed to reduce storage usage
- Update version number in setup.py
- Update TUTORIAL.md with corrected line number references for paths.yml
Patch 11/29: additional changes for v1.2.0 release
- setup.py has been modified in preparation for tiny-count standalone installation via bioconda
- Resume Run Config files are now timestamped
- Default plot styles have been changed
Full Changelog: v1.1.0...v1.2.0_patch1
v1.1.0
This release brings improvements in performance, reliability, and compatibility.
What's Changed
❗ = changes that may present issues with backward compatibility
Pipeline changes
- All Conda dependencies have been updated, including Python (3.7 to 3.9), bowtie, fastp, matplotlib, etc. R dependencies are now managed by Conda which has removed lengthy build steps from the installation script. #214
- GFF annotations are validated at both pipeline and tiny-count startup. These changes also bring expanded support for feature ID attributes (by priority: ID, gene_id, and Parent) and annotations defining whole chromosomes for compatibility with Ensembl. #236
- Samples Sheet contents are validated at pipeline startup #243
- ❗ Large bowtie indexes (*.ebwtl) are now supported. The activation steps for bowtie-build have also been simplified. #238
tiny-count changes:
- A custom Cython implementation of HTSeq's StepVector has been introduced. As a result, tiny-count runtimes are up to 50% faster (depending on configuration). HTSeq's StepVector is still used as a fallback if there are issues during build or import, or via user preference in the Run Config. Note: prebuilt binaries are not provided in this release. #218
- The semantics of the Features Sheet Hierarchy value have changed. Instead of being used as a means of candidate elimination in Stage 2 selection, it is now used to sort Stage 2 matches as a priority order for Stage 3. #229
- Features are no longer required to be stranded. Unstranded features that match rules with 3'/5' anchored overlap selectors will be downgraded to a new anchored overlap selector which does not distinguish between 3'/5' ends. Unstranded features will match all strand selectors (sense, antisense, and both) #236
- Features that list more than one value for their ID attribute are now accepted. These values are concatenated to form a single ID string. #239
- Non-collapsed SAM files, as well as SAM files produced from fastx collapsed outputs, are now accepted by tiny-count. #217
- ❗ The required command line argument for the Features Sheet has been renamed. #217
- GFF parsing performance has been improved. #218
- Gzipped tiny-collapse outputs no longer lead to a crash during Summary Stats creation #222
tiny-deseq.r
- Excessive decimal precision has been fixed in the pvalue and padj columns of DGE tables. Empty cells are now filled with NA. #232
tiny-plot
- Class color assignment is now consistent across comparisons, and legend items are sorted alphabetically in the sample_avg_scatter_by_dge_class plot type. Color mapping now supports up to 20 distinct class colors (unless a stylesheet with >20 colors is provided by the user). #220
- Plot outputs are now organized into subdirectories by type. #239
Full Changelog: v1.0.1...v1.1.0
v1.0.1
We are excited to announce the release of tinyRNA v1. This version marks the end of the initial development period and the first version intended for wider public use and evaluation. We would like to thank our beta testing community for their thoughtful and generous feedback, and the National Institutes of Health for their support.
v1.0.1
- Addresses installation issues related to Miniconda if the host environment requires a Conda installation
- A fixed version of Miniconda is installed, if needed, rather than the latest version