VCF/BCF Writer - auto-recognize format and compression based on file extension #307

Draft
wants to merge 1 commit into
base: master
44 changes: 44 additions & 0 deletions src/bcf/mod.rs
@@ -663,6 +663,50 @@ impl Writer {
Err(Error::NonUnicodePath)
}
}

/// Create a new writer that writes to the given path.
///
/// The compression and format will be determined from the file extension:
///
/// * `.vcf` -> uncompressed VCF
/// * `.vcf.gz` -> compressed VCF
/// * `.vcf.gzip` -> compressed VCF
/// * `.vcf.bgzf` -> compressed VCF
/// * `.bcf` -> compressed BCF
///
/// # Arguments
///
/// * `path` - the path
/// * `header` - header definition to use
pub fn from_path2<P: AsRef<Path>>(
path: P,
header: &Header
) -> Result<Self> {
if let Some(p) = path.as_ref().to_str() {
let fields: Vec<&str> = p.split(".").collect();
let length = fields.len();
// FIXME: check that we have enough fields
let uncompressed: bool = {
if (fields[length-1] == "gz" || fields[length-1] == "gzip" || fields[length-1] == "bgzf") { false }
Member

In addition to @tedil's recommendation, also have a look at the match statement for a more idiomatic construct when checking these different extensions:
https://doc.rust-lang.org/rust-by-example/flow_control/match.html
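
A minimal sketch of what the `match` version could look like, not the code proposed in this PR. The helper name `classify` is hypothetical, the `Format` enum is a local stand-in for the crate's own so the snippet compiles by itself, and the caller is assumed to have already extracted the last extension (`outer`) and, if present, the one before it (`inner`). Returning `None` for unknown extensions is one possible answer to the FIXMEs; defaulting to VCF would be another.

```rust
/// Local stand-in for the crate's `Format` enum (assumption, for a standalone sketch).
#[derive(Debug, PartialEq)]
enum Format {
    VCF,
    BCF,
}

/// Hypothetical helper: map the last two extensions to `(uncompressed, format)`.
/// `None` means the extension is not recognized.
fn classify(inner: Option<&str>, outer: &str) -> Option<(bool, Format)> {
    match (inner, outer) {
        // `.vcf` -> uncompressed VCF
        (_, "vcf") => Some((true, Format::VCF)),
        // `.bcf` -> BCF, always compressed
        (_, "bcf") => Some((false, Format::BCF)),
        // `.vcf.gz`, `.vcf.gzip`, `.vcf.bgzf` -> compressed VCF
        (Some("vcf"), "gz" | "gzip" | "bgzf") => Some((false, Format::VCF)),
        // anything else is left for the caller to reject or default
        _ => None,
    }
}
```

Because `inner` is an `Option`, there is no indexing into a vector of fields, so a path with a single field cannot trigger an out-of-bounds panic.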

else if (fields[length-1] == "bcf") { false }
else if (fields[length-1] == "vcf") { true }
else { true } // FIXME
Comment on lines +690 to +693
Member

I'd try to use the extension method of Path. It returns Option<&OsStr>, which is a bit cumbersome to use since it's not a "regular" String/&str, but it can be transformed into one, see here.
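
A rough sketch of that conversion, using only the standard library. The helper names `last_extension` and `inner_extension` are hypothetical. Note that `Path::extension` only yields the final component (`gz` for `calls.vcf.gz`), so getting at the `vcf` part needs a second pass over `file_stem`.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: the final extension as `&str`,
/// or `None` if there is none or it is not valid UTF-8.
fn last_extension(path: &Path) -> Option<&str> {
    path.extension().and_then(OsStr::to_str)
}

/// Hypothetical helper: the extension before the last one,
/// e.g. `vcf` in `calls.vcf.gz`.
fn inner_extension(path: &Path) -> Option<&str> {
    // `file_stem` strips the final extension: `calls.vcf.gz` -> `calls.vcf`
    let stem = path.file_stem()?;
    Path::new(stem).extension()?.to_str()
}
```

Both helpers return `None` instead of panicking when a field is missing, which would also cover the "check that we have enough fields" FIXME.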

};
let format: Format = {
if (uncompressed) {
if (fields[length-1] == "vcf") { Format::VCF }
else { Format::VCF } // FIXME: should we default this or return an error?
}
else if (fields[length-1] == "bcf") { Format::BCF } // always compressed
else if (fields[length-2] == "vcf") { Format::VCF }
else { Format::VCF } // FIXME: should we default this or return an error?
};
Ok(Self::new(p.as_bytes(), header, uncompressed, format)?)
} else {
Err(Error::NonUnicodePath)
}
}


/// Create a new writer from a URL.
///