VCFRecord
Copying is disabled to prevent a double-free (which should not come up except when writeln'ing). Destructor.
Add INFO or FORMAT key:value pairs to a record: a single datapoint, a vector of values, or values for each sample (if tagType == FORMAT).
Add a filter; from htslib: "If flt_id is PASS, all existing filters are removed first. If other than PASS, existing PASS is removed."
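The PASS rule quoted above can be sketched without htslib. This is a minimal illustration, not the wrapper's implementation: `filter_list`, `filter_add`, `filter_remove` and `MAX_FILTERS` are hypothetical names, filters are modelled as plain strings rather than header ids, and skipping an already-present filter is an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_FILTERS 16  /* arbitrary capacity chosen for this sketch */

/* Hypothetical stand-in for a record's FILTER column: a short list of names. */
typedef struct {
    const char *names[MAX_FILTERS];
    int n;
} filter_list;

/* Remove every entry equal to `name`; returns how many were removed. */
static int filter_remove(filter_list *fl, const char *name)
{
    int kept = 0, removed = 0;
    for (int i = 0; i < fl->n; i++) {
        if (strcmp(fl->names[i], name) == 0)
            removed++;
        else
            fl->names[kept++] = fl->names[i];
    }
    fl->n = kept;
    return removed;
}

/* Add a filter following the quoted rule:
 *   - adding PASS first removes all existing filters;
 *   - adding anything else first removes an existing PASS.
 * Returns 0 on success, -1 if the list is full. */
static int filter_add(filter_list *fl, const char *name)
{
    assert(fl != NULL && name != NULL);
    if (strcmp(name, "PASS") == 0) {
        fl->n = 0;                      /* PASS replaces everything */
    } else {
        filter_remove(fl, "PASS");      /* a real filter displaces PASS */
        for (int i = 0; i < fl->n; i++) /* assumption: duplicates are skipped */
            if (strcmp(fl->names[i], name) == 0)
                return 0;
    }
    if (fl->n >= MAX_FILTERS)
        return -1;
    fl->names[fl->n++] = name;
    return 0;
}
```

Starting from an empty list, adding PASS, then q10, then PASS again leaves only PASS in the list.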
Update FORMAT (sample info; column 9+). Templated on data type; calls one of bcf_update_format_{int32,float,string}.
Append an ID (column 3) to the record. NOTE: htslib performs duplicate checking
Update INFO (pan-sample info; column 8). Adds a tag:value to the INFO column. NOTE: the tag must already exist in the header. Templated on data type; calls one of bcf_update_info_{int32,float,string,flag}. Both singletons and arrays are supported.
Set alleles; comma-separated list
Set alleles; array
All alleles getter (array)
Alternate alleles getter version 1: ["A", "ACTG", ...]
Alternate alleles getter version 2: "A,ACTG,..."
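The two getter shapes differ only in whether the alternate alleles are returned as an array or joined with commas. A minimal sketch of the join, assuming a hypothetical helper `join_alleles` that writes into a caller-supplied buffer (the wrapper itself may build the string differently):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Join `n` allele strings with commas into `out` (capacity `cap` bytes,
 * including the terminating NUL). Returns the resulting length, or -1 if
 * the buffer is too small. Hypothetical helper for illustration only. */
static int join_alleles(const char *const *alleles, size_t n,
                        char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    size_t len = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t alen = strlen(alleles[i]);
        size_t need = alen + (i > 0 ? 1 : 0);   /* allele plus separator */
        if (len + need + 1 > cap)
            return -1;                          /* would overflow `out` */
        if (i > 0)
            out[len++] = ',';
        memcpy(out + len, alleles[i], alen);
        len += alen;
        out[len] = '\0';
    }
    return (int)len;
}
```

Given the array form `{"A", "ACTG"}`, this produces the string form `A,ACTG`.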
Get chromosome (CHROM)
Set chromosome (CHROM)
Get FILTER column (htslib sadly provides no getter for this)
Set the FILTER column to f
Set the FILTER column to f0,f1,f2... TODO: determine definitively whether "." is replaced with "PASS"
Determine whether FILTER is present. Logs a warning if the filter does not exist. "PASS" and "." can be used interchangeably.
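One way to read the PASS/"." equivalence is sketched below. It is an assumption-laden illustration: `has_filter` and `is_pass_name` are hypothetical names, filters are compared as strings rather than header ids, an empty FILTER column is treated as passing, and the warning for a filter missing from the header is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "PASS" and "." are treated as the same filter name. */
static int is_pass_name(const char *s)
{
    return strcmp(s, "PASS") == 0 || strcmp(s, ".") == 0;
}

/* Return 1 if `query` is set on a record whose FILTER column holds the
 * `n` names in `set`, else 0. Hypothetical helper for illustration. */
static int has_filter(const char *const *set, size_t n, const char *query)
{
    assert(query != NULL && (set != NULL || n == 0));
    if (n == 0)
        return is_pass_name(query);   /* assumption: no filters set counts as PASS */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(set[i], query) == 0)
            return 1;
        if (is_pass_name(set[i]) && is_pass_name(query))
            return 1;                 /* "PASS" matches "." and vice versa */
    }
    return 0;
}
```

With this reading, querying "." on a record whose only filter is PASS succeeds, while querying PASS on a record filtered as q10 does not.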
Get ID string
Sets new ID string; comma-separated list allowed but no dup checking performed
Get position (POS, column 2). NB: internally BCF uses 0-based coordinates; we only add 1 when printing a VCF line with toString (which calls vcf_format).
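The coordinate convention amounts to a one-off shift between the stored value and the printed value. A minimal sketch, with hypothetical helper names `pos_to_vcf` and `pos_from_vcf` (neither is part of htslib or of this wrapper):

```c
#include <assert.h>
#include <stdint.h>

/* Convert the 0-based position stored in the record to the 1-based
 * POS value that appears in VCF text. */
static int64_t pos_to_vcf(int64_t pos0)
{
    assert(pos0 < INT64_MAX);   /* guard against overflow on +1 */
    return pos0 + 1;
}

/* Convert a 1-based POS read from VCF text to the 0-based stored value. */
static int64_t pos_from_vcf(int64_t pos1)
{
    assert(pos1 > INT64_MIN);   /* guard against overflow on -1 */
    return pos1 - 1;
}
```

A record whose getter returns 99 therefore prints with POS 100, and the two conversions are inverses of each other.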
Set position (POS, column 2)
Get variant quality (QUAL, column 6)
Set variant quality (QUAL, column 6)
Reference allele getter
REF allele length
Remove all entries in FILTER
Remove a filter by name
Remove a filter by numeric id
Set alleles; alt can be comma separated
Set alleles; min. 2 alleles (ref, alt1); unlimited alts may be specified
Set REF allele only. Param r is a \0-terminated C string. TODO: UNTESTED
Return a string representation of the VCFRecord (i.e. as it would appear in a .vcf file). As a bonus, there is a kstring_t memory leak.
BCF_UN_ALL: all
BCF_UN_SHR: all shared information (BCF_UN_STR|BCF_UN_FLT|BCF_UN_INFO)
                     BCF_UN_STR
                    /          BCF_UN_FLT
                   |          /      BCF_UN_INFO
                   |         |      /     ____________________________ BCF_UN_FMT
                   V         V     V     /       |       |       |
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003 ...
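The unpacking symbols are bit flags, so the composite levels are just ORs of the basic ones. The sketch below redefines them locally with an `EX_` prefix so that it builds without htslib; the numeric values are intended to mirror `BCF_UN_*` in htslib's `vcf.h`. The helper `unpack_level_for_column` is hypothetical and reads the diagram literally (each flag unpacks "up to" the column its arrow points at).

```c
#include <assert.h>

/* Local copies of the unpack flags; values mirror BCF_UN_* in htslib's
 * vcf.h but are redefined here so the sketch has no dependencies. */
enum {
    EX_UN_STR  = 1,                                   /* up to ALT    */
    EX_UN_FLT  = 2,                                   /* up to FILTER */
    EX_UN_INFO = 4,                                   /* up to INFO   */
    EX_UN_SHR  = EX_UN_STR | EX_UN_FLT | EX_UN_INFO,  /* all shared   */
    EX_UN_FMT  = 8,                                   /* FORMAT and samples */
    EX_UN_ALL  = EX_UN_SHR | EX_UN_FMT                /* everything   */
};

/* Map a 1-based VCF column number to the flag whose arrow in the
 * diagram covers that column. Hypothetical helper for illustration. */
static int unpack_level_for_column(int col)
{
    assert(col >= 1);
    if (col <= 5) return EX_UN_STR;    /* CHROM POS ID REF ALT */
    if (col <= 7) return EX_UN_FLT;    /* QUAL FILTER          */
    if (col == 8) return EX_UN_INFO;   /* INFO                 */
    return EX_UN_FMT;                  /* FORMAT and samples   */
}
```

Code that only reads REF and ALT can ask for the cheapest level, while anything touching per-sample data needs the FORMAT flag.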
Wrapper around bcf1_t
Because it uses bcf1_t internally, it must conform to the BCF2 part of the VCFv4.2 specs, rather than the loosey-goosey VCF specs. i.e., INFO, CONTIG, FILTER records must exist in the header.
TODO: Does this need to be kept in a consistent state? Ideally, VCFWriter would reject invalid ones, but we are informed that it is invalid (e.g. if contig not found) while building this struct; bcf_write1 will actually segfault, unfortunately. I'd like to avoid expensive validate() calls for every record before writing if possible, which means keeping this consistent. However, not sure what to do if error occurs when using the setters herein?
2019-01-23 struct->class to mirror SAMRecord -- faster if reference type?
2019-01-23 WIP: getters for chrom, pos, id, ref, alt are complete (untested)
After parsing a BCF or VCF line, bcf1_t must be unpacked (not applicable when building bcf1_t from scratch). Depending on the information needed, this can be done to various levels, with a performance tradeoff. Unpacking symbols: