These macros are defined only for consistency with other parts of htslib
BCF index
bcf_open and vcf_open mode: please see hts_open() in hts.h
bcf_add_filter() - adds to the FILTER column @flt_id: filter ID to add, numeric ID returned by bcf_hdr_id2int(hdr, BCF_DT_ID, "PASS")
Conversion between allele indexes and the Number=G genotype index (assuming diploid, all 0-based)
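As a rough illustration of the ordering this conversion refers to, the sketch below computes the Number=G index of a diploid genotype from two 0-based allele indexes. It assumes the genotype ordering given in the VCF specification, where genotype j/k with j <= k has index k*(k+1)/2 + j. The function name alleles_to_g_index is hypothetical and is not part of htslib.

```c
#include <assert.h>

/* Hypothetical helper, not part of htslib: index of the diploid genotype
 * a/b within a Number=G vector. Both allele indexes are 0-based. */
static int alleles_to_g_index(int a, int b)
{
    assert(a >= 0 && b >= 0);
    /* The ordering is defined for j <= k, so sort the pair first. */
    int j = a < b ? a : b;
    int k = a < b ? b : a;
    return k * (k + 1) / 2 + j;
}
```

With one REF and two ALT alleles, the genotypes 0/0, 0/1, 1/1, 0/2, 1/2 and 2/2 map to indexes 0 through 5 in that order.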
Make the bcf1_t object ready for next read. Intended mostly for internal use, the user should rarely need to call this function directly.
@param p Pointer to input data block. @param type One of the BCF_BT_INT* type codes @param[out] q Location to store an updated value for p @return The integer value, or zero if @p type is not valid.
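The decoding contract above (read one integer of the given type, return it, and report the advanced position through q) can be sketched in standalone C. This is not the htslib function: the name decode_int and the SK_* constants are hypothetical, the type code values 1, 2 and 3 are taken from the BCF specification, and leaving the position unchanged for an invalid type is an assumption of this sketch. BCF stores integers little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes as listed in the BCF specification, redefined under
 * hypothetical names so that the sketch stands alone. */
enum { SK_INT8 = 1, SK_INT16 = 2, SK_INT32 = 3 };

/* Hypothetical decoder: read one little-endian integer at p and store the
 * position just past it in *q. Returns 0 for an unknown type code. */
static int64_t decode_int(const uint8_t *p, int type, const uint8_t **q)
{
    assert(p != NULL && q != NULL);
    switch (type) {
    case SK_INT8:
        *q = p + 1;
        return (int8_t)p[0];
    case SK_INT16:
        *q = p + 2;
        return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
    case SK_INT32:
        *q = p + 4;
        return (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                         (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
    default:
        *q = p;   /* assumption of this sketch: position left unchanged */
        return 0;
    }
}
```

Because a valid zero and an invalid type both return 0, a caller that cannot trust the type code would need to validate it before decoding.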
@param p Pointer to input data block. @param[out] q Location to store an updated value for p @return The integer value, or zero if the type code was not valid.
Deallocate a bcf1_t object
Same as bcf_destroy() but frees only the memory allocated by bcf1_t, not the bcf1_t object itself.
Undocumented Encode integer variant 1
Undocumented Encode integer type?
@param s kstring to write into @param l length of input @param a input data to encode @return 0 on success; < 0 on error
@param s kstring to write into @param n total number of items in @p a (<= 0 to encode BCF_BT_NULL) @param a input data to encode @return 0 on success; < 0 on error
@param s kstring to write into @param n total number of items in @p a (<= 0 to encode BCF_BT_NULL) @param a input data to encode @param wsize vector length (<= 0 is equivalent to @p n) @return 0 on success; < 0 on error @note @p n should be an exact multiple of @p wsize
(Undocumented)
(Undocumented)
float vector macros
float vector macros
@param s kstring to write into @param n number of items in @p data @param type type of items in @p data @param data BCF format data @return 0 on success -1 if out of memory
(Undocumented) Format GT field
bcf_get_fmt() - returns pointer to FORMAT's field data @header: for access to BCF_DT_ID dictionary @line: VCF line obtained from vcf_parse1 @fmt: one of GT,PL,...
bcf_get_*_id() - returns pointer to FORMAT/INFO field data given the header index instead of the string ID @line: VCF line obtained from vcf_parse1 @id: The header index for the tag, obtained from bcf_hdr_id2int()
bcf_get_info_*() - get INFO values, integers or floats @param hdr: BCF header @param line: BCF record @param tag: INFO tag to retrieve @param dst: *dst is pointer to a memory location, can point to NULL @param ndst: pointer to the size of allocated memory @return >=0 on success -1 .. no such INFO tag defined in the header -2 .. clash between types defined in the header and encountered in the VCF record -3 .. tag is not present in the VCF record -4 .. the operation could not be completed (e.g. out of memory)
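The dst/ndst contract described above (a caller-owned buffer that may start as NULL and is grown on demand) can be illustrated without htslib. The function get_values below is a hypothetical stand-in, not a real getter, and it assumes that ndst counts elements rather than bytes. It reuses the documented return convention: a non-negative count on success and -4 when memory cannot be obtained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a getter with the dst/ndst contract: the
 * buffer is grown only when too small and is reused across calls. */
static int get_values(int32_t **dst, int *ndst, const int32_t *src, int n)
{
    assert(dst != NULL && ndst != NULL && n >= 0);
    if (*ndst < n) {
        int32_t *tmp = realloc(*dst, (size_t)n * sizeof(**dst));
        if (tmp == NULL)
            return -4;            /* could not be completed: out of memory */
        *dst = tmp;
        *ndst = n;
    }
    for (int i = 0; i < n; i++)
        (*dst)[i] = src[i];
    return n;                     /* >= 0 on success */
}
```

A caller would typically declare the buffer and its size once, pass their addresses on every call, and free the buffer a single time after the last record has been processed.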
@param hdr: BCF header @param line: BCF record @param tag: INFO tag to retrieve @param dst: *dst is pointer to a memory location, can point to NULL @param ndst: pointer to the size of allocated memory @return >=0 on success -1 .. no such INFO tag defined in the header -2 .. clash between types defined in the header and encountered in the VCF record -3 .. tag is not present in the VCF record -4 .. the operation could not be completed (e.g. out of memory)
bcf_get_variant_types - returns one of VCF_REF, VCF_SNP, etc
Conversion between allele indexes and the Number=G genotype index (assuming diploid, all 0-based)
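The reverse direction, from a Number=G genotype index back to the allele pair, can be sketched as follows. It assumes the VCF specification's ordering, in which genotype j/k with j <= k has index k*(k+1)/2 + j. The function name g_index_to_alleles is hypothetical and is not part of htslib.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of htslib: recover the 0-based allele
 * pair (*a <= *b) from a diploid Number=G genotype index. */
static void g_index_to_alleles(int igt, int *a, int *b)
{
    assert(igt >= 0 && a != NULL && b != NULL);
    int k = 0;
    /* Find the largest k such that k*(k+1)/2 <= igt. */
    while ((k + 1) * (k + 2) / 2 <= igt)
        k++;
    *b = k;
    *a = igt - k * (k + 1) / 2;
}
```

For example, index 4 yields the pair 1/2 and index 3 yields 0/2.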
Macros for setting genotypes correctly, for use with bcf_update_genotypes only; idx corresponds to VCF's GT (1-based index to ALT or 0 for the reference allele) and val is the opposite, obtained from bcf_get_genotypes() below.
Macros for setting genotypes correctly, for use with bcf_update_genotypes only; idx corresponds to VCF's GT (1-based index to ALT or 0 for the reference allele) and val is the opposite, obtained from bcf_get_genotypes() below.
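The relationship between idx and val can be made concrete with a small standalone sketch. It assumes the encoding used by htslib's genotype macros, in which the stored value is (idx + 1) << 1, the lowest bit flags a phased allele, and a zero in the upper bits means missing. The function names below are hypothetical re-statements and are not the htslib macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical re-statements of the assumed encoding, not htslib macros. */

/* idx: 0 for REF, 1-based index for ALT alleles, as in VCF's GT. */
static int32_t gt_encode(int idx, int phased)
{
    assert(idx >= 0);
    return (int32_t)(((idx + 1) << 1) | (phased ? 1 : 0));
}

/* val: a stored value, as found in an encoded genotype array. */
static int gt_is_missing(int32_t val) { return (val >> 1) == 0; }
static int gt_is_phased(int32_t val)  { return val & 1; }
static int gt_allele(int32_t val)     { return (val >> 1) - 1; }
```

Under this assumption, an unphased REF allele is stored as 2 and a phased first ALT allele as 5, and gt_allele() maps both back to 0 and 1 respectively.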
Returns 1 if present, 0 if absent, or -1 if filter does not exist. "PASS" and "." can be used interchangeably.
bcf_hdr_add_sample() - add a new sample. @param sample: sample name to be added
Append new VCF header line, returns 0 on success
Copy header lines from src to dst if not already present in dst. See also bcf_translate(). Returns 0 on success or sets a bit on error: 1 .. conflicting definitions of tag length // todo
Destroy a BCF header struct
Create a new header using the supplied template
Returns formatted header (newly allocated string) and its length, excluding the terminating \0. If is_bcf parameter is unset, IDX fields are discarded. @deprecated Use bcf_hdr_format() instead as it can handle huge headers.
If _is_bcf_ is zero, IDX fields are discarded. @return 0 if successful, or negative if an error occurred @since 1.4
bcf_hdr_get_hrec() - get header line info @param type: one of the BCF_HL_* types: FLT,INFO,FMT,CTG,STR,GEN @param key: the header key for generic lines (e.g. "fileformat"), any field for structured lines, typically "ID". @param value: the value which pairs with key. Can be NULL for BCF_HL_GEN @param str_class: the class of BCF_HL_STR line (e.g. "ALT" or "SAMPLE"), otherwise NULL
VCF version, e.g. VCFv4.2
bcf_hdr_id2*() - Macros for accessing bcf_idinfo_t @type: one of BCF_HL_FLT, BCF_HL_INFO, BCF_HL_FMT @int_id: return value of bcf_hdr_id2int, must be >=0
bcf_hdr_id2int() - Translates string into numeric ID bcf_hdr_int2id() - Translates numeric ID into string @type: one of BCF_DT_ID, BCF_DT_CTG, BCF_DT_SAMPLE @id: tag name, such as: PL, DP, GT, etc.
bcf_hdr_id2*() - Macros for accessing bcf_idinfo_t @type: one of BCF_HL_FLT, BCF_HL_INFO, BCF_HL_FMT @int_id: return value of bcf_hdr_id2int, must be >=0
bcf_hdr_name2id() - Translates sequence names (chromosomes) into numeric ID bcf_hdr_id2name() - Translates numeric ID to sequence name
bcf_hdr_id2*() - Macros for accessing bcf_idinfo_t @type: one of BCF_HL_FLT, BCF_HL_INFO, BCF_HL_FMT @int_id: return value of bcf_hdr_id2int, must be >=0
bcf_hdr_init() - create an empty BCF header. @param mode "r" or "w"
bcf_hdr_merge() - copy header lines from src to dst, see also bcf_translate() @param dst: the destination header to be merged into, NULL on the first pass @param src: the source header @return NULL on failure, header otherwise
bcf_hdr_name2id() - Translates sequence names (chromosomes) into numeric ID bcf_hdr_id2name() - Translates numeric ID to sequence name
Get number of samples
The following functions are for internal use and should rarely be called directly
bcf_hdr_parse_line() - parse a single line of VCF textual header @param h BCF header struct @param line One or more lines of header text @param len Filled out with length data parsed from 'line'. @return bcf_hrec_t* on success; NULL on error or on end of header text. NB: to distinguish error from end-of-header, check *len: *len == 0 indicates @p line did not start with "##" *len == -1 indicates failure, likely due to out of memory *len > 0 indicates a malformed header line
@param fp The file to read the header from @return Pointer to a populated header structure on success; NULL on failure
bcf_hdr_remove() - remove VCF header tag @param type: one of BCF_HL_* @param key: tag name or NULL to remove all tags of the given type
Creates a list of sequence names. It is up to the caller to free the list (but not the sequence names)
Read VCF header from a file and update the header
bcf_hdr_set_samples() - for more efficient VCF parsing when only one/few samples are needed @param samples samples to include or exclude from file or as a comma-separated string. LIST|FILE .. select samples in list/file ^LIST|FILE .. exclude samples from list/file - .. include all samples NULL .. exclude all samples @param is_file @p samples is a file (1) or a comma-separated list (0)
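The four forms of the samples argument listed above can be summarised with a small classifier. This is only a sketch of the documented conventions: the enum and the function classify_samples are hypothetical, and the code does not parse the list or read any file.

```c
#include <assert.h>
#include <stddef.h>

enum sample_mode { EXCLUDE_ALL, INCLUDE_ALL, INCLUDE_LISTED, EXCLUDE_LISTED };

/* Hypothetical classifier, not part of htslib: decide what a samples
 * argument means according to the conventions described above. */
static enum sample_mode classify_samples(const char *samples)
{
    if (samples == NULL)
        return EXCLUDE_ALL;                       /* NULL: no samples */
    if (samples[0] == '-' && samples[1] == '\0')
        return INCLUDE_ALL;                       /* "-": every sample */
    if (samples[0] == '^')
        return EXCLUDE_LISTED;                    /* "^LIST" or "^FILE" */
    return INCLUDE_LISTED;                        /* "LIST" or "FILE" */
}

/* Usage: the four cases from the description. */
static void classify_samples_demo(void)
{
    assert(classify_samples(NULL) == EXCLUDE_ALL);
    assert(classify_samples("-") == INCLUDE_ALL);
    assert(classify_samples("NA001,NA002") == INCLUDE_LISTED);
    assert(classify_samples("^NA001") == EXCLUDE_LISTED);
}
```

The sample names in the demo are placeholders. Whether the remaining text is a comma-separated list or a file name is decided separately by the is_file argument.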
@param hdr BCF header struct @param version Version to set, e.g. "VCFv4.3" @return 0 on success; < 0 on error
bcf_hdr_subset() - creates a new copy of the header removing unwanted samples @param n: number of samples to keep @param samples: names of the samples to keep @param imap: mapping from index in @samples to the sample index in the original file @return NULL on failure, header otherwise
@param h Header @return 0 on success, -1 on failure
@param fp Output file @param h The header to write @return 0 on success; -1 on failure
@param hrec Header record @param str Key name @param len Length of @p str @return 0 on success; -1 on failure
@param hrec Header record
@param hrec Header record to copy @return A new header record on success; NULL on failure
Lookup header record by key
@param hrec Header record @param str Destination kstring @return 0 on success; < 0 on error
@param hrec Header record @param i Index of value @param str Value to set @param len Length of @p str @param is_quoted Value should be quoted @return 0 on success; -1 on failure
@param fp File handle for the data file being written. @param h BCF header structure (needed for BAI and CSI). @param min_shift CSI bin size (CSI default is 14). @param fnidx Filename to write index to. This pointer must remain valid until after bcf_idx_save is called. @return 0 on success, <0 on failure. @note This must be called after the header has been written, but before any other data.
@param fp File handle for the data file being written. @return 0 on success, <0 on failure.
bcf_index_build() - Generate and save an index file @fn: Input VCF(compressed)/BCF filename @min_shift: log2(width of the smallest bin), e.g. a value of 14 imposes a 16k base lower limit on the width of index bins. Positive to generate CSI, or 0 to generate TBI. However, a small value of min_shift would create a large index, which would lead to reduced performance when using the index. A recommended value is 14. For BCF files, only the CSI index can be generated.
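The arithmetic behind min_shift is simple enough to check by hand. The sketch below, which applies to positive (CSI) values only, computes the width of the smallest bin as 2^min_shift and, as a rough proxy for index size, the number of smallest bins needed to span a region. Both function names are hypothetical and neither is part of htslib.

```c
#include <assert.h>
#include <stdint.h>

/* Width in bases of the smallest index bin: 2^min_shift. */
static int64_t smallest_bin_width(int min_shift)
{
    assert(min_shift > 0 && min_shift < 31);
    return (int64_t)1 << min_shift;
}

/* Number of smallest bins needed to cover `length` bases (rounded up).
 * Only a rough proxy for how the index grows as min_shift shrinks. */
static int64_t smallest_bins_spanning(int64_t length, int min_shift)
{
    int64_t w = smallest_bin_width(min_shift);
    assert(length >= 0);
    return (length + w - 1) / w;
}
```

With min_shift = 14 the smallest bin is 16384 bases wide, so one million bases span 62 such bins. Lowering min_shift to 10 raises that to 977, which illustrates why small values inflate the index.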
bcf_index_build2() - Generate and save an index to a specific file @fn: Input VCF/BCF filename @fnidx: Output filename, or NULL to add .csi/.tbi to @fn @min_shift: Positive to generate CSI, or 0 to generate TBI
bcf_index_build3() - Generate and save an index to a specific file @fn: Input VCF/BCF filename @fnidx: Output filename, or NULL to add .csi/.tbi to @fn @min_shift: Positive to generate CSI, or 0 to generate TBI @n_threads: Number of VCF/BCF decoder threads
@param fn BCF file name @return The index, or NULL if an error occurred. @note This only works for BCF files. Consider synced_bcf_reader instead which works for both BCF and VCF.
@param fn Input BAM/BCF/etc filename @param fnidx The input index filename @return The index, or NULL if an error occurred. @note This only works for BCF files. Consider synced_bcf_reader instead which works for both BCF and VCF.
@param fn Input BAM/BCF/etc filename @param fnidx The input index filename @param flags Flags to alter behaviour (see description) @return The index, or NULL if an error occurred. @note This only works for BCF files. Consider synced_bcf_reader instead which works for both BCF and VCF.
Get a list (char **) of sequence names from the index -- free only the array, not the values
Allocate and initialize a bcf1_t object.
Iterate through the range. r should (probably) point to your VCF (BCF) row structure. TODO: attempt to define parameter r as bcf1_t *, which is what I think it should be.
Generate an iterator for an integer-based range query
Generate an iterator for a string-based range query
@param fp The file to read the record from @param h The header for the vcf/bcf file @param v The bcf1_t structure to populate @return 0 on success; -1 on end of file; < -1 on critical error
Helper function for the bcf_itr_next() macro; internal use, ignore it
bcf_remove_filter() - removes from the FILTER column @flt_id: filter ID to remove, numeric ID returned by bcf_hdr_id2int(hdr, BCF_DT_ID, "PASS") @pass: when set to 1 and no filters are present, set to PASS
bcf_hdr_name2id() - Translates sequence names (chromosomes) into numeric ID bcf_hdr_id2name() - Translates numeric ID to sequence name
Return CONTIG name, or "(unknown)"
See the description of bcf_hdr_subset()
bcf_translate() - translate tag ids to be consistent with a different header. This function is useful when lines from multiple VCFs need to be combined. @dst_hdr: the destination header, to be used in bcf_write(), see also bcf_hdr_combine() @src_hdr: the source header, used in bcf_read() @src_line: line obtained by bcf_read()
bcf_update_alleles() and bcf_update_alleles_str() - update REF and ALT column @alleles: Array of alleles @nals: Number of alleles @alleles_string: Comma-separated alleles, starting with the REF allele
bcf_update_filter() - sets the FILTER column @flt_ids: The filter IDs to set, numeric IDs returned by bcf_hdr_id2int(hdr, BCF_DT_ID, "PASS") @n: Number of filters. If n==0, all filters are removed
bcf_update_id() - sets new ID string bcf_add_id() - adds to the ID string checking for duplicates
bcf_update_info_*() - functions for updating INFO fields @param hdr: the BCF header @param line: VCF line to be edited @param key: the INFO tag to be updated @param values: pointer to the array of values. Pass NULL to remove the tag. @param n: number of values in the array. When set to 0, the INFO tag is removed @return 0 on success or negative value on error.
@param hdr: the BCF header @param line: VCF line to be edited @param key: the INFO tag to be updated @param values: pointer to the array of values. Pass NULL to remove the tag. @param n: number of values in the array. When set to 0, the INFO tag is removed @return 0 on success or negative value on error.
@param fp The file to write to @param h The header for the vcf/bcf file @param v The bcf1_t structure to write @return 0 on success; -1 on error
@param hrec Header record @param idx IDX value to add @return 0 on success; -1 on failure
The opposite of vcf_parse. It should rarely be called directly, see vcf_write
@param fp The file to read the header from @return Pointer to a populated header structure on success; NULL on failure
@param fp Output file @param h The header to write @return 0 on success; -1 on failure
Complete the file opening mode, according to its extension. @param mode Preallocated mode string to be completed. @param fn File name to be opened. @param format Format string (vcf|bcf|vcf.gz) @return 0 on success; -1 on failure
Parse VCF line contained in kstring and populate the bcf1_t struct The line must not end with \n or \r characters.
@param fp The file to read the record from @param h The header for the vcf file @param v The bcf1_t structure to populate @return 0 on success; -1 on end of file; < -1 on error
@param fp The file to write to @param h The header for the vcf file @param v The bcf1_t structure to write @return 0 on success; -1 on error
@param line Line to write @param fp File to write it to @return 0 on success; -1 on failure
Allele(s) was edited
FILTER was edited
ID was edited
INFO was edited
char (8 bit)
float (32 bit)
int16
int32
Unofficial, for internal use only per htslib headers
int8
VCF record: nul
dictionary type: CONTIG
dictionary type: ID
dictionary type: SAMPLE
BCF error:
BCF error:
BCF error: undefined contig
BCF error:
BCF error:
BCF error:
BCF error: undefined tag
header line: contig
Header struct. header line: FILTER
header line: FORMAT
header line: generic header line
header line: INFO
header line: structured header line TAG=<A=..,B=..>
header type: FLAG
header type: INTEGER
header type: REAL
header type: STRING
bcf_unpack() - unpack/decode a BCF record (fills the bcf1_t::d field)
variable length: ?
variable length: fixed (?)
variable length: ?
variable length: ?
variable length: variable
INDEL
MNP
other (e.g. SV)
overlapping deletion, ALT=*
ref (e.g. in a gVCF)
SNP
Note that in contrast with the BCFv2.1 specification, the HTSlib implementation allows missing values in vectors. For integer types, the values 0x80, 0x8000, 0x80000000 are interpreted as missing values and 0x81, 0x8001, 0x80000001 as end-of-vector indicators. Similarly for floats, the value of 0x7F800001 is interpreted as a missing value and 0x7F800002 as an end-of-vector indicator. Note that the end-of-vector byte is not part of the vector.
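A standalone sketch of how these sentinel values might be tested follows. The SK_* constants simply restate the bit patterns quoted above under hypothetical names, and both functions are illustrative rather than part of htslib. Floats are compared by bit pattern because both sentinels are NaN encodings, so an ordinary == comparison on the float value would never match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit patterns quoted in the text, under hypothetical names. */
#define SK_INT8_MISSING   0x80u
#define SK_INT8_VEC_END   0x81u
#define SK_FLOAT_MISSING  0x7F800001u

/* Compare the raw bits of *f against the missing-value pattern. */
static int float_is_missing(const float *f)
{
    uint32_t bits;
    assert(f != NULL);
    memcpy(&bits, f, sizeof bits);
    return bits == SK_FLOAT_MISSING;
}

/* Count non-missing values in an int8 vector of capacity n, stopping at
 * the first end-of-vector marker (which is not part of the vector). */
static int count_present_int8(const uint8_t *v, int n)
{
    int present = 0;
    assert(v != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (v[i] == SK_INT8_VEC_END)
            break;
        if (v[i] != SK_INT8_MISSING)
            present++;
    }
    return present;
}
```

For the five stored bytes 1, 0x80, 3, 0x81, 0x81 the vector holds three entries, one of them missing, so two values are counted as present.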
Lookup table used in bcf_record_check MAINTAINER: in C header is []
The bcf1_t structure corresponds to one VCF/BCF line. Reading from a VCF file is slower because the string must first be parsed and packed into a BCF line (done in vcf_parse), then unpacked into the internal bcf1_t structure. If it is known in advance that some of the fields will not be required (notably the sample columns), parsing of these can be skipped by setting max_unpack appropriately. Similarly, it is fast to output a BCF line because the columns (kept in shared.s, indiv.s, etc.) are written directly by bcf_write, whereas a VCF line must be formatted in vcf_format.
Variable-length data from a VCF record
FORMAT field data (§1.4.2 Genotype fields)
Structured representation of the VCF header (§1.2). Note that bcf_hdr_t structs must always be created via bcf_hdr_init()
Structured representation of a header line (§1.2)
ID Dictionary entry
ID Dictionary k/v
INFO field data (§1.4.1 Fixed fields, (8) INFO)
Variant type record embedded in bcf_dec_t: the variant type and the number of bases affected (negative for deletions)
Macros for setting genotypes correctly, for use with bcf_update_genotypes only; idx corresponds to VCF's GT (1-based index to ALT or 0 for the reference allele) and val is the opposite, obtained from bcf_get_genotypes() below.
@file htslib/vcf.h High-level VCF/BCF variant calling file operations. Section numbers refer to VCF Specification v4.2: https://samtools.github.io/hts-specs/VCFv4.2.pdf