For backwards compatibility
Iterator with multiple regions
typedef hts_itr_multi_t *hts_itr_multi_query_func(const hts_idx_t *idx, hts_itr_multi_t *itr);
Compression type
Specific format (SAM, BAM, CRAM, BCF, VCF, TBI, BED, etc.)
File I/O
Broad format category (sequence data, variant data, index, regions, etc.)
Mostly CRAM only, but this could also include other format options
REQUIRED_FIELDS
???
???
! @abstract Determine whether a given htsFile contains a valid EOF block @return 3 for a file type that has no EOF marker to check; 2 for an unseekable file type where EOF cannot be checked; 1 for a valid EOF block; 0 if the EOF marker is absent when it should be present; -1 (with errno set) on failure @discussion Check if the BGZF end-of-file (EOF) marker is present
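The return codes listed for the EOF check can be turned into readable diagnostics. The following is a minimal sketch, not part of the library: `eof_status_text` and `eof_status_is_problem` are hypothetical helper names, and the integer passed in is assumed to be the value returned by the EOF-check function documented above.

```c
#include <assert.h>

/* Hypothetical helper: map a documented return code to a description. */
static const char *eof_status_text(int ret)
{
    static const char *const text[] = {
        "EOF marker absent where one is expected",   /* 0 */
        "valid EOF block present",                   /* 1 */
        "unseekable stream; EOF cannot be checked",  /* 2 */
        "file type has no EOF marker to check"       /* 3 */
    };
    if (ret < 0) return "failure (errno is set)";
    assert(ret <= 3);  /* the documentation lists no codes above 3 */
    return text[ret];
}

/* Only 0 and negative values indicate that something is wrong. */
static int eof_status_is_problem(int ret)
{
    return ret <= 0;
}
```

Treating 2 and 3 as non-problems is a choice made for this sketch: in both cases the check could not be performed, which is different from the marker being missing.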
! @abstract Close a file handle, flushing buffered data for output streams @param fp The file handle to be closed @return 0 for success, or negative if an error occurred.
! @abstract Determine format by peeking at the start of a file @param fp File opened for reading, positioned at the beginning @param fmt Format structure that will be filled out on return @return 0 for success, or negative if an error occurred.
! @abstract Get a human-readable description of the file format @param fmt Format structure holding type, version, compression, etc. @return Description string, to be freed by the caller after use.
! @abstract Returns a string containing the file format extension. @param format Format structure containing the file type. @return A string ("sam", "bam", etc) or "?" for unknown formats.
! @abstract Returns the file's format information @param fp The file handle @return Read-only pointer to the file's htsFormat.
?Get line as string from line-oriented flat file (undocumented in hts.h)
! @abstract Open an existing stream as a SAM/BAM/CRAM/VCF/BCF/etc file @param fn The already-open file handle @param mode Open mode, as per hts_open()
Destroy index
?finalize index
@param idx The index @param l_meta Pointer to where the length of the extra data is stored @return Pointer to the extra data if present; NULL otherwise
Get the number of records with no coordinate (unplaced reads) from an index
Get the number of mapped and unmapped reads for a given contig/tid from an index
Initialize index
@param fn BAM/BCF/etc filename, to which .bai/.csi/etc will be added or the extension substituted, to search for an existing index file @param fmt One of the HTS_FMT_* index formats @return The index, or NULL if an error occurred.
@param fn Input BAM/BCF/etc filename @param fnidx The input index filename @return The index, or NULL if an error occurred.
Add to index
@param idx Index to be written @param fn Input BAM/BCF/etc filename, to which .bai/.csi/etc will be added @param fmt One of the HTS_FMT_* index formats @return 0 if successful, or negative if an error occurred.
@param idx Index to be written @param fn Input BAM/BCF/etc filename @param fnidx Output filename, or NULL to add .bai/.csi/etc to @a fn @param fmt One of the HTS_FMT_* index formats @return 0 if successful, or negative if an error occurred.
Return a C-string array of sequence names. NB: free only the array, not the values.
@param idx The index @param l_meta Length of data @param meta Pointer to the extra data @param is_copy If not zero, a copy of the data is taken @return 0 on success; -1 on failure (out of memory).
destroy iterator
BAM multi iterator
CRAM multi iterator
multi iterator: free
multi iterator: next
iterator next
iterator query function (by integer tid/start/end)
iterator query function (by string "chr:start-end")
? multi iterator by regionlist ?
! @abstract Open a SAM/BAM/CRAM/VCF/BCF/etc file
  @param fn The file name or "-" for stdin/stdout
  @param mode Mode matching / [rwa][bceguxz0-9]* /
  @discussion With 'r' opens for reading; any further format mode letters are ignored as the format is detected by checking the first few bytes or BGZF blocks of the file. With 'w' or 'a' opens for writing or appending, with format specifier letters:
      b      binary format (BAM, BCF, etc) rather than text (SAM, VCF, etc)
      c      CRAM format
      g      gzip compressed
      u      uncompressed
      z      BGZF compressed
      [0-9]  zlib compression level
  and with non-format option letters (for any of 'r'/'w'/'a'):
      e      close the file on exec(2) (opens with O_CLOEXEC, where supported)
      x      create the file exclusively (opens with O_EXCL, where supported)
  Note that there is a distinction between 'u' and '0': the first yields plain uncompressed output whereas the latter outputs uncompressed data wrapped in the zlib format.
  @example
      [rw]b  .. compressed BCF, BAM, FAI
      [rw]bu .. uncompressed BCF
      [rw]z  .. compressed VCF
      [rw]   .. uncompressed VCF
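To make the mode syntax concrete, here is a small sketch that checks a string against the pattern, assuming it is meant as `[rwa][bceguxz0-9]*` (one of r, w or a, followed by any number of option letters or digits). `mode_is_well_formed` is a hypothetical name used only for illustration. The library's own handling of unexpected letters may differ from this strict check.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if mode matches [rwa][bceguxz0-9]*, otherwise 0.
   Illustrative only; not how the library validates its argument. */
static int mode_is_well_formed(const char *mode)
{
    assert(mode != NULL);
    /* Test for the empty string first: strchr() would otherwise match
       the terminating NUL of "rwa". */
    if (mode[0] == '\0' || strchr("rwa", mode[0]) == NULL)
        return 0;
    /* strspn() measures the leading run of permitted option characters;
       the mode is valid if that run reaches the end of the string. */
    return mode[1 + strspn(mode + 1, "bceguxz0123456789")] == '\0';
}
```

For instance, "wb", "wbu" and "wz9" pass the check, while "" and "wq" do not.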
! @abstract Open a SAM/BAM/CRAM/VCF/BCF/etc file @param fn The file name or "-" for stdin/stdout @param mode Open mode, as per hts_open() @param fmt Optional format specific parameters @discussion See hts_open() for description of fn and mode. // TODO Update documentation for s/opts/fmt/ Opts contains a format string (sam, bam, cram, vcf, bcf) which will, if defined, override mode. Opts also contains a linked list of hts_opt structures to apply to the open file handle. These can contain things like pointers to the reference or information on compression levels, block sizes, etc.
Parses arg and appends it to the option list.
Applies an hts_opt option list to a given htsFile.
Frees an hts_opt list.
The number may be expressed in scientific notation, and optionally may contain commas in the integer part (before any decimal point or E notation). @param str String to be parsed @param strend If non-NULL, set on return to point to the first character in @a str after those forming the parsed number @param flags Or'ed-together combination of HTS_PARSE_* flags @return Converted value of the parsed number.
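The accepted number syntax (commas in the integer part, optional fraction, optional E notation) can be illustrated with a simplified parser. This is a sketch under stated assumptions, not the library's implementation: `parse_number_sketch` is a hypothetical name, it takes no flags argument, and it accumulates in a `double`, so very large values may lose precision.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parse e.g. "1,234,567" or "1.5e3". If strend is non-NULL it receives
   a pointer to the first character after the parsed number. */
static long long parse_number_sketch(const char *str, const char **strend)
{
    const char *s = str;
    double value = 0.0, scale = 0.1;
    int negative = 0, exponent = 0;

    assert(str != NULL);
    while (isspace((unsigned char)*s)) s++;
    if (*s == '+' || *s == '-') negative = (*s++ == '-');

    /* Integer part: digits, with ',' separators skipped. */
    for (; isdigit((unsigned char)*s) || *s == ','; s++)
        if (*s != ',') value = value * 10.0 + (*s - '0');

    /* Optional fractional part. */
    if (*s == '.')
        for (s++; isdigit((unsigned char)*s); s++, scale /= 10.0)
            value += (*s - '0') * scale;

    /* Optional exponent, accepted only when digits follow the E. */
    if ((*s == 'e' || *s == 'E') &&
        (isdigit((unsigned char)s[1]) ||
         ((s[1] == '+' || s[1] == '-') && isdigit((unsigned char)s[2])))) {
        char *end;
        exponent = (int)strtol(s + 1, &end, 10);
        s = end;
    }
    for (; exponent > 0; exponent--) value *= 10.0;
    for (; exponent < 0; exponent++) value /= 10.0;

    if (strend) *strend = s;
    return (long long)(negative ? -value : value);
}
```

Returning the end pointer lets a caller detect trailing text, such as a unit suffix, and decide whether to accept or reject it.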
Accepts a string file format (sam, bam, cram, vcf, bcf) optionally followed by a comma-separated list of key=value options and splits these up into the fields of the htsFormat struct.
Tokenise options as (key(=value)?,)*(key(=value)?)? NB: No provision for ',' appearing in the value! Add backslashing rules?
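The grammar `(key(=value)?,)*(key(=value)?)?` can be walked with a small cursor-based tokenizer. This is only a sketch: `next_option` and `copy_span` are hypothetical helpers, and, as the note above says of the real tokenizer, there is no way to escape a ',' inside a value.

```c
#include <assert.h>
#include <string.h>

/* Copy at most dstsz-1 bytes of src[0..len) into dst and terminate it. */
static void copy_span(char *dst, size_t dstsz, const char *src, size_t len)
{
    if (len >= dstsz) len = dstsz - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Read one key or key=value token starting at *cursor.
   Returns 1 and advances *cursor past the token and its ',' separator,
   or returns 0 once the input is exhausted. */
static int next_option(const char **cursor, char *key, size_t keysz,
                       char *val, size_t valsz)
{
    const char *s, *eq;
    size_t toklen;

    assert(cursor && *cursor && key && val && keysz > 0 && valsz > 0);
    s = *cursor;
    if (*s == '\0') return 0;

    toklen = strcspn(s, ",");      /* token ends at ',' or end of string */
    eq = memchr(s, '=', toklen);   /* first '=' splits key from value */
    if (eq) {
        size_t keylen = (size_t)(eq - s);
        copy_span(key, keysz, s, keylen);
        copy_span(val, valsz, eq + 1, toklen - keylen - 1);
    } else {
        copy_span(key, keysz, s, toklen);
        val[0] = '\0';             /* bare key: empty value */
    }
    s += toklen;
    if (*s == ',') s++;            /* step over the separator */
    *cursor = s;
    return 1;
}
```

Calling `next_option` in a loop over "a=1,b,c=xyz" yields the pairs (a, 1), (b, empty) and (c, xyz), after which it returns 0.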
@param str String to be parsed @param beg Set on return to the 0-based start of the region @param end Set on return to the 1-based end of the region @return Pointer to the colon or '\0' after the reference sequence name, or NULL if @a str could not be parsed.
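The coordinate convention (0-based start, 1-based end, i.e. a half-open interval) is easy to get wrong, so here is a simplified sketch of region parsing. It is not the library's parser: `parse_region_sketch` is a hypothetical name, it does not accept commas inside numbers, and using `LONG_MAX` for an open-ended region is an assumption of this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse "name", "name:start" or "name:start-end", where the text uses
   1-based inclusive coordinates. On success, *beg is the 0-based start,
   *end is the 1-based end, and the return value points at the ':' or
   '\0' that follows the name. Returns NULL if the text is malformed. */
static const char *parse_region_sketch(const char *str, long *beg, long *end)
{
    const char *colon;
    char *stop;
    long b, e;

    assert(str && beg && end);
    colon = strrchr(str, ':');
    if (colon == NULL) {                 /* whole reference sequence */
        *beg = 0;
        *end = LONG_MAX;
        return str + strlen(str);
    }
    b = strtol(colon + 1, &stop, 10);
    if (stop == colon + 1 || b < 1) return NULL;
    if (*stop == '\0') {
        e = LONG_MAX;                    /* "name:start" runs to the end */
    } else if (*stop == '-') {
        const char *p = stop + 1;
        e = strtol(p, &stop, 10);
        if (stop == p || *stop != '\0' || e < b) return NULL;
    } else {
        return NULL;
    }
    *beg = b - 1;                        /* 1-based inclusive -> 0-based */
    *end = e;                            /* the end stays 1-based */
    return colon;
}
```

With this convention "chr1:100-200" gives beg = 99 and end = 200, so the region length is simply end - beg.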
?Get _n lines into buffer from line-oriented flat file; sets _n as number read (undocumented in hts.h)
! @abstract Parse comma-separated list or read list from a file @param list File name or comma-separated list @param is_file Nonzero if @a list is a file name to read from; zero if it is a comma-separated list @param _n Size of the output array (number of items read) @return NULL on failure or pointer to newly allocated array of strings
free regionlist
! @abstract Adds a cache of decompressed blocks, potentially speeding up seeks. This may not work for all file types (currently it is BGZF only). @param fp The file handle @param n The size of cache, in bytes
! @abstract Set .fai filename for a file opened for reading @return 0 for success, negative on failure @discussion Called before *_hdr_read(), this provides the name of a .fai file used to provide a reference list if the htsFile contains no @SQ headers.
! @abstract Sets a specified CRAM option on the open file handle. @param fp The handle of the open file. @param opt The CRAM_OPT_* option. @param ... Optional arguments, dependent on the option used. @return 0 for success, or negative if an error occurred.
! @abstract Create extra threads to aid compression/decompression for this file @param fp The file handle @param p A pool of worker threads, previously allocated by hts_create_threads(). @return 0 for success, or negative if an error occurred.
! @abstract Create extra threads to aid compression/decompression for this file @param fp The file handle @param n The number of worker threads to create @return 0 for success, or negative if an error occurred. @notes This function creates non-shared threads for use solely by fp. The hts_set_thread_pool function is the recommended alternative.
! @abstract Get the htslib version number @return For released versions, a string like "N.N.N"; or git describe output if using a library built within a Git repository.
hts_file_type() - Convenience function to determine file type
Indexing
iterates over unmapped reads sorted at the end of the file
always returns "no more alignment records"
iterates from the current position to the end of the file
iterates over the entire file
Ignore ',' separators within numbers
! @abstract Table for converting a 4-bit encoded nucleotide to an IUPAC ambiguity code letter (or '=' when given 0).
index data (opaque)
? index key
see cram.h, sam.h, sam.d
Data and metadata for an hts file; part of public and private ABI
hts file complete file format information
A combined thread pool and queue allocation size. The pool should already be defined, but qsize may be zero to indicate an appropriate queue size is taken from the pool.
multi iterator
iterator
Options for cache, (de)compression, threads, CRAM, etc.
32-bit start/end coordinate pair
64-bit start, end coordinate pair tracking max (internally used in hts.c)
64-bit start/end coordinate pair
Region list used in iterators (NB: apparently confined to single contig/tid)
see thread_pool.d
BAM index (old)
CRAM index (not sure if superseded by CSI?)
coordinate-sorted index (new)
Tabix index
! @abstract Table for converting a 4-bit encoded nucleotide to about 2 bits. Returns 0/1/2/3 for 1/2/4/8 (i.e., A/C/G/T), or 4 otherwise (0 or ambiguous).
! @abstract Table for converting a nucleotide character to 4-bit encoding. The input character may be either an IUPAC ambiguity code, '=' for 0, or '0'/'1'/'2'/'3' for a result of 1/2/4/8. The result is encoded as 1/2/4/8 for A/C/G/T or combinations of these bits for ambiguous bases.
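The relationship between the nucleotide tables can be sketched from a single string of IUPAC letters indexed by 4-bit code (bits A=1, C=2, G=4, T=8, with '=' at index 0). The names below (`nt16_letters`, `nt_char_to_4bit`, `nt_4bit_to_2bit`) are hypothetical, the lookups use a search rather than the library's precomputed tables, and this sketch omits the '0'/'1'/'2'/'3' inputs mentioned above. Mapping unrecognised characters to 15 (N) is an assumption of the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* 4-bit code -> IUPAC letter; e.g. index 5 = A|G = 'R'. */
static const char nt16_letters[] = "=ACMGRSVTWYHKDBN";

/* Letter -> 4-bit code, case-insensitive; unknown letters map to 15 (N). */
static int nt_char_to_4bit(char c)
{
    const char *hit;
    c = (char)toupper((unsigned char)c);
    /* Guard against NUL, which strchr() would find at the terminator. */
    hit = (c != '\0') ? strchr(nt16_letters, c) : NULL;
    return hit ? (int)(hit - nt16_letters) : 15;
}

/* 4-bit code -> 0/1/2/3 for unambiguous A/C/G/T, or 4 otherwise. */
static int nt_4bit_to_2bit(int code)
{
    assert(code >= 0 && code < 16);
    switch (code) {
    case 1:  return 0;  /* A */
    case 2:  return 1;  /* C */
    case 4:  return 2;  /* G */
    case 8:  return 3;  /* T */
    default: return 4;  /* 0 ('=') or an ambiguity code */
    }
}
```

Because each ambiguity code is the bitwise OR of its member bases, a check such as `nt_char_to_4bit('R') & nt_char_to_4bit('A')` tests whether a base is compatible with a code.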