hts_parse_region

@param str String to be parsed
@param tid Set on return (if not NULL) to be reference index (-1 if invalid)
@param beg Set on return to the 0-based start of the region
@param end Set on return to the 1-based end of the region
@param getid Function pointer. Called if not NULL to set tid.
@param hdr Caller data passed to getid.
@param flags Bitwise HTS_PARSE_* flags listed above.
@return Pointer to the byte after the end of the entire region specifier (including any trailing comma) on success, or NULL if @a str could not be parsed.

A variant of hts_parse_reg which is reference-id aware. It uses the iterator name2id callbacks to validate that the region tokenisation works.

This is necessary due to GRCh38 HLA additions which have reference names like "HLA-DRB1*12:17".

To work around ambiguous parsing, e.g. when both "chr1" and "chr1:100-200" are reference names, quote the reference name using curly braces. Thus "{chr1}:100-200" and "{chr1:100-200}" disambiguate the above example.

Flags control how parsing works and are combined bitwise from the values described below.

extern (C)
const(char)*
hts_parse_region

Detailed Description

HTS_PARSE_THOUSANDS_SEP

Ignore commas in numbers. For example with this flag 1,234,567 is interpreted as 1234567.
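The effect of ignoring commas can be sketched with a small stand-alone parser. This is only an illustration: parse_pos_sketch and its allow_commas argument are hypothetical names, not part of htslib, and the real hts_parse_decimal may handle more cases than this does.

```c
#include <stdint.h>

/* Hypothetical stand-in, NOT htslib's hts_parse_decimal: read a run of
 * decimal digits, skipping ',' when allow_commas is non-zero.
 * *endp is set to the first character that was not consumed. */
static int64_t parse_pos_sketch(const char *s, int allow_commas,
                                const char **endp)
{
    int64_t n = 0;
    for (; *s; s++) {
        if (*s >= '0' && *s <= '9')
            n = n * 10 + (*s - '0');      /* accumulate the next digit */
        else if (!(allow_commas && *s == ','))
            break;                        /* stop at anything else */
    }
    *endp = s;
    return n;
}
```

With allow_commas set, "1,234,567" yields 1234567 and the whole string is consumed; with it clear, parsing stops at the first comma, which is what lets a comma act as a list separator.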

HTS_PARSE_LIST

If present, the region is assumed to be a comma-separated list, so positions must not contain commas (this implicitly clears HTS_PARSE_THOUSANDS_SEP in the call to hts_parse_decimal). On success the return pointer will be the start of the next region, i.e. the character after the comma. (If *ret != '\0' then the caller can assume another region is present in the list.)

If not set then positions may contain commas. In this case the return value should point to the end of the string, or NULL on failure.

HTS_PARSE_ONE_COORD

If present, X:100 is treated as the single base pair region X:100-100. In this case X:-100 is shorthand for X:1-100 and X:100- is X:100-<end>. (This is the standard bcftools region convention.)

When not set X:100 is considered to be X:100-<end> where <end> is the end of chromosome X (set to INT_MAX here). X:100- and X:-100 are invalid. (This is the standard samtools region convention.)
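The two conventions can be laid out as a small decision table. The sketch below is an assumption-laden illustration, not htslib code: coord_form and resolve_coords_sketch are hypothetical names, the one_coord argument stands for whether HTS_PARSE_ONE_COORD is set, and INT_MAX is used as the placeholder for <end> in both modes.

```c
#include <limits.h>

/* Which coordinate text followed "X:" (hypothetical classification). */
enum coord_form {
    FORM_N,       /* "100"  */
    FORM_DASH_N,  /* "-100" */
    FORM_N_DASH   /* "100-" */
};

/* Hypothetical helper: fills 1-based inclusive *start / *end and returns 0,
 * or returns -1 where the description above calls the form invalid.
 * INT_MAX stands in for <end>. */
static int resolve_coords_sketch(enum coord_form form, long n, int one_coord,
                                 long *start, long *end)
{
    switch (form) {
    case FORM_N:                       /* X:100 */
        *start = n;
        *end = one_coord ? n : INT_MAX;
        return 0;
    case FORM_DASH_N:                  /* X:-100 */
        if (!one_coord) return -1;
        *start = 1;
        *end = n;
        return 0;
    case FORM_N_DASH:                  /* X:100- */
        if (!one_coord) return -1;
        *start = n;
        *end = INT_MAX;
        return 0;
    }
    return -1;
}
```

The outputs here are still in the 1-based inclusive form of the input string; the conversion to the returned coordinates is a separate step.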

Note the supplied string expects 1 based inclusive coordinates, but the returned coordinates start from 0 and are half open, so pos0 is valid for use in e.g. "for (pos0 = beg; pos0 < end; pos0++) {...}"
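As a worked example of that conversion, a region written as X:100-200 would come back as beg = 99 and end = 200, covering 101 positions. The helpers below are hypothetical and only restate the arithmetic; they do not call htslib.

```c
#include <stdint.h>

/* Hypothetical helper: convert a 1-based inclusive region (as written in
 * the region string) to the 0-based half-open pair described above. */
static void to_half_open_sketch(int64_t start1, int64_t end1,
                                int64_t *beg, int64_t *end)
{
    *beg = start1 - 1;  /* 1-based start becomes 0-based */
    *end = end1;        /* inclusive 1-based end equals exclusive 0-based end */
}

/* Count the positions visited by the loop form quoted in the text. */
static int64_t count_positions_sketch(int64_t beg, int64_t end)
{
    int64_t n = 0;
    for (int64_t pos0 = beg; pos0 < end; pos0++)
        n++;
    return n;
}
```

Because the end is exclusive, the region length is simply end - beg, with no off-by-one adjustment.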

If NULL is returned, the value in tid may give additional information about the error:

-2    Failed to parse @p hdr; or out of memory
-1    The reference in @p str has mismatched braces, or does not exist in @p hdr
>= 0  The specified range in @p str could not be parsed

Meta