Query a region and return matching alignments as an InputRange.
Queries on (chr, start, end) may take several forms:
1. query(region) with a string-based "region" form (e.g. chr1:1000-2000)
- Variant: pass an array of query region strings: query([reg1, reg2, ...])
2. query(chr, start, end) with a combination of function parameters for
contig, start, and end (where contig may be either a string or the numeric
tid from the BAM header; using the numeric tid directly would be uncommon)
NOTE THAT THERE IS AN OFF-BY-ONE DIFFERENCE IN THE TWO METHODS ABOVE!
Region string based coordinates assume the first base of the reference
is 1 (e.g., chrX:1-100 yields the first 100 bases), whereas with the
integer function parameter versions, the coordinates are zero-based, half-open
(e.g., <chrX, 0, 100> yields the first 100 bases).
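The off-by-one relationship above can be written down as a pair of conversions. This is a minimal sketch, not part of the library: HalfOpen, fromRegionCoords, and toRegionString are hypothetical names introduced only for illustration.

```d
import std.format : format;

/// Hypothetical helper type: zero-based, half-open interval [start, end),
/// i.e. the convention of query(chr, start, end).
struct HalfOpen
{
    long start;
    long end;
}

/// Convert one-based, inclusive region-string coordinates (first-last)
/// to the zero-based, half-open form.
HalfOpen fromRegionCoords(long first, long last)
{
    assert(first >= 1 && last >= first, "expected 1 <= first <= last");
    return HalfOpen(first - 1, last);
}

/// Build a region string from a zero-based, half-open interval.
string toRegionString(string contig, HalfOpen iv)
{
    assert(iv.start >= 0 && iv.end > iv.start, "expected 0 <= start < end");
    return format("%s:%d-%d", contig, iv.start + 1, iv.end);
}

unittest
{
    // chrX:1-100 and <chrX, 0, 100> both describe the first 100 bases.
    assert(fromRegionCoords(1, 100) == HalfOpen(0, 100));
    assert(toRegionString("chrX", HalfOpen(0, 100)) == "chrX:1-100");
}
```

Only the start coordinate shifts by one; the end value is the same number in both conventions. The unittest block runs when the file is compiled with dmd -unittest -main.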
We also support array indexing directly on an object of type SAMReader,
in either of the two styles above:
1. bamfile[region-string]
2. bamfile[contig, start .. end] with contig as in form 2 above
D's conventional $ operator, which marks the length of an array, is supported
(e.g., $-500 .. $ selects the last 500 nt of the contig).
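For readers unfamiliar with how D lets a type accept reader[contig, start .. end] and $, the mechanism is the opIndex, opSlice, and opDollar operator overloads. The following is a toy sketch under stated assumptions, not SAMReader's actual implementation: ToyReader, EndOffset, and DeferredSlice are hypothetical names, and the reader holds a table of contig lengths instead of opening a file. Because $ is evaluated before the contig in the same index expression is known, the sketch records an offset from the contig end and resolves it inside opIndex.

```d
/// Hypothetical placeholder for "contig length + offset".
struct EndOffset
{
    long offset; // 0 means the contig length, -500 means length - 500

    /// Supports expressions such as `$ - 500` and `$ + 0`.
    EndOffset opBinary(string op)(long rhs) const
        if (op == "+" || op == "-")
    {
        return EndOffset(mixin("offset " ~ op ~ " rhs"));
    }
}

/// Hypothetical slice whose bounds are still relative to the contig end.
struct DeferredSlice
{
    EndOffset start;
    EndOffset end;
}

/// Toy reader: maps contig names to lengths (an assumption for illustration).
struct ToyReader
{
    long[string] contigLength;

    /// `$` in the second index position (dimension 1).
    EndOffset opDollar(size_t dim)() const if (dim == 1)
    {
        return EndOffset(0);
    }

    /// `a .. b` in the second index position (dimension 1).
    DeferredSlice opSlice(size_t dim)(EndOffset a, EndOffset b) const if (dim == 1)
    {
        return DeferredSlice(a, b);
    }

    /// reader[contig, a .. b]: resolve offsets now that the contig is known.
    /// Returns zero-based, half-open [start, end) coordinates.
    long[2] opIndex(string contig, DeferredSlice s) const
    {
        immutable len = contigLength[contig];
        return [len + s.start.offset, len + s.end.offset];
    }
}

unittest
{
    auto reader = ToyReader(["chrX": 1_000L]);
    // Last 500 nt of a 1000 nt contig.
    assert(reader["chrX", $ - 500 .. $] == [500L, 1_000L]);
}
```

The sketch handles only bounds written relative to $; plain integer bounds and mixed forms such as 100 .. $ would need additional opSlice overloads and are left out for brevity.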
Finally, the region string is parsed by htslib's hts_parse_region,
which gives the following forms special semantics:
region        | Outputs
REF:          | All reads with RNAME REF
REF:START     | Reads with RNAME REF overlapping START to end of REF
REF:-END      | Reads with RNAME REF overlapping start of REF to END
REF:START-END | Reads with RNAME REF overlapping START to END
.             | All reads from the start of the file
*             | Unmapped reads at the end of the file (RNAME '*' in SAM)
Examples:
auto bamfile = SAMReader("whatever.bam");
auto reads1 = bamfile.query("chr1:1-500");
auto reads2 = bamfile.query("chr2", 0, 500);
auto reads3 = bamfile["chr3", 0 .. 500];
auto reads4 = bamfile["chrX", $-500 .. $]; // last 500 nt
auto reads5 = bamfile.query("chrY"); // entirety of chrY
// When colon present in reference name (e.g. HLA additions in GRCh38)
// wrap the ref name in { } (this is an htslib convention; see hts_parse_region)
auto reads6 = bamfile.query("{HLA-DRB1*12:17}:1-100");