IndexedFastaFile

FASTA file with .fai or .gzi index

Reads existing FASTA file, optionally creating FASTA index if one does not exist.

Convenient member functions to get the number of sequences, get sequence names and lengths, test for membership, and rapidly fetch the sequence at an offset.
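Rapid fetching is possible because a .fai index records, for each sequence, where its first base sits in the file and how its lines are laid out. The following is a minimal sketch of that offset arithmetic, assuming the standard five-column faidx layout (name, length, offset, bases per line, bytes per line). The struct `FaiRecord` and the function `byteOffset` are hypothetical names used for illustration only; they are not part of this library, which delegates the lookup to htslib.

```d
import std.stdio : writeln;

/// One record of a .fai index (hypothetical type; field names are illustrative).
struct FaiRecord
{
    string name;     // sequence name
    long length;     // number of bases in the sequence
    long offset;     // byte offset of the sequence's first base
    long lineBases;  // bases per line
    long lineWidth;  // bytes per line, including the newline
}

/// Byte offset in the FASTA file of the zero-based position `pos`.
long byteOffset(in FaiRecord rec, long pos)
{
    // Skip whole lines (each lineWidth bytes), then step into the final line.
    return rec.offset + (pos / rec.lineBases) * rec.lineWidth + pos % rec.lineBases;
}

void main()
{
    // 60 bases per line, 61 bytes per line (one newline byte), header ends at byte 6.
    auto rec = FaiRecord("chr1", 1000, 6, 60, 61);
    writeln(byteOffset(rec, 125)); // 133
}
```

Because the position maps directly to a byte offset, a reader can seek to it without scanning the file from the start.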

Constructors

this
this(string fn, bool create)

Construct from a filename, optionally creating the index if it does not exist. Throws Exception (TODO: remove) if the file does not exist, or if the index does not exist and create is not true.

Members

Functions

fetchSequence
string fetchSequence(string contig, Interval!cs coords)

Fetch the sequence in a region by function call with contig, start, end: string sequence = fafile.fetchSequence("chr2", 20123, 30456)

hasSeq
bool hasSeq(const(char)[] seqname)

Test whether the FASTA file/index contains string seqname

opDollar
OffsetType opDollar()

Array-end $ indexing hack courtesy of Steve Schveighoffer https://forum.dlang.org/post/rl7a56$nad$1@digitalmars.com
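The idea behind this hack can be sketched in isolation: `opDollar` returns a sentinel type instead of a length, and the indexing overloads resolve that sentinel against the real length at lookup time. The sketch below is a one-dimensional illustration under stated assumptions; `EndOffset` and `Seq` are hypothetical stand-ins, not the library's `OffsetType` or `IndexedFastaFile`, and the library's own overloads may differ in detail.

```d
import std.stdio : writeln;

/// Hypothetical stand-in for OffsetType: a signed distance from the end.
struct EndOffset
{
    long offset;

    /// `$ - 2` yields EndOffset(-2).
    EndOffset opBinary(string op)(long n) const
    if (op == "+" || op == "-")
    {
        return EndOffset(mixin("offset " ~ op ~ " n"));
    }
}

/// Hypothetical container standing in for an indexed sequence.
struct Seq
{
    string data;

    /// Inside `[]`, `$` becomes this sentinel rather than a length.
    EndOffset opDollar() const
    {
        return EndOffset(0);
    }

    /// Resolve an end-relative offset against the actual length.
    private size_t resolve(EndOffset off) const
    {
        return cast(size_t)(cast(long) data.length + off.offset);
    }

    /// seq[$ - 1]: a single base counted from the end.
    char opIndex(EndOffset off) const
    {
        return data[resolve(off)];
    }

    /// seq[$ - 2 .. $]: a slice whose bounds are both end-relative.
    string opSlice(EndOffset start, EndOffset end) const
    {
        return data[resolve(start) .. resolve(end)];
    }
}

void main()
{
    auto seq = Seq("ACGTTGCA");
    writeln(seq[$ - 1]);      // A
    writeln(seq[$ - 2 .. $]); // CA
}
```

Deferring the resolution means `$` can be used even when the length is only known after the contig name has been looked up.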

opIndex
auto opIndex(string ctg, Tuple!(Coordinate!bs, OffsetType) coords)

opIndex with a coordinate and an offset, e.g. fai["chrom1", ZB(1) .. $]

opIndex
auto opIndex(string ctg, Tuple!(OffsetType, OffsetType) coords)

opIndex with two offsets, e.g. fai["chrom1", $-2 .. $]

opIndex
auto opIndex(string ctg, OffsetType endoff)

opIndex with one offset, e.g. fai["chrom1", $-1]

opIndex
auto opIndex(string region)

Fetch sequence in region by associative-array-style lookup, using htslib string region parsing: string sequence = fafile["chr2:20123-30456"]

opIndex
auto opIndex(string contig, Interval!cs coords)

Fetch sequence in region by multidimensional slicing: string sequence = fafile["chr2", 20123 .. 30456]

opSlice
auto opSlice(Coordinate!bs start, OffsetType off)

opSlice with a coordinate and an offset, e.g. [ZB(2) .. $]

opSlice
auto opSlice(OffsetType start, OffsetType end)

opSlice with two offsets, e.g. [$-2 .. $]

opSlice
auto opSlice(Coordinate!bs start, Coordinate!bs end)

Fetch sequence in region by multidimensional slicing: string sequence = fafile["chr2", 20123 .. 30456]

seqLen
int seqLen(const(char)[] seqname)

Return the sequence length, or -1 if not present. NOTE: there is no 64-bit equivalent of this function (yet) in htslib-1.10.

seqName
string seqName(int i)

Return the name of the i'th sequence

setCacheSize
void setCacheSize(int size)

Enable BGZF caching (size: bytes)

setThreads
deprecated void setThreads(int nthreads)

Enable multithreaded decompression of this FA/FQ. Reading the function body of bgzf_mt, this actually ADDS threads (rather than setting the count), but the name is retained for consistency with setCacheSize. NB: in a real-world test (swiftover), calling setThreads(1) doubled runtime (???).

Properties

nSeq
auto nSeq [@property getter]

Return the number of sequences in the FASTA file/index

Meta