class IndexedFastaReader extends RecordReader[Text, PartialSequence]
A FASTA file reader that uses a faidx (.fai) index file to track sequence locations. Indexes in .fai format can be generated by various tools, for example seqkit: https://github.com/shenwei356/seqkit/
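As a rough illustration of what a .fai index contains, the sketch below parses one index line and maps a position within a sequence to a byte offset in the FASTA file. The five tab-separated columns (name, length, offset, bases per line, bytes per line) follow the standard faidx layout. The FaiEntry type and its byteOffsetOf method are hypothetical names introduced here, and nothing below is taken from the actual implementation of IndexedFastaReader.

```scala
// Hypothetical helper type for illustration; not part of the documented API.
final case class FaiEntry(
    name: String,   // sequence ID
    length: Long,   // number of bases in the sequence
    offset: Long,   // byte offset of the sequence's first base in the FASTA file
    lineBases: Int, // bases per line
    lineWidth: Int  // bytes per line, including the line terminator
) {
  /** Byte offset in the FASTA file of the base at 0-based position `pos`. */
  def byteOffsetOf(pos: Long): Long =
    offset + (pos / lineBases) * lineWidth + pos % lineBases
}

object FaiEntry {
  /** Parses one tab-separated line of a .fai file. */
  def parse(line: String): FaiEntry = {
    val f = line.split('\t')
    require(f.length >= 5, s"expected 5 tab-separated fields, got ${f.length}")
    FaiEntry(f(0), f(1).toLong, f(2).toLong, f(3).toInt, f(4).toInt)
  }
}
```

For example, with the index line "chr1", 100, 6, 60, 61, the first base of the second line of sequence text (position 60) is one full line width past the start offset.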
This reader can read a mix of full and partial sequences. A sequence that is fully contained in the current split is read as a single PartialSequence record; otherwise, it is read as multiple records. Partial sequences can be identified and reassembled using their header (corresponding to the sequence ID) and seqPosition fields.
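Reassembly by header and seqPosition could look roughly like the sketch below. It uses a hypothetical stand-in type, Part, rather than the real PartialSequence class, and it assumes that seqPosition is the position of the part's first base within the full sequence and that parts may overlap the parts that follow them. Both assumptions should be checked against the PartialSequence documentation.

```scala
// Hypothetical stand-in for PartialSequence; the field names header and
// seqPosition come from the documentation, `bases` is an assumption.
final case class Part(header: String, seqPosition: Long, bases: String)

object Reassemble {
  /** Groups parts by sequence ID and joins them in position order,
    * skipping bases already covered by earlier parts. */
  def reassemble(parts: Seq[Part]): Map[String, String] =
    parts.groupBy(_.header).map { case (id, ps) =>
      val sorted = ps.sortBy(_.seqPosition)
      val start = sorted.head.seqPosition
      val sb = new StringBuilder
      for (p <- sorted) {
        // Leading bases of this part that earlier parts already supplied.
        val covered = (start + sb.length - p.seqPosition).toInt
        require(covered >= 0, s"gap before position ${p.seqPosition} in $id")
        if (covered < p.bases.length) sb.append(p.bases.substring(covered))
      }
      id -> sb.toString
    }
}
```

The require call makes a missing part fail loudly instead of silently producing a shortened sequence.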
Each partial sequence is read together with the first (k-1) bases of the next part, so that every k-mer that spans a part boundary can still be processed in full.
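The effect of the (k-1) overlap can be checked with a small self-contained sketch. The helper names partsWithOverlap and kmers are hypothetical, and the fixed part length is a simplification: the real reader derives part boundaries from input splits. The point is only that extending each part by k-1 bases yields every k-mer of the full sequence exactly once.

```scala
object KmerOverlap {
  /** Cuts `seq` into parts of `partLen` bases, each extended by up to
    * k-1 bases taken from the start of the following part. */
  def partsWithOverlap(seq: String, partLen: Int, k: Int): Seq[String] =
    (0 until seq.length by partLen).map { s =>
      seq.substring(s, math.min(seq.length, s + partLen + k - 1))
    }

  /** All k-mers of `s` in order; empty if `s` is shorter than k. */
  def kmers(s: String, k: Int): Seq[String] =
    if (s.length < k) Seq.empty else s.sliding(k).toSeq
}
```

Without the extension, the k-mers that straddle a boundary between two parts would be lost; with it, concatenating the k-mers of each part reproduces the k-mers of the whole sequence.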
The reader for each split must stream the FAI file. It is therefore not recommended to use this reader for short reads, for example, or when the maximum size of a sequence is relatively small. ShortReadsRecordReader and FASTQReadsRecordReader are better suited to such tasks. For reading a single long sequence without a FAI index, LongReadsRecordReader can be used instead.
- Version: 1.0
Instance Constructors
- new IndexedFastaReader()
Value Members
- def close(): Unit
  - Definition Classes: IndexedFastaReader → RecordReader → Closeable → AutoCloseable
- def getCurrentKey(): Text
  - Definition Classes: IndexedFastaReader → RecordReader
- def getCurrentValue(): PartialSequence
  - Definition Classes: IndexedFastaReader → RecordReader
- def getProgress(): Float
  - Definition Classes: IndexedFastaReader → RecordReader
- def initialize(genericSplit: InputSplit, context: TaskAttemptContext): Unit
  - Definition Classes: IndexedFastaReader → RecordReader
- def nextKeyValue(): Boolean
  - Definition Classes: IndexedFastaReader → RecordReader