com.jnpersson.discount.fastdoop

IndexedFastaReader

class IndexedFastaReader extends RecordReader[Text, PartialSequence]

FASTA file reader that uses a faidx (.fai) file to track sequence locations. .fai indexes can be generated by various tools, for example seqkit: https://github.com/shenwei356/seqkit/
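As a rough illustration of what a .fai index records, the sketch below parses one index line and computes the byte offset of a base. It assumes the five-column layout used by samtools faidx for FASTA files (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH); `FaiEntry` and `FaiIndex` are hypothetical names for this example and are not part of this package.

```scala
// Hypothetical model of one .fai row (five-column faidx layout for FASTA).
final case class FaiEntry(name: String, length: Long, offset: Long,
                          lineBases: Int, lineWidth: Int) {
  // Byte offset in the FASTA file of the base at 0-based position `pos`.
  // Every full line before `pos` contributes lineWidth bytes (bases + line terminator).
  def byteOffset(pos: Long): Long = {
    require(pos >= 0 && pos < length, s"position $pos is outside sequence $name")
    offset + (pos / lineBases) * lineWidth + pos % lineBases
  }
}

object FaiIndex {
  // Parse one tab-separated index line into a FaiEntry.
  def parseLine(line: String): FaiEntry = {
    val f = line.split('\t')
    require(f.length >= 5, s"expected at least 5 tab-separated fields, got ${f.length}")
    FaiEntry(f(0), f(1).toLong, f(2).toLong, f(3).toInt, f(4).toInt)
  }
}
```

For a file that begins with the header line `>chr1` followed by 60 bases per line, the first base sits at byte 6 and the first base of the second line at byte 67, since each 60-base line occupies 61 bytes including the newline.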

This reader can read a mix of full and partial sequences. If the sequence is fully contained in this split, it will be read as a single PartialSequence record. Otherwise, it will be read as multiple records. Partial sequences can be identified and reassembled using their header (corresponding to sequence ID) and seqPosition fields.
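The reassembly described above could be sketched as follows. `Part` is a hypothetical stand-in for PartialSequence that carries only the fields mentioned here, and the sketch assumes that seqPosition is the 0-based offset of the part's first base within its sequence; check the actual PartialSequence definition before relying on either assumption.

```scala
// Hypothetical stand-in for PartialSequence (not the real class).
final case class Part(header: String, seqPosition: Long, bases: String)

object Reassemble {
  // Rebuild full sequences from parts, keyed by header.
  // ASSUMPTION: seqPosition is the 0-based offset of the part's first base.
  def apply(parts: Seq[Part]): Map[String, String] =
    parts.groupBy(_.header).map { case (header, ps) =>
      val sb = new StringBuilder
      for (p <- ps.sortBy(_.seqPosition)) {
        // Bases already present come from the overlap carried by the previous part.
        val skip = (sb.length - p.seqPosition).toInt
        require(skip >= 0, s"gap before position ${p.seqPosition} in $header")
        if (skip < p.bases.length) sb.append(p.bases.substring(skip))
      }
      header -> sb.toString
    }
}
```

Because parts may carry extra bases from the following part, the sketch skips any prefix that has already been written instead of concatenating the parts blindly.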

Each partial sequence is read together with the first k-1 base pairs of the next part, so that k-mers spanning a part boundary can be processed in full.
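To see why k-1 extra bases are enough, the sketch below cuts a string into fixed-size parts, extends each part by up to k-1 bases from its successor, and collects the k-mers of every part. `OverlapDemo`, `splitWithOverlap` and `kmers` are hypothetical helpers written for this illustration; they do not reflect how the reader itself chooses split boundaries.

```scala
object OverlapDemo {
  // Cut `seq` into parts of `partLen` bases, each extended by up to k-1 bases
  // taken from the start of the next part.
  def splitWithOverlap(seq: String, partLen: Int, k: Int): Seq[String] =
    (0 until seq.length by partLen).map { start =>
      seq.substring(start, math.min(seq.length, start + partLen + k - 1))
    }

  // All k-mers of `s` in order; empty when `s` is shorter than k.
  def kmers(s: String, k: Int): Seq[String] =
    if (s.length < k) Seq.empty else s.sliding(k).toSeq
}
```

With the extension, a part that starts at offset s yields exactly the k-mers that start at offsets s to s + partLen - 1, so concatenating the k-mers of all parts gives the same list as processing the whole sequence, with nothing duplicated or lost at the boundaries.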

The reader for each split must stream the .fai file. This reader is therefore not recommended for inputs such as short reads, or when the longest sequence is relatively small. ShortReadsRecordReader and FASTQReadsRecordReader are better suited to such data. To read a single long sequence without a .fai index, LongReadsRecordReader can be used instead.

Version

1.0

See also

IndexedFastaFormat

Linear Supertypes
RecordReader[Text, PartialSequence], Closeable, AutoCloseable, AnyRef, Any

Instance Constructors

  1. new IndexedFastaReader()

Value Members

  1. def close(): Unit
    Definition Classes
    IndexedFastaReader → RecordReader → Closeable → AutoCloseable
  2. def getCurrentKey(): Text
    Definition Classes
    IndexedFastaReader → RecordReader
  3. def getCurrentValue(): PartialSequence
    Definition Classes
    IndexedFastaReader → RecordReader
  4. def getProgress(): Float
    Definition Classes
    IndexedFastaReader → RecordReader
  5. def initialize(genericSplit: InputSplit, context: TaskAttemptContext): Unit
    Definition Classes
    IndexedFastaReader → RecordReader
  6. def nextKeyValue(): Boolean
    Definition Classes
    IndexedFastaReader → RecordReader