Packages

package fastdoop

Type Members

  1. case class FAIRecord(id: String, length: Long, start: Long, bpsPerLine: Int, bytesPerLine: Int) extends Product with Serializable

    FAI (fasta index) record.

    Example entries in a FAI file:

        ENA|LR865458|LR865458.1 590561804 75 60 61
        ENA|LR865459|LR865459.1 685720839 600404651 60 61
        ENA|LR865460|LR865460.1 490910922 1297554246 60 61

    id: Sequence ID

    length: length in bps

    start: start position (byte offset in file)

    bpsPerLine: bps per line

    bytesPerLine: bytes per line
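
    The five fields above are enough to locate any base in the FASTA file. The following is a minimal sketch, not fastdoop's own code: FaiEntry is a local stand-in that mirrors the fields of FAIRecord so the example compiles without the library, and parse and byteOffset are hypothetical helper names. It assumes whitespace-separated columns and 0-based base positions.

```scala
// Local stand-in with the same fields as FAIRecord (an assumption made so that
// this sketch compiles without fastdoop on the classpath).
final case class FaiEntry(id: String, length: Long, start: Long,
                          bpsPerLine: Int, bytesPerLine: Int)

object FaiSketch {
  // Parse one .fai line: id, length, start, bps per line, bytes per line.
  // Returns None if the column count is wrong or a number fails to parse.
  def parse(line: String): Option[FaiEntry] =
    line.trim.split("\\s+") match {
      case Array(id, len, start, bps, bytes) =>
        scala.util.Try(
          FaiEntry(id, len.toLong, start.toLong, bps.toInt, bytes.toInt)
        ).toOption
      case _ => None
    }

  // Byte offset in the FASTA file of the 0-based base position `pos`:
  // skip whole lines (bytesPerLine each), then move within the line.
  def byteOffset(e: FaiEntry, pos: Long): Long = {
    require(pos >= 0 && pos < e.length, s"position $pos out of range")
    e.start + (pos / e.bpsPerLine) * e.bytesPerLine + (pos % e.bpsPerLine)
  }
}
```

    For the first example entry (start 75, 60 bps and 61 bytes per line), base 0 is at byte 75 and base 60, the first base of the second line, is at byte 136.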

  2. class FASTAlongInputFileFormat extends FileInputFormat[Text, PartialSequence]

    A FileInputFormat for reading FASTA files containing sequences of arbitrary length.

    Version

    1.0

    See also

    FileInputFormat

  3. class FASTAshortInputFileFormat extends FileInputFormat[Text, Record]

    A FileInputFormat for reading FASTA files containing short sequences.

    Version

    1.0

    See also

    FileInputFormat

  4. class FASTQInputFileFormat extends FileInputFormat[Text, QRecord]

    A FileInputFormat for reading FASTQ files.

    Version

    1.0

    See also

    FileInputFormat

  5. class FASTQReadsRecordReader extends RecordReader[Text, QRecord]

    This class reads <key, value> pairs from an InputSplit. The input file is in FASTQ format. A FASTQ record has a header line that is the key, a data line, a separator line starting with '+' that may optionally repeat the header, and a quality line.

    Example:

        @SRR034939.184 090406_HWI-EAS68_9096_FC400PR_PE_1_1_10 length=100
        CCACCTCCTGGGTTCAAGGGGTTCTCTTGCCTCAGCTNNNNNNNNNNNNGGNNNNNNNNNTNNNN
        +SRR034939.184 090406_HWI-EAS68_9096_FC400PR_PE_1_1_10 length=100
        HDHFHHHHHHFFAFF6?<:<HHHHHHHHHEDHHHF##!!!!!!!!!!!!##!!!!!!!!!#!!!!
        ...
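
    The four-line record layout can be illustrated with a small standalone parser. This is a sketch under stated assumptions, not the reader's actual implementation: Fastq is a hypothetical stand-in for QRecord, the key is assumed to be the header without its leading '@', and each sequence and quality string is assumed to fit on a single line.

```scala
object FastqSketch {
  // Hypothetical stand-in for a FASTQ record; not fastdoop's QRecord.
  final case class Fastq(key: String, sequence: String, quality: String)

  // Parse consecutive 4-line records: '@' header, bases, '+' separator, qualities.
  // Groups that do not match this shape are silently skipped in this sketch.
  def parse(lines: Iterator[String]): List[Fastq] =
    lines.map(_.trim).filter(_.nonEmpty).grouped(4).collect {
      case Seq(h, s, p, q)
          if h.startsWith("@") && p.startsWith("+") && s.length == q.length =>
        Fastq(h.drop(1), s, q)
    }.toList
}
```

    The length check reflects the FASTQ convention that every base has exactly one quality character.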

    Version

    1.0

    See also

    InputSplit

  6. class IndexedFastaFormat extends FileInputFormat[Text, PartialSequence]

    Hadoop input format for FASTA files with an accompanying .fai index file.

    Version

    1.0

    See also

    IndexedFastaReader

  7. class IndexedFastaReader extends RecordReader[Text, PartialSequence]

    FASTA file reader that uses a faidx (.fai) file to track sequence locations. .fai indexes can be generated by various tools, for example seqkit: https://github.com/shenwei356/seqkit/

    This reader can read a mix of full and partial sequences. If the sequence is fully contained in this split, it will be read as a single PartialSequence record. Otherwise, it will be read as multiple records. Partial sequences can be identified and reassembled using their header (corresponding to sequence ID) and seqPosition fields.

    Partial sequences are read together with (k-1) bps from the next part to ensure that full k-mers can be processed.

    The reader for each split must stream the FAI file, so this reader is not recommended for short reads or for inputs whose longest sequence is relatively small. ShortReadsRecordReader and FASTQReadsRecordReader are better suited to such tasks. To read a single long sequence without a FAI index, use LongReadsRecordReader instead.
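
    The effect of the (k-1) overlap and the reassembly by header and seqPosition can be sketched in plain Scala. This is an illustration under stated assumptions rather than the reader's code: Fragment is a hypothetical stand-in for PartialSequence that keeps only the two fields named above plus the bases, seqPosition is assumed to be a 0-based offset into the sequence, and split, kmers and reassemble are hypothetical helpers.

```scala
object OverlapSketch {
  // Hypothetical stand-in for a fragment; not fastdoop's PartialSequence.
  final case class Fragment(header: String, seqPosition: Long, bases: String)

  // Cut `seq` into chunks of `chunk` bases, each extended by up to k-1 bases
  // taken from the start of the next chunk.
  def split(header: String, seq: String, chunk: Int, k: Int): Vector[Fragment] = {
    require(chunk > 0 && k > 0)
    (0 until seq.length by chunk).map { from =>
      val until = math.min(seq.length, from + chunk + k - 1)
      Fragment(header, from.toLong, seq.substring(from, until))
    }.toVector
  }

  // All full-length k-mers of one fragment.
  def kmers(f: Fragment, k: Int): Iterator[String] =
    f.bases.sliding(k).filter(_.length == k)

  // Rebuild the sequence: order by seqPosition and append only the bases
  // that earlier fragments have not already covered. Assumes no gaps.
  def reassemble(parts: Seq[Fragment]): String = {
    val sb = new StringBuilder
    parts.sortBy(_.seqPosition).foreach { f =>
      val skip = (sb.length - f.seqPosition).toInt
      if (skip < f.bases.length) sb.append(f.bases.substring(math.max(skip, 0)))
    }
    sb.toString
  }
}
```

    With this layout every k-mer of the original sequence starts in exactly one chunk, so collecting k-mers per fragment yields each of them once without any communication between splits.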

    Version

    1.0

    See also

    IndexedFastaFormat

  8. class LongReadsRecordReader extends RecordReader[Text, PartialSequence]

    This class reads <key, value> pairs from an InputSplit. The input file is in FASTA format and contains a single long sequence. A FASTA record has a header line that is the key, and data lines that are the value:

        >header...
        data
        ...

    Example:

        >Seq1
        TAATCCCAAATGATTATATCCTTCTCCGATCGCTAGCTATACCTTCCAGGCGATGAACTTAGACGGAATCCACTTTGCTA
        CAACGCGATGACTCAACCGCCATGGTGGTACTAGTCGCGGAAAAGAAAGAGTAAACGCCAACGGGCTAGACACACTAATC
        CTCCGTCCCCAACAGGTATGATACCGTTGGCTTCACTTCTACTACATTCGTAATCTCTTTGTCAGTCCTCCCGTACGTTG
        GCAAAGGTTCACTGGAAAAATTGCCGACGCACAGGTGCCGGGCCGTGAATAGGGCCAGATGAACAAGGAAATAATCACCA
        CCGAGGTGTGACATGCCCTCTCGGGCAACCACTCTTCCTCATACCCCCTCTGGGCTAACTCGGAGCAAAGAACTTGGTAA
        ...

    Version

    1.0

    See also

    InputSplit

  9. class PartialSequence extends Serializable

    This class is used to store fragments of a long input FASTA sequence as an array of bytes.

    Version

    1.0

  10. class QRecord extends Serializable

    Utility class that represents a sequence from a FASTQ file as a record.

    Version

    1.0

  11. class Record extends Serializable

    Utility class that represents a sequence from a FASTA file as a record.

    Version

    1.0

  12. class ShortReadsRecordReader extends RecordReader[Text, Record]

    This class reads <key, value> pairs from an InputSplit. The input file is in FASTA format. A FASTA record has a header line that is the key, and data lines that are the value:

        >header...
        data
        ...

    Example:

        >Seq1
        TAATCCCAAATGATTATATCCTTCTCCGATCGCTAGCTATACCTTCCAGGCGATGAACTTAGACGGAATCCACTTTGCTA
        CAACGCGATGACTCAACCGCCATGGTGGTACTAGTCGCGGAAAAGAAAGAGTAAACGCCAACGGGCTAGACACACTAATC
        CTCCGTCCCCAACAGGTATGATACCGTTGGCTTCACTTCTA
        >Seq2
        CTACATTCGTAATCTCTTTGTCAGTCCTCCCGTACGTTGGCAAAGGTTCACTGGAAAAATTGCCGACGCACAGGTGCCGG
        GCCGTGAATAGGGCCAGATGAACAAGGAAATAATCACCACCGAGGTGTGACATGCCCTCTCGGGCAACCACTCTTCCTCA
        TACCCCCTCTGGGCTAACTCGGAGCAAAGAACTTGGTAA
        ...
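
    The mapping from a multi-record FASTA file to <key, value> pairs can be sketched as a standalone function. This is a simplified illustration, not the reader's implementation: it ignores split boundaries, represents each record as a plain (header, sequence) tuple instead of a Record, and assumes the key is the header without its leading '>'.

```scala
object FastaSketch {
  // Group lines into (header, sequence) pairs; the data lines of a record
  // are concatenated without line breaks.
  def parse(lines: Iterator[String]): Vector[(String, String)] = {
    val out = Vector.newBuilder[(String, String)]
    var header: Option[String] = None
    val seq = new StringBuilder

    // Emit the record collected so far, if any, and reset the buffer.
    def flush(): Unit = header.foreach { h =>
      out += ((h, seq.toString))
      seq.clear()
    }

    lines.map(_.trim).filter(_.nonEmpty).foreach { line =>
      if (line.startsWith(">")) { flush(); header = Some(line.drop(1)) }
      else if (header.isDefined) seq.append(line) // data before any header is ignored
    }
    flush()
    out.result()
  }
}
```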

    Version

    1.0

    See also

    InputSplit
