package hash

Provides classes for hashing k-mers and nucleotide sequences. Hashing is done by identifying minimizers. Hashing all k-mers in a sequence thus corresponds to splitting the sequence into super k-mers: segments of length >= k in which all k-mers share the same minimizer.
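
The splitting described above can be illustrated with a minimal, self-contained sketch. It does not use this package's classes: the object name SuperKmerSketch, the minimizer width m, and the choice of plain lexicographic order as the minimizer ordering are all assumptions made for illustration.

```scala
object SuperKmerSketch {
  /** Lexicographically smallest m-mer inside a k-mer (assumed ordering). */
  def minimizer(kmer: String, m: Int): String =
    kmer.sliding(m).min

  /** Split seq into maximal runs of consecutive k-mers that share a minimizer.
    * Returns (minimizer, super k-mer) pairs in sequence order. */
  def superKmers(seq: String, k: Int, m: Int): Vector[(String, String)] = {
    require(m >= 1 && m <= k && k <= seq.length)
    // Minimizer of the k-mer starting at each position.
    val mins = (0 to seq.length - k).map(i => minimizer(seq.substring(i, i + k), m))
    val out = Vector.newBuilder[(String, String)]
    var start = 0
    for (i <- 1 to mins.length) {
      if (i == mins.length || mins(i) != mins(start)) {
        // k-mers start..i-1 share a minimizer; together they span seq[start, i - 1 + k).
        out += ((mins(start), seq.substring(start, i - 1 + k)))
        start = i
      }
    }
    out.result()
  }
}
```

Each returned segment has length >= k, and adjacent segments overlap by k - 1 characters so that no k-mer is lost at a boundary.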

Type Members

  1. type BucketId = Long

    Type of a compacted hash (minimizer)

  2. final case class InputFragment(header: SeqTitle, location: SeqLocation, nucleotides: NTSeq) extends Product with Serializable

    A sequence fragment with a controlled maximum size. Does not contain whitespace.

    header

    Title/header of the sequence

    location

    1-based location in the source sequence

    nucleotides

    Nucleotides in the source sequence

  3. final case class MinSplitter[+P <: MinimizerPriorities](priorities: P, k: Int) extends Product with Serializable

    Split reads into superkmers by ranked motifs (minimizers). Such superkmers can be bucketed by the corresponding minimizer.

    priorities

    Minimizer ordering to use for splitting

    k

    k-mer length

  4. final case class MinTable(byPriority: ArrayBuffer[NTSeq], largeBuckets: ArrayBuffer[NTSeq] = ArrayBuffer.empty) extends MinimizerPriorities with Product with Serializable

    A table of minimizers with relative priorities (minimizer ordering).

    byPriority

    Minimizers ordered from high priority to low. The position in the array is the rank, and also the unique ID in this table, of the corresponding minimizer. All minimizers must be of equal length.

    largeBuckets

    A subset of byPriority, indicating the motifs that have been found to correspond to large buckets, if any.
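
As a rough illustration of "position in the array is the rank", the sketch below models a priority table with plain strings. The names MinTableSketch, Table, rank, motif and minimizer are hypothetical and are not claimed to match the methods of MinTable itself.

```scala
import scala.collection.mutable.ArrayBuffer

object MinTableSketch {
  // Hypothetical stand-in for a priority table: index in the buffer = rank = ID.
  final case class Table(byPriority: ArrayBuffer[String]) {
    require(byPriority.map(_.length).distinct.size <= 1,
      "all minimizers must be of equal length")

    private val rankOf: Map[String, Int] = byPriority.zipWithIndex.toMap

    /** Rank of a motif, if it is in the table (0 = highest priority). */
    def rank(motif: String): Option[Int] = rankOf.get(motif)

    /** Motif with the given rank. */
    def motif(rank: Int): String = byPriority(rank)

    /** Highest-priority table motif that occurs in the k-mer, if any. */
    def minimizer(kmer: String): Option[String] = {
      val width = byPriority.headOption.map(_.length).getOrElse(0)
      if (width == 0 || kmer.length < width) None
      else {
        val ranks = kmer.sliding(width).flatMap(m => rank(m)).toSeq
        if (ranks.isEmpty) None else Some(byPriority(ranks.min))
      }
    }
  }
}
```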

  5. trait MinimizerPriorities extends Serializable

    Defines a reversible mapping between encoded minimizers and their priorities.

  6. final class PosRankWindow extends AnyRef

    Tracks Motifs in a moving window, such that the top priority item can always be obtained efficiently. Mutates the array. Can only be used once. This class looks like an Iterator[Int], but to avoid boxing of integers, does not extend that trait.

    Invariants: the leftmost position has the highest priority (minimal rank). Priority decreases (i.e. rank increases) monotonically going left to right. Motifs are sorted by position. The minimizer of the current k-length window is always the first motif in the list.
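
The invariants above are those of a monotonic queue. The sketch below shows that general technique on an array of ranks; it is not the PosRankWindow implementation, and the name windowMinima and the use of scala.collection.mutable.ArrayDeque (Scala 2.13 or later) are assumptions.

```scala
import scala.collection.mutable.ArrayDeque

object WindowMinSketch {
  /** For each window of w consecutive positions, the position holding the
    * minimal rank (leftmost on ties). ranks(i) is the rank of the motif at i. */
  def windowMinima(ranks: Array[Int], w: Int): Vector[Int] = {
    require(w >= 1 && w <= ranks.length)
    // Positions in increasing order; their ranks never decrease front to back,
    // so the front is always the top-priority item of the current window.
    val deque = ArrayDeque.empty[Int]
    val out = Vector.newBuilder[Int]
    for (i <- ranks.indices) {
      // A new item with a lower rank makes worse items to its left obsolete.
      while (deque.nonEmpty && ranks(deque.last) > ranks(i)) deque.removeLast()
      deque.append(i)
      // Drop the front once it has slid out of the window [i - w + 1, i].
      if (deque.head <= i - w) deque.removeHead()
      if (i >= w - 1) out += deque.head
    }
    out.result()
  }
}
```

Each position enters and leaves the queue at most once, so the whole scan takes time linear in the number of positions.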

  7. final case class RandomXOR(width: Int, xorMask: Long, canonical: Boolean) extends MinimizerPriorities with Product with Serializable

    Computes minimizer priority by XORing the encoded minimizer with a random mask.
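
A minimal sketch of why XOR with a fixed mask gives a reversible priority mapping. The 2-bit encoding (A=0, C=1, G=2, T=3) and the names encode, decode, priorityOf and motifFor are assumptions for illustration, and the canonical flag is not modelled here.

```scala
object XorPrioritySketch {
  private val codes = "ACGT"

  /** 2-bit encoding of a motif; an assumed encoding, A=0, C=1, G=2, T=3. */
  def encode(motif: String): Long =
    motif.foldLeft(0L) { (acc, c) =>
      require(codes.indexOf(c) >= 0, s"unexpected character: $c")
      (acc << 2) | codes.indexOf(c).toLong
    }

  /** Inverse of encode for a motif of the given width. */
  def decode(enc: Long, width: Int): String =
    (width - 1 to 0 by -1).map(i => codes(((enc >> (2 * i)) & 3L).toInt)).mkString

  final case class XorPriorities(width: Int, xorMask: Long) {
    require(width >= 1 && width <= 31)
    private val keep = (1L << (2 * width)) - 1 // only the low 2*width bits are used

    /** Priority of an encoded motif. */
    def priorityOf(motif: Long): Long = (motif ^ xorMask) & keep

    /** XOR is its own inverse, so the same mask recovers the motif. */
    def motifFor(priority: Long): Long = (priority ^ xorMask) & keep
  }
}
```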

  8. final case class SampledFrequencies(table: MinTable, counts: Array[(Long, Int)]) extends Product with Serializable

    Sampled motif frequencies that may be used to construct a new minimizer ordering.

    table

    Template table, whose ordering of motifs will be refined based on counted frequencies.

    counts

    Pairs of (minimizer rank, frequency). The minimizers should be a subset of those from the given template MinTable.
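
One way such counts could refine an ordering is sketched below, assuming that rarer motifs should receive higher priority, that motifs absent from the sample count as 0, and that the template order breaks ties. These are illustrative assumptions, and the function reorder is hypothetical rather than part of SampledFrequencies.

```scala
object FrequencyOrderingSketch {
  /** Reorder template motifs by ascending sampled frequency.
    * counts holds (rank in template, frequency) pairs. */
  def reorder(template: Vector[String], counts: Array[(Long, Int)]): Vector[String] = {
    val freq = counts.toMap.withDefaultValue(0)
    template.zipWithIndex
      .sortBy { case (_, rank) => (freq(rank.toLong), rank) } // frequency, then old rank
      .map(_._1)
  }
}
```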

  9. final case class ShiftScanner(priorities: MinimizerPriorities) extends Product with Serializable

    Bit-shift scanner for fixed width motifs. Identifies all valid (according to some MinimizerPriorities) motifs/minimizers in a sequence.

    priorities

    The minimizer ordering whose motifs the scanner looks for
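
The general bit-shift technique for fixed-width motifs can be sketched as follows. This is not the ShiftScanner code: the 2-bit encoding (A=0, C=1, G=2, T=3), the function name scan, and the use of -1 for windows containing a non-ACGT character are assumptions, and the check against a MinimizerPriorities instance is left out.

```scala
object ShiftScanSketch {
  /** Encoded value of every width-w motif in seq, indexed by start position.
    * Windows that touch a non-ACGT character are reported as -1. */
  def scan(seq: String, w: Int): Array[Long] = {
    require(w >= 1 && w <= 31)
    val mask = (1L << (2 * w)) - 1
    val out = Array.fill(math.max(0, seq.length - w + 1))(-1L)
    var window = 0L
    var valid = 0 // consecutive valid nucleotides ending at the current position
    for (i <- seq.indices) {
      val code = "ACGT".indexOf(seq(i))
      if (code < 0) {
        valid = 0
        window = 0L
      } else {
        // Shift in two new bits and mask off the nucleotide that left the window.
        window = ((window << 2) | code) & mask
        valid += 1
        if (valid >= w) out(i - w + 1) = window
      }
    }
    out
  }
}
```

Each new motif is obtained from the previous one in constant time, without re-encoding the whole window.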

  10. final case class SplitSegment(hash: BucketId, sequence: SeqID, location: SeqLocation, nucleotides: ZeroNTBitArray) extends Product with Serializable

    A hashed segment (i.e. a superkmer, where every k-mer shares the same minimizer) with minimizer, sequence ID, and 1-based sequence location

    hash

    hash (minimizer)

    sequence

    Sequence ID/header

    location

    Sequence location (1-based) if available

    nucleotides

    Encoded nucleotides of this segment

Value Members

  1. object BundledMinimizers

    Object to manage minimizer files that are stored directly on the classpath (e.g. in the same jar)

  2. object MinSplitter extends Serializable
  3. object MinTable extends Serializable
  4. object Orderings

    Routines for creating minimizer orderings.

  5. object SampledFrequencies extends Serializable
