package hash
Provides classes for hashing k-mers and nucleotide sequences. Hashing is done by identifying minimizers, so hashing all k-mers in a sequence corresponds to splitting the sequence into super k-mers (superkmers): segments of length >= k in which all k-mers share the same minimizer.
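The splitting described above can be illustrated with a minimal, self-contained sketch. It does not use this package's classes: the object and method names are hypothetical, and it assumes, purely for illustration, that the minimizer of a k-mer is its lexicographically smallest substring of length m. Real orderings are supplied by a MinimizerPriorities implementation.

```scala
object SuperKmerSketch {
  /** Minimizer of one k-mer: its lexicographically smallest substring of length m.
    * (Lexicographic order is an assumption made for this sketch only.) */
  def minimizer(kmer: String, m: Int): String =
    kmer.sliding(m).min

  /** Split `seq` into super k-mers: runs of consecutive k-mers that share a minimizer.
    * Returns (minimizer, segment) pairs; every segment has length >= k. */
  def superKmers(seq: String, k: Int, m: Int): List[(String, String)] = {
    require(m >= 1 && m <= k && k <= seq.length, "need 1 <= m <= k <= seq.length")
    // Minimizer of the k-mer starting at each position.
    val mins = (0 to seq.length - k).map(i => minimizer(seq.substring(i, i + k), m))
    val out = List.newBuilder[(String, String)]
    var start = 0
    for (i <- 1 to mins.length) {
      if (i == mins.length || mins(i) != mins(start)) {
        // k-mers start..i-1 share a minimizer; together they span seq[start, i - 1 + k).
        out += ((mins(start), seq.substring(start, i - 1 + k)))
        start = i
      }
    }
    out.result()
  }
}
```

For example, with k = 4 and m = 2 the sequence TTACGTT splits into the segment TTACGT (three k-mers sharing the minimizer AC) and the segment CGTT (minimizer CG).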
Type Members
- type BucketId = Long
Type of a compacted hash (minimizer)
- final case class InputFragment(header: SeqTitle, location: SeqLocation, nucleotides: NTSeq) extends Product with Serializable
A sequence fragment with a controlled maximum size. Does not contain whitespace.
- header
Title/header of the sequence
- location
1-based location in the source sequence
- nucleotides
Nucleotides in the source sequence
- final case class MinSplitter[+P <: MinimizerPriorities](priorities: P, k: Int) extends Product with Serializable
Split reads into superkmers by ranked motifs (minimizers). Such superkmers can be bucketed by the corresponding minimizer.
- priorities
Minimizer ordering to use for splitting
- k
k-mer length
- final case class MinTable(byPriority: ArrayBuffer[NTSeq], largeBuckets: ArrayBuffer[NTSeq] = ArrayBuffer.empty) extends MinimizerPriorities with Product with Serializable
A table of minimizers with relative priorities (minimizer ordering).
- byPriority
Minimizers ordered from high priority to low. The position in the array is the rank, and also the unique ID in this table, of the corresponding minimizer. All minimizers must be of equal length.
- largeBuckets
A subset of byPriority, indicating the motifs that have been found to correspond to large buckets, if any.
- trait MinimizerPriorities extends Serializable
Defines a reversible mapping between encoded minimizers and their priorities.
- final class PosRankWindow extends AnyRef
Tracks motifs in a moving window, such that the top-priority item can always be obtained efficiently. Mutates the array. Can only be used once. This class looks like an Iterator[Int] but does not extend that trait, to avoid boxing of integers.
Invariants: the leftmost position has the highest priority (minimal rank). Priority decreases (i.e. rank increases) monotonically going left to right. Motifs are sorted by position. The minimizer of the current k-length window is always the first motif in the list.
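The invariants above match the classic monotonic-queue technique for sliding-window minima. The following is a minimal sketch of that technique, not the actual PosRankWindow implementation: the object name, method name, and use of a boxed ArrayDeque are assumptions chosen for readability, whereas the real class mutates an array and avoids boxing.

```scala
import scala.collection.mutable.ArrayDeque

object WindowMinSketch {
  /** For each window of `w` consecutive positions, return the position that holds
    * the minimal rank (highest priority). Ties go to the leftmost position. */
  def windowMinPositions(ranks: Array[Int], w: Int): Vector[Int] = {
    require(w >= 1, "window size must be positive")
    // Positions in increasing order; their ranks never decrease from left to right,
    // so the head is always the best item in the current window.
    val window = ArrayDeque.empty[Int]
    val out = Vector.newBuilder[Int]
    for (pos <- ranks.indices) {
      // Drop entries that can never win again: they are older and have a worse rank.
      while (window.nonEmpty && ranks(window.last) > ranks(pos)) window.removeLast()
      window.append(pos)
      // Drop the leftmost entry once it has slid out of the window.
      if (window.head <= pos - w) window.removeHead()
      if (pos >= w - 1) out += window.head
    }
    out.result()
  }
}
```

Each position is appended and removed at most once, so the whole scan takes time linear in the number of positions regardless of the window size.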
- final case class RandomXOR(width: Int, xorMask: Long, canonical: Boolean) extends MinimizerPriorities with Product with Serializable
Computes minimizer priority by XORing with a random mask.
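Because XOR is its own inverse, a mask-based priority is a reversible mapping of the kind MinimizerPriorities requires. The sketch below illustrates the idea only: the 2-bit encoding (A=0, C=1, G=2, T=3), the object name, and the method names are assumptions rather than this class's API, and the canonical flag is ignored.

```scala
object XorPrioritySketch {
  /** Encode a motif over ACGT as 2 bits per nucleotide (A=0, C=1, G=2, T=3).
    * This encoding is an assumption of the sketch; input must contain only ACGT. */
  def encode(motif: String): Long =
    motif.foldLeft(0L)((acc, c) => (acc << 2) | "ACGT".indexOf(c).toLong)

  /** Priority of an encoded motif of `width` nucleotides under `xorMask`. */
  def priorityOf(encoded: Long, xorMask: Long, width: Int): Long = {
    require(width >= 1 && width <= 31, "width must fit in a Long at 2 bits per nucleotide")
    val widthMask = (1L << (2 * width)) - 1 // keep only the bits of one motif
    (encoded ^ xorMask) & widthMask
  }

  /** Applying the same mask again recovers the encoded motif from its priority. */
  def motifFor(priority: Long, xorMask: Long, width: Int): Long =
    priorityOf(priority, xorMask, width)
}
```

Within the motif's bit width, the mapping is a bijection, so every motif receives a distinct priority and the ordering is effectively a fixed pseudo-random permutation determined by the mask.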
- final case class SampledFrequencies(table: MinTable, counts: Array[(Long, Int)]) extends Product with Serializable
Sampled motif frequencies that may be used to construct a new minimizer ordering.
- table
Template table, whose ordering of motifs will be refined based on counted frequencies.
- counts
Pairs of (minimizer rank, frequency). The minimizers should be a subset of those from the given template MinTable.
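How the counts refine the template ordering is not specified here. One plausible policy, assumed for this sketch rather than taken from the library, is to give rarer motifs higher priority and to keep the template order among motifs with equal counts. The object and method names are hypothetical, and a plain Vector[String] stands in for a MinTable.

```scala
object FrequencyOrderSketch {
  /** Reorder `template` so that rarer motifs come first (an assumed policy).
    * `counts` holds (rank in template, frequency) pairs; motifs without a
    * count are treated as frequency 0. Ties keep the template order. */
  def reorder(template: Vector[String], counts: Seq[(Long, Int)]): Vector[String] = {
    val freq = counts.toMap.withDefaultValue(0)
    template.zipWithIndex
      .sortBy { case (_, rank) => (freq(rank.toLong), rank) }
      .map(_._1)
  }
}
```

With the template (AA, AC, AG) and counts (0, 10) and (1, 2), this policy yields the ordering (AG, AC, AA).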
- final case class ShiftScanner(priorities: MinimizerPriorities) extends Product with Serializable
Bit-shift scanner for fixed width motifs. Identifies all valid (according to some MinimizerPriorities) motifs/minimizers in a sequence.
- priorities
The minimizer ordering whose motifs should be scanned for
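Bit-shift scanning of fixed-width motifs can be sketched as a rolling 2-bit encoding: each new nucleotide is shifted into a window value and a mask discards the nucleotide that has left the window. This is an illustration of the general technique, not ShiftScanner's actual code. The encoding (A=0, C=1, G=2, T=3) and all names are assumptions, and the step of checking each motif against a MinimizerPriorities instance is omitted.

```scala
object ShiftScanSketch {
  /** Encoded value of every width-`w` motif in `seq`, in order of position.
    * Assumes `seq` contains only A, C, G, T and that 1 <= w <= 31. */
  def scan(seq: String, w: Int): Vector[Long] = {
    require(w >= 1 && w <= 31, "width must fit in a Long at 2 bits per nucleotide")
    val mask = (1L << (2 * w)) - 1
    var window = 0L
    val out = Vector.newBuilder[Long]
    for (i <- seq.indices) {
      // Shift in the next nucleotide; the mask drops the one that left the window.
      window = ((window << 2) | "ACGT".indexOf(seq(i)).toLong) & mask
      // The window is full once w nucleotides have been read.
      if (i >= w - 1) out += window
    }
    out.result()
  }
}
```

Each motif is obtained from the previous one with a constant number of bit operations, so no substring is re-encoded from scratch.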
- final case class SplitSegment(hash: BucketId, sequence: SeqID, location: SeqLocation, nucleotides: ZeroNTBitArray) extends Product with Serializable
A hashed segment (i.e. a superkmer, where every k-mer shares the same minimizer) with its minimizer, sequence ID, and 1-based sequence location.
- hash
hash (minimizer)
- sequence
Sequence ID/header
- location
Sequence location (1-based) if available
- nucleotides
Encoded nucleotides of this segment
Value Members
- object BundledMinimizers
Manages minimizer files that are stored directly on the classpath (e.g. in the same jar).
- object MinSplitter extends Serializable
- object MinTable extends Serializable
- object Orderings
Routines for creating minimizer orderings.
- object SampledFrequencies extends Serializable