final case class Discount(k: Int, minimizers: MinimizerSource = Bundled, m: Int = 10, ordering: MinimizerOrdering = Frequency, sample: Double = 0.01, maxSequenceLength: Int = 1000000, normalize: Boolean = false, method: CountMethod = Auto, partitions: Int = 200)(implicit spark: SparkSession) extends Product with Serializable
Main API entry point for Discount. See also the command-line examples in the documentation for more information on these options.
- k
k-mer length
- minimizers
source of minimizers. See MinimizerSource
- m
minimizer width
- ordering
minimizer ordering. See MinimizerOrdering
- sample
sample fraction for frequency orderings
- maxSequenceLength
max length of a single sequence (for short reads)
- normalize
whether to normalize k-mer orientation during counting. Causes every sequence to be scanned in both forward and reverse, after which only forward orientation k-mers are kept.
- method
counting method to use (Auto for automatic selection). See CountMethod
- partitions
number of shuffle partitions/index buckets
- spark
the SparkSession
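As a quick orientation, a minimal sketch of constructing a Discount instance follows. The package import, app name and parameter choice are assumptions for illustration; adjust them to your build.

```scala
import org.apache.spark.sql.SparkSession
// Package path assumed; adjust to match your build of Discount
import com.jnpersson.discount.spark._

implicit val spark: SparkSession = SparkSession.builder()
  .appName("discount-example")
  .master("local[*]")
  .getOrCreate()

// Only k is required; the remaining parameters use the defaults shown above
val discount = Discount(k = 31)
```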
Instance Constructors
- new Discount(k: Int, minimizers: MinimizerSource = Bundled, m: Int = 10, ordering: MinimizerOrdering = Frequency, sample: Double = 0.01, maxSequenceLength: Int = 1000000, normalize: Boolean = false, method: CountMethod = Auto, partitions: Int = 200)(implicit spark: SparkSession)
Value Members
- def emptyIndex(inFiles: String*): Index
Construct an empty index, using the supplied sequence files to prepare the minimizer ordering. This is useful when a frequency ordering is used and one wants to sample a large number of files in advance. Index.newCompatible or index(compatible: Index, inFiles: String*) can then be used to construct compatible indexes with actual k-mers using the resulting ordering.
- inFiles
The input files to sample for frequency orderings
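A sketch of the intended workflow, with hypothetical file names: sample several files up front, then reuse the resulting ordering to build compatible indexes (see index(compatible, inFiles) below).

```scala
// Prepare a minimizer ordering by sampling several files in advance
val empty = discount.emptyIndex("sampleA.fastq", "sampleB.fastq", "sampleC.fastq")

// Build indexes with actual k-mers that share the prepared ordering
val idxA = discount.index(empty, "sampleA.fastq")
val idxB = discount.index(empty, "sampleB.fastq")
```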
- def getInputFragments(file: String, addRCReads: Boolean = false): Dataset[InputFragment]
Single-file version of getInputFragments(files, addRCReads) below.
- def getInputFragments(files: Seq[String], addRCReads: Boolean): Dataset[InputFragment]
Load input fragments (with sequence title and location) according to the settings in this object.
- files
input files
- addRCReads
whether to add reverse complements
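A minimal usage sketch (file name hypothetical):

```scala
// Fragments retain the sequence title and location alongside the sequence data
val fragments = discount.getInputFragments(Seq("reads.fastq"), addRCReads = false)
fragments.show(5)
```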
- def getInputSequences(file: String, addRCReads: Boolean = false): Dataset[NTSeq]
Single-file version of getInputSequences(files, addRCReads) below.
- def getInputSequences(files: Seq[String], addRCReads: Boolean): Dataset[NTSeq]
Load reads/sequences from files according to the settings in this object.
- files
input files
- addRCReads
whether to add reverse complements
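For example (file name hypothetical; the single-file overload is used here):

```scala
// Load plain sequences only, without titles or locations
val seqs = discount.getInputSequences("reads.fastq")
seqs.show(5, truncate = false)
```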
- def getSplitter(inFiles: Option[Seq[String]], persistHash: Option[String] = None): MinSplitter[_ <: MinimizerPriorities]
Construct a read splitter for the given input files based on the settings in this object.
- inFiles
Input files (for frequency orderings, which require sampling)
- persistHash
Location to persist the generated minimizer ordering (for frequency orderings), if any
- returns
a MinSplitter configured with a minimizer ordering and corresponding MinTable
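A sketch with hypothetical paths; for a frequency ordering, the given files are sampled, and the resulting ordering can optionally be persisted:

```scala
val splitter = discount.getSplitter(
  inFiles = Some(Seq("reads.fastq")),
  persistHash = Some("/tmp/minimizer_ordering") // output location; format assumed
)
```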
- def index(compatible: Index, inFiles: String*): Index
Convenience method to construct a compatible counting k-mer index containing all k-mers from the input sequence files.
- compatible
Compatible index to copy settings, such as an existing minimizer ordering, from
- inFiles
input files
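For example (file names hypothetical), building a second index that reuses the first one's minimizer ordering, so the two remain compatible:

```scala
val first = discount.index("fileA.fastq")
// Copies settings such as the minimizer ordering from first
val second = discount.index(first, "fileB.fastq")
```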
- def index(inFiles: String*): Index
Convenience method to construct a counting k-mer index containing all k-mers from the input sequence files. If a frequency minimizer ordering is used (which is the default), the input files will be sampled and a new minimizer ordering will be constructed.
- inFiles
input files
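The most direct way to build a counting index (file name hypothetical; showStats is an assumption about the Index API and may differ in your version):

```scala
val index = discount.index("reads.fastq")
index.showStats() // assumed convenience method for printing summary statistics
```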
- def inputReader(files: String*): Inputs
Obtain an InputReader configured with settings from this object.
- files
Files to read. Can be a single file or multiple files. Wildcards can be used. A name of the format @list.txt will be parsed as a list of files.
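For example, combining the three addressing forms described above (names hypothetical):

```scala
val inputs = discount.inputReader("reads.fastq", "data/*.fasta", "@files.txt")
```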
- val k: Int
- def kmers(knownSplitter: Broadcast[AnyMinSplitter], inFiles: String*): Kmers
Load k-mers from the given files, using an already constructed (broadcast) splitter.
- def kmers(inFiles: String*): Kmers
Load k-mers from the given files.
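A minimal sketch (file name hypothetical); the knownSplitter overload above can be used instead when a splitter has already been constructed and broadcast:

```scala
val kmers = discount.kmers("reads.fastq")
```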
- val m: Int
- val maxSequenceLength: Int
- val method: CountMethod
- val minimizers: MinimizerSource
- val normalize: Boolean
- val ordering: MinimizerOrdering
- val partitions: Int
- val sample: Double
- def sequenceTitles(input: String*): Dataset[SeqTitle]
Load sequence titles only from the given input files.
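For example (file name hypothetical; SeqTitle is assumed to display as plain text):

```scala
val titles = discount.sequenceTitles("genome.fasta")
titles.show(10, truncate = false)
```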