final case class Discount(k: Int, minimizers: MinimizerSource = Bundled, m: Int = 10, ordering: MinimizerOrdering = Frequency, sample: Double = 0.01, maxSequenceLength: Int = 1000000, normalize: Boolean = false, method: CountMethod = Auto, partitions: Int = 200)(implicit spark: SparkSession) extends Product with Serializable

Main API entry point for Discount. See also the command-line examples in the documentation for more detail on these options.

k: k-mer length

minimizers: source of minimizers. See MinimizerSource

m: minimizer width

ordering: minimizer ordering. See MinimizerOrdering

sample: sample fraction for frequency orderings

maxSequenceLength: maximum length of a single sequence (intended for short reads)

normalize: whether to normalize k-mer orientation during counting. Causes every sequence to be scanned in both forward and reverse orientation, after which only forward-orientation k-mers are kept.

method: counting method to use (Auto selects automatically). See CountMethod

partitions: number of shuffle partitions/index buckets

spark: the SparkSession
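A minimal construction sketch in Scala (the package import and application name are assumptions based on the Discount project layout, not taken from this page):

```scala
import org.apache.spark.sql.SparkSession
import com.jnpersson.discount.spark._

// The Discount constructor takes an implicit SparkSession
implicit val spark: SparkSession = SparkSession.builder()
  .appName("discount-example")
  .getOrCreate()

// Only k is required; all other parameters have defaults
val discount = Discount(k = 28)
```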

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new Discount(k: Int, minimizers: MinimizerSource = Bundled, m: Int = 10, ordering: MinimizerOrdering = Frequency, sample: Double = 0.01, maxSequenceLength: Int = 1000000, normalize: Boolean = false, method: CountMethod = Auto, partitions: Int = 200)(implicit spark: SparkSession)

    Parameters are as documented for the class above.

Value Members

  1. def emptyIndex(inFiles: String*): Index

    Construct an empty index, using the supplied sequence files to prepare the minimizer ordering. This is useful when a frequency ordering is used and one wants to sample a large number of files in advance. Index.newCompatible or index(compatible: Index, inFiles: String*) can then be used to construct compatible indexes with actual k-mers using the resulting ordering.

    inFiles: the input files to sample for frequency orderings
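    For example, a pre-sampling workflow might look like the following sketch (the file names are hypothetical, and `discount` is assumed to be a configured Discount instance):

    ```scala
    // Sample many files up front to fix the frequency-based minimizer
    // ordering, without yet storing any k-mers
    val empty = discount.emptyIndex("sample1.fasta", "sample2.fasta")

    // Reuse the resulting ordering to build mutually compatible indexes
    val idx1 = discount.index(empty, "sample1.fasta")
    val idx2 = discount.index(empty, "sample2.fasta")
    ```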

  2. def getInputFragments(file: String, addRCReads: Boolean = false): Dataset[InputFragment]

    Single-file version of getInputFragments(files: Seq[String], addRCReads: Boolean).

  3. def getInputFragments(files: Seq[String], addRCReads: Boolean): Dataset[InputFragment]

    Load input fragments (with sequence title and location) according to the settings in this object.

    files: input files

    addRCReads: whether to add reverse complements

  4. def getInputSequences(file: String, addRCReads: Boolean = false): Dataset[NTSeq]

    Single-file version of getInputSequences(files: Seq[String], addRCReads: Boolean).

  5. def getInputSequences(files: Seq[String], addRCReads: Boolean): Dataset[NTSeq]

    Load reads/sequences from files according to the settings in this object.

    files: input files

    addRCReads: whether to add reverse complements

  6. def getSplitter(inFiles: Option[Seq[String]], persistHash: Option[String] = None): MinSplitter[_ <: MinimizerPriorities]

    Construct a read splitter for the given input files based on the settings in this object.

    inFiles: input files (needed for frequency orderings, which require sampling)

    persistHash: location to persist the generated minimizer ordering (for frequency orderings), if any

    returns: a MinSplitter configured with a minimizer ordering and corresponding MinTable

  7. def index(compatible: Index, inFiles: String*): Index

    Convenience method to construct a compatible counting k-mer index containing all k-mers from the input sequence files.

    compatible: compatible index to copy settings from, such as an existing minimizer ordering

    inFiles: input files

  8. def index(inFiles: String*): Index

    Convenience method to construct a counting k-mer index containing all k-mers from the input sequence files. If a frequency minimizer ordering is used (the default), the input files will be sampled and a new minimizer ordering will be constructed.

    inFiles: input files
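    A usage sketch (hypothetical file names; `discount` is assumed to be a configured Discount instance):

    ```scala
    // With the default Frequency ordering, the input files are first
    // sampled to construct a minimizer ordering, after which all
    // k-mers from the inputs are indexed
    val index = discount.index("reads1.fastq", "reads2.fastq")
    ```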

  9. def inputReader(files: String*): Inputs

    Obtain an InputReader configured with settings from this object.

    files: files to read. Can be a single file or multiple files. Wildcards can be used. A name of the format @list.txt will be parsed as a list of files.
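    The accepted file name forms can be illustrated as follows (hypothetical paths; `discount` is assumed to be a configured Discount instance):

    ```scala
    val single   = discount.inputReader("reads.fastq")          // one file
    val many     = discount.inputReader("r1.fastq", "r2.fastq") // several files
    val globbed  = discount.inputReader("data/*.fasta")         // wildcard
    val fromList = discount.inputReader("@list.txt")            // list.txt names one file per line
    ```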

  10. val k: Int
  11. def kmers(knownSplitter: Broadcast[AnyMinSplitter], inFiles: String*): Kmers

    Load k-mers from the given files, using the supplied (already broadcast) splitter.

  12. def kmers(inFiles: String*): Kmers

    Load k-mers from the given files.

  13. val m: Int
  14. val maxSequenceLength: Int
  15. val method: CountMethod
  16. val minimizers: MinimizerSource
  17. val normalize: Boolean
  18. val ordering: MinimizerOrdering
  19. val partitions: Int
  20. val sample: Double
  21. def sequenceTitles(input: String*): Dataset[SeqTitle]

    Load sequence titles only from the given input files.