class Index extends AnyRef

A bucketed k-mer index. Indexes store super-mers in a Dataset of ReducibleBucket, where each k-mer is associated with a tag. Typically tags are k-mer counts, in which case the Index is a multiset of counted k-mers. Like other Spark data structures, Indexes are immutable: operations such as filtering return a new Index rather than modifying the existing one in place. Indexes can be combined using operations such as union, intersect, and subtract, and can be written to disk in various formats. The default format used by the write() and read() methods is bucketed Parquet files, which give good data compression and avoid shuffling when the same Index is used repeatedly.
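
As a mental model only (plain Scala, not the Discount or Spark API), a counted index can be pictured as an immutable map from each distinct k-mer to its tag, here a count. The object `KmerMultiset` and its method `countKmers` are hypothetical, and the sketch deliberately ignores reverse complements, minimizers, and bucketing.

```scala
object KmerMultiset {
  // Hypothetical model: a counted index as an immutable map from k-mer to its tag (a count).
  type Counts = Map[String, Int]

  // Count every k-mer (substring of length k) across the input sequences.
  // Reverse complements, minimizers and bucketing are deliberately ignored here.
  def countKmers(seqs: Seq[String], k: Int): Counts = {
    require(k > 0, "k must be positive")
    seqs
      .flatMap(s => s.sliding(k).filter(_.length == k)) // drop sequences shorter than k
      .groupMapReduce(identity)(_ => 1)(_ + _)          // k-mer -> number of occurrences
  }
}
```

Because `Map` is immutable, any transformation of `Counts` yields a new map, mirroring how operations on an Index return a new Index.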

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new Index(params: IndexParams, buckets: Dataset[ReducibleBucket])(implicit spark: SparkSession)

    params

    Index parameters, which define the minimizer scheme, the lengths of k and m, and the number of buckets. Two indexes must have compatible parameters to be combined.

    buckets

    K-mer buckets. Buckets contain super-mers and tags for each k-mer. Tags can be, for example, k-mer counts.

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def add(other: Index): Index

    Convenience method for union using Rule.Sum

  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def bcSplit: Broadcast[AnyMinSplitter]
  7. val buckets: Dataset[ReducibleBucket]
  8. def cache(): Index.this.type

    Cache this index by caching the underlying dataset. This persists it in memory and, if needed, on disk, so that it does not have to be recomputed when used repeatedly. See Dataset.cache.

  9. def changeMinimizerOrdering(spl: Broadcast[AnyMinSplitter]): Index

    Split the super-mers according to a new minimizer ordering, generating an index with the same k-mers that respects the new ordering.
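
    A simplified, hypothetical sketch of how a minimizer ordering determines super-mers, assuming the common definition: the minimizer of a k-mer is its smallest m-mer under the ordering (leftmost on ties), and a super-mer is a maximal run of consecutive k-mers whose minimizer is at the same position. `SuperMerSketch.superMers` is not part of Discount, and the real splitter (AnyMinSplitter) may differ in details such as orientation handling.

```scala
object SuperMerSketch {
  // Split `seq` into super-mers: maximal runs of consecutive k-mers whose
  // minimizer (the m-mer that is smallest under `order`, leftmost on ties)
  // is found at the same position in `seq`.
  def superMers(seq: String, k: Int, m: Int, order: Ordering[String]): List[String] = {
    require(0 < m && m <= k, "need 0 < m <= k")
    val nKmers = seq.length - k + 1
    if (nKmers <= 0) Nil
    else {
      // Absolute position of the minimizer of the k-mer starting at i.
      def minimizerPos(i: Int): Int =
        (i to i + k - m).minBy(p => seq.substring(p, p + m))(order)
      val positions = (0 until nKmers).map(minimizerPos)
      // A new super-mer starts wherever the minimizer position changes.
      val starts = 0 +: (1 until nKmers).filter(i => positions(i) != positions(i - 1))
      val ends = starts.drop(1).map(_ - 1) :+ (nKmers - 1)
      // A super-mer spanning k-mers a..b covers characters a until b + k.
      starts.zip(ends).map { case (a, b) => seq.substring(a, b + k) }.toList
    }
  }
}
```

    For "TTAGCATT" with k = 4 and m = 2, lexicographic ordering yields two super-mers and the reversed ordering yields four, yet both splits contain the same five k-mers, which is the property that changing the minimizer ordering must preserve.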

  10. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  11. def counted(normalize: Boolean = false): CountedKmers

    Obtain counts for these k-mers.

    normalize

    Whether to filter k-mers by orientation

  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  14. def filterCounts(min: Option[Int] = None, max: Option[Int] = None): Index

    Filter the k-mers in this index by count, using a lower and/or upper bound

  15. def filterCounts(min: Int, max: Int): Index
  16. def filterMax(max: Int): Index

    Convenience method to filter counts by maximum

  17. def filterMin(min: Int): Index

    Convenience method to filter counts by minimum
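
    The three count filters can be modelled on a plain `Map[String, Int]` of k-mer counts. This is a hypothetical sketch (`CountFilterSketch` is not the Index implementation), and it assumes both bounds are inclusive, which the documentation does not state.

```scala
object CountFilterSketch {
  type Counts = Map[String, Int]

  // Keep k-mers whose count lies within the optional bounds.
  // Assumption: both bounds are inclusive.
  def filterCounts(counts: Counts, min: Option[Int] = None, max: Option[Int] = None): Counts =
    counts.filter { case (_, c) => min.forall(lo => c >= lo) && max.forall(hi => c <= hi) }

  // Convenience variants with a single bound, as in filterMin / filterMax.
  def filterMin(counts: Counts, min: Int): Counts = filterCounts(counts, Some(min), None)
  def filterMax(counts: Counts, max: Int): Counts = filterCounts(counts, None, Some(max))
}
```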

  18. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  20. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  21. def histogram: Dataset[(Tag, Long)]

    Obtain these counts as a histogram.

    returns

    Pairs of abundances and their frequencies in the dataset.
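
    A plain-Scala sketch of the histogram computation, assuming that "frequency" means the number of distinct k-mers with a given abundance. It returns a sorted `Seq` instead of a `Dataset[(Tag, Long)]`, and `HistogramSketch` is a hypothetical name.

```scala
object HistogramSketch {
  // For each abundance (count), the number of distinct k-mers with exactly
  // that count, sorted by abundance.
  def histogram(counts: Map[String, Int]): Seq[(Int, Long)] =
    counts.values.groupMapReduce(identity)(_ => 1L)(_ + _).toSeq.sortBy(_._1)
}
```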

  22. def intersect(other: Index, rule: Rule): Index

    Intersect this index with another one, combining the k-mers using the given reducer type. A k-mer is kept after an intersection operation if it is present in both of the input indexes, and passes any other rules that the reducer implements.
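
    To illustrate intersection semantics, this sketch models an index as a `Map[String, Int]` and a reducer rule as a function that combines the two tags of a shared k-mer. `IntersectSketch` and its rule values are hypothetical stand-ins named after Rule.Min, Rule.Max, Rule.Left, and Rule.Right; the real rules may implement additional filtering.

```scala
object IntersectSketch {
  type Counts = Map[String, Int]
  // Hypothetical stand-in for a reducer rule: combines the two tags of a shared k-mer.
  type Rule = (Int, Int) => Int

  val Min: Rule = (l, r) => math.min(l, r)
  val Max: Rule = (l, r) => math.max(l, r)
  val Left: Rule = (l, _) => l
  val Right: Rule = (_, r) => r

  // Keep only k-mers present in both inputs, combining their tags with `rule`.
  def intersect(a: Counts, b: Counts, rule: Rule): Counts =
    a.collect { case (kmer, x) if b.contains(kmer) => kmer -> rule(x, b(kmer)) }
}
```

    Under this model, lookup(query), which is documented as intersect(query, Rule.Left), keeps this index's tags for the k-mers that also occur in the query.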

  23. def intersectLeft(other: Index): Index

    Convenience method for intersection using Rule.Left

  24. def intersectMany(ixs: Iterable[Index], rule: Rule): Index

    Intersect this index with a series of indexes using the given reducer type.

  25. def intersectMax(other: Index): Index

    Convenience method for intersection using Rule.Max

  26. def intersectMin(other: Index): Index

    Convenience method for intersection using Rule.Min

  27. def intersectRight(other: Index): Index

    Convenience method for intersection using Rule.Right

  28. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  29. def lookup(query: Index): Index

    Look up the given k-mers in this index, if they exist. Convenience method. This is equivalent to intersect(query, Rule.Left).

  30. def lookup(sequences: Seq[String]): Index

    Look up the given NT sequences (strings) in this index, if they exist. Convenience method. This is equivalent to intersect(Index.fromNTSeqs(sequences, params), Rule.Left). This method is not intended for large amounts of data, as everything has to go through the Spark driver.

  31. def mapTags(f: (Tag) ⇒ Tag): Index

    Transform the tags of this index, returning a copy with the changes applied
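
    A minimal sketch of mapTags on the plain map model (the name `MapTagsSketch` is hypothetical, not the Index API), here used to cap counts at 100. The input map is left unchanged and a transformed copy is returned.

```scala
object MapTagsSketch {
  type Counts = Map[String, Int]

  // Apply f to every tag and return a new map; `counts` itself is not modified.
  def mapTags(counts: Counts)(f: Int => Int): Counts =
    counts.map { case (kmer, tag) => kmer -> f(tag) }
}
```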

  32. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  33. def newCompatible(discount: Discount, inFiles: String*): Index

    Construct a compatible index (suitable for operations like intersection and union) from the given sequence files. Settings will be copied from this index. The count method in the Discount object (pregrouped/simple) will be used, defaulting to Simple if none was specified. The minimizer scheme (splitter) used for this index will be reused.

    discount

    Source of settings such as count method and input format. k must be the same as in this index.

    inFiles

    Input files (FASTA, FASTQ, etc.)

  34. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  35. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  36. val params: IndexParams
  37. def repartition(partitions: Int): Index

    Repartition this index into a different number of partitions (and a different number of buckets when written to disk as Parquet)

  38. def sample(fraction: Double): Index

    Sample k-mers from this index. Sampling is done on the level of distinct k-mers. K-mers will either be included with the same count as before, or omitted.
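
    A hypothetical model of this sampling on a plain `Map[String, Int]`: each distinct k-mer is kept independently with probability `fraction` and retains its original count. Independent per-k-mer (Bernoulli) sampling and the explicit `seed` parameter are assumptions of this sketch, not documented behavior of Index.sample.

```scala
object SampleSketch {
  import scala.util.Random

  type Counts = Map[String, Int]

  // Keep each distinct k-mer independently with probability `fraction`;
  // kept k-mers retain their original count, the rest are omitted.
  def sample(counts: Counts, fraction: Double, seed: Long): Counts = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val rng = new Random(seed)
    counts.filter(_ => rng.nextDouble() < fraction)
  }
}
```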

  39. def showStats(outputLocation: Option[String] = None): Unit

    Show summary stats for this index. This action triggers a computation.

  40. def stats(min: Option[Int] = None, max: Option[Int] = None): Dataset[BucketStats]

    Obtain per-bucket (bin) statistics.

  41. def subtract(other: Index, rule: Rule): Index

    Subtract another index from this one, using e.g. Rule.KmersSubtract or Rule.CountersSubtract. Subtraction is implemented as a union, but this is not a commutative operation due to how the rules are implemented.
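
    The two subtraction rules can be contrasted on a plain map model. `SubtractSketch` is hypothetical: KmersSubtract is modelled as removing every k-mer that occurs in the other index, and CountersSubtract as subtracting counts and keeping only positive remainders, which is an assumption about the real rule.

```scala
object SubtractSketch {
  type Counts = Map[String, Int]

  // Model of Rule.KmersSubtract: drop every k-mer of `a` that also occurs in `b`.
  def kmersSubtract(a: Counts, b: Counts): Counts = a -- b.keySet

  // Model of Rule.CountersSubtract (assumed semantics): subtract b's count from
  // a's and keep only k-mers whose remaining count is positive.
  def countersSubtract(a: Counts, b: Counts): Counts =
    a.flatMap { case (kmer, c) =>
      val rest = c - b.getOrElse(kmer, 0)
      if (rest > 0) Some(kmer -> rest) else None
    }
}
```

    Swapping the arguments changes the result, which illustrates why subtraction is not commutative.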

  42. def subtractCounts(other: Index): Index

    Convenience method for subtraction using Rule.CountersSubtract

  43. def subtractKmers(other: Index): Index

    Convenience method for subtraction using Rule.KmersSubtract

  44. def subtractMany(ixs: Iterable[Index], rule: Rule): Index

    Subtract a series of indexes B1, B2, ..., Bn from this index (A), computing ((A - B1) - B2) - ... - Bn, using Rule.KmersSubtract or Rule.CountersSubtract.
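
    The nesting ((A - B1) - B2) - ... corresponds to a left fold. In this hypothetical sketch (`SubtractManySketch` is not the Discount API), the binary step is a simple model of KmersSubtract on plain maps; any other binary subtraction function could be passed instead.

```scala
object SubtractManySketch {
  type Counts = Map[String, Int]

  // ((A - B1) - B2) - ... - Bn: a left fold of a binary subtraction over the Bs.
  def subtractMany(a: Counts, bs: Iterable[Counts], subtract: (Counts, Counts) => Counts): Counts =
    bs.foldLeft(a)(subtract)

  // Hypothetical model of Rule.KmersSubtract, used as the binary operation.
  def kmersSubtract(a: Counts, b: Counts): Counts = a -- b.keySet
}
```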

  45. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  46. def toString(): String
    Definition Classes
    AnyRef → Any
  47. def union(other: Index, rule: Rule): Index

    Union this index with another one, combining the k-mers using the given reducer type. A k-mer is kept after a union operation if it is present in either of the input indexes, and passes any other rules that the reducer implements.
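
    A plain-Scala model of union under the different rules (the name `UnionSketch` is hypothetical, not the Discount API). It assumes the rule is applied only to k-mers present in both inputs, while a k-mer present in just one input keeps its tag unchanged.

```scala
object UnionSketch {
  type Counts = Map[String, Int]
  // Hypothetical stand-in for a reducer rule: combines the two tags of a shared k-mer.
  type Rule = (Int, Int) => Int

  val Sum: Rule = (l, r) => l + r
  val Max: Rule = (l, r) => math.max(l, r)
  val Min: Rule = (l, r) => math.min(l, r)
  val Left: Rule = (l, _) => l
  val Right: Rule = (_, r) => r

  // Keep k-mers present in either input. The rule is applied only to k-mers
  // found in both; a k-mer found in one input keeps its tag (an assumption).
  def union(a: Counts, b: Counts, rule: Rule): Counts =
    b.foldLeft(a) { case (acc, (kmer, y)) =>
      acc.updated(kmer, acc.get(kmer).fold(y)(x => rule(x, y)))
    }
}
```

    In this model, add(other), documented as union with Rule.Sum, corresponds to `union(a, b, Sum)`.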

  48. def unionLeft(other: Index): Index

    Convenience method for union using Rule.Left

  49. def unionMany(ixs: Iterable[Index], rule: Rule): Index

    Union this index with a series of indexes using the given reducer type.

  50. def unionMax(other: Index): Index

    Convenience method for union using Rule.Max

  51. def unionMin(other: Index): Index

    Convenience method for union using Rule.Min

  52. def unionRight(other: Index): Index

    Convenience method for union using Rule.Right

  53. def unpersist(): Unit

    Unpersist this index, undoing the effect of caching. See Dataset.unpersist.

  54. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  55. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  56. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  57. def write(location: String)(implicit spark: SparkSession): Unit

    Write this index to a location. This action triggers a computation.

  58. def writeBucketStats(location: String): Unit

    Write per-bucket statistics to HDFS. This action triggers a computation.

    location

    Directory (prefix name) to write data to

  59. def writeHistogram(output: String): Unit

    Write the histogram of this data to HDFS. This action triggers a computation.

    output

    Directory to write to (prefix name)
