Index

Companion object Index

class Index extends AnyRef

A bucketed k-mer index. An Index stores super-mers in a Dataset of ReducibleBucket, where each k-mer is associated with a tag. Tags are typically k-mer counts, in which case the Index is a multiset of counted k-mers. Like other Spark data structures, Indexes are immutable: operations such as filtering return a new Index instead of modifying the existing one in place. Indexes can be combined with operations such as union, intersect, and subtract, and can be written to disk in various formats. The default format used by the write() and read() methods is bucketed Parquet files, which compress well and avoid shuffling when the same Index is used repeatedly.
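The "multiset of counted k-mers" view can be shown with a minimal single-machine sketch. This is not Discount's API. The hypothetical `KmerMultiset` object uses an immutable `Map[String, Int]` in place of the Dataset of ReducibleBucket. It ignores super-mers, buckets and reverse complements, and treats tags purely as counts.

```scala
// Hypothetical model (not Discount's API): an immutable Map from k-mer to tag,
// where the tag is the k-mer's count. Super-mers, buckets and Spark are left out.
object KmerMultiset {
  /** All length-k substrings of `seq`, left to right (none if seq is shorter than k). */
  def kmers(seq: String, k: Int): Seq[String] =
    seq.sliding(k).filter(_.length == k).toSeq

  /** Tag every distinct k-mer with the number of times it occurs across `seqs`. */
  def count(seqs: Seq[String], k: Int): Map[String, Int] =
    seqs.flatMap(kmers(_, k)).groupMapReduce(identity)(_ => 1)(_ + _)
}

val counts = KmerMultiset.count(Seq("ACGTAC", "CGTA"), k = 3)
// Filtering returns a new Map, as Index.filterMin returns a new Index; `counts` is unchanged.
val repeated = counts.filter { case (_, n) => n >= 2 }
```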

Linear Supertypes

AnyRef, Any


Instance Constructors

new Index(params: IndexParams, buckets: Dataset[ReducibleBucket])(implicit spark: SparkSession)
params
Index parameters, which define the minimizer scheme, the lengths of k and m, and the number of buckets. Two indexes must have compatible parameters to be combined.
buckets
K-mer buckets. Buckets contain super-mers and tags for each k-mer. Tags can be, for example, k-mer counts.

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def add(other: Index): Index
Convenience method for union using Rule.Sum
final def asInstanceOf[T0]: T0

Definition Classes
Any
def bcSplit: Broadcast[AnyMinSplitter]
val buckets: Dataset[ReducibleBucket]
def cache(): Index.this.type
Cache this index by caching the underlying dataset. This will persist it in memory and on disk (if needed), which means that it does not have to be recomputed again if used repeatedly. See Dataset.cache.
def changeMinimizerOrdering(spl: Broadcast[AnyMinSplitter]): Index
Split the super-mers according to a new minimizer ordering, generating an index with the same k-mers that respects the new ordering.
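Re-splitting super-mers under a different ordering can be illustrated on a single sequence. The sketch is hypothetical. `SuperMers` groups consecutive k-mers that share a minimizer (the smallest m-mer under a given ordering). Plain lexicographic and reversed lexicographic orderings stand in for whatever ordering a real AnyMinSplitter uses. Changing the ordering may move super-mer boundaries, but the encoded k-mers stay the same.

```scala
// Hypothetical sketch: split a sequence into super-mers, i.e. maximal runs of
// consecutive k-mers that share the same minimizer under a pluggable ordering.
object SuperMers {
  /** Minimizer of a k-mer: its smallest m-mer under `ord`. */
  def minimizer(kmer: String, m: Int, ord: Ordering[String]): String =
    kmer.sliding(m).min(ord)

  /** (minimizer, super-mer) pairs covering every k-mer of `seq` in order. */
  def split(seq: String, k: Int, m: Int, ord: Ordering[String]): List[(String, String)] = {
    val runs = (0 to seq.length - k).toList.foldRight(List.empty[(String, List[Int])]) {
      (i, acc) =>
        val mz = minimizer(seq.substring(i, i + k), m, ord)
        acc match {
          case (`mz`, starts) :: rest => (mz, i :: starts) :: rest // extend the current run
          case _                      => (mz, List(i)) :: acc      // start a new run
        }
    }
    // A run of k-mers starting at positions a..b covers seq(a until b + k).
    runs.map { case (mz, starts) => mz -> seq.substring(starts.head, starts.last + k) }
  }

  /** Recover the k-mers encoded by a list of super-mers, in order. */
  def kmersOf(superMers: List[(String, String)], k: Int): List[String] =
    superMers.flatMap { case (_, s) => s.sliding(k).toList }
}

val seq = "ACGTTGCATGCA"
val byLex = SuperMers.split(seq, k = 5, m = 3, Ordering.String)
val byRev = SuperMers.split(seq, k = 5, m = 3, Ordering.String.reverse)
```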
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def counted(normalize: Boolean = false): CountedKmers
Obtain counts for these k-mers.
normalize
Whether to filter k-mers by orientation
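The sketch below is a rough illustration of normalization. It assumes that "filtering by orientation" means keeping a k-mer only when it is lexicographically no greater than its reverse complement, so that each forward/reverse-complement pair is reported once. This reading is an assumption, not taken from Discount's source, and the `Orientation` object is hypothetical.

```scala
// Hypothetical sketch of orientation filtering. Assumption: a k-mer is kept only if
// it is lexicographically <= its reverse complement; the other orientation is dropped.
object Orientation {
  private val complement = Map('A' -> 'T', 'C' -> 'G', 'G' -> 'C', 'T' -> 'A')

  def reverseComplement(kmer: String): String =
    kmer.reverse.map(c => complement(c))

  /** Drop k-mers whose reverse complement sorts before them; counts are not merged. */
  def normalize(counts: Map[String, Int]): Map[String, Int] =
    counts.filter { case (kmer, _) => kmer <= reverseComplement(kmer) }
}

val normalized = Orientation.normalize(Map("ACG" -> 2, "CGT" -> 3, "AAA" -> 1, "TTT" -> 4))
```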
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def filterCounts(min: Option[Int] = None, max: Option[Int] = None): Index
Filter counts in this index based on a lower and/or an upper bound
def filterCounts(min: Int, max: Int): Index
def filterMax(max: Int): Index
Convenience method to filter counts by maximum
def filterMin(min: Int): Index
Convenience method to filter counts by minimum
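The count filters can be modelled on an in-memory map. In this hypothetical sketch, `CountFilter` treats both bounds as inclusive and a missing bound as unbounded. Inclusivity is an assumption; Discount's handling of edge values may differ.

```scala
// Hypothetical model of filterCounts/filterMin/filterMax on a Map from k-mer to count.
// Assumption: bounds are inclusive; None means "no bound on this side".
object CountFilter {
  def filterCounts(counts: Map[String, Int],
                   min: Option[Int] = None,
                   max: Option[Int] = None): Map[String, Int] =
    counts.filter { case (_, c) => min.forall(c >= _) && max.forall(c <= _) }

  def filterMin(counts: Map[String, Int], min: Int): Map[String, Int] =
    filterCounts(counts, min = Some(min))

  def filterMax(counts: Map[String, Int], max: Int): Map[String, Int] =
    filterCounts(counts, max = Some(max))
}

val counts = Map("ACG" -> 1, "CGT" -> 5, "GTA" -> 12)
val midRange = CountFilter.filterCounts(counts, min = Some(2), max = Some(10))
```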
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def histogram: Dataset[(Tag, Long)]
Obtain these counts as a histogram.
returns
Pairs of abundances and their frequencies in the dataset.
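A histogram of abundances can be sketched as follows. The sketch assumes that "frequency" means the number of distinct k-mers with a given abundance. `HistogramModel` is hypothetical and returns a sorted `Seq[(Int, Long)]` rather than a `Dataset[(Tag, Long)]`.

```scala
// Hypothetical sketch: (abundance, number of distinct k-mers with that abundance),
// sorted by abundance, computed from an in-memory Map instead of a Spark Dataset.
object HistogramModel {
  def histogram(counts: Map[String, Int]): Seq[(Int, Long)] =
    counts.values.groupMapReduce(identity)(_ => 1L)(_ + _).toSeq.sorted
}

val hist = HistogramModel.histogram(
  Map("ACG" -> 1, "CGT" -> 2, "GTA" -> 2, "TAC" -> 1, "AAA" -> 7))
```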
def intersect(other: Index, rule: Rule): Index
Intersect this index with another one, combining the k-mers using the given reducer type. A k-mer is kept after an intersection operation if it is present in both of the input indexes, and passes any other rules that the reducer implements.
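The intersection semantics can be modelled with plain maps. This sketch is hypothetical. `IntersectModel.Rule` only mirrors the rule names used on this page, and the sketch assumes each rule combines the two counts of a k-mer found in both inputs. Under that assumption, `intersect(a, b, Rule.Left)` keeps `a`'s counts, consistent with lookup(query) being described as intersect(query, Rule.Left).

```scala
// Hypothetical model, not Discount's API. Assumption: a rule combines the counts of a
// k-mer present in both inputs and may drop it by returning None.
object IntersectModel {
  sealed trait Rule { def combine(a: Int, b: Int): Option[Int] }
  object Rule {
    case object Sum   extends Rule { def combine(a: Int, b: Int): Option[Int] = Some(a + b) }
    case object Max   extends Rule { def combine(a: Int, b: Int): Option[Int] = Some(a max b) }
    case object Min   extends Rule { def combine(a: Int, b: Int): Option[Int] = Some(a min b) }
    case object Left  extends Rule { def combine(a: Int, b: Int): Option[Int] = Some(a) }
    case object Right extends Rule { def combine(a: Int, b: Int): Option[Int] = Some(b) }
  }

  /** Keep only k-mers present in both inputs, combining their counts with `rule`. */
  def intersect(a: Map[String, Int], b: Map[String, Int], rule: Rule): Map[String, Int] =
    a.keySet.intersect(b.keySet).toSeq
      .flatMap(kmer => rule.combine(a(kmer), b(kmer)).map(kmer -> _))
      .toMap
}

import IntersectModel.{intersect, Rule}
val a = Map("ACG" -> 2, "CGT" -> 5)
val b = Map("CGT" -> 3, "GTA" -> 1)
```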
def intersectLeft(other: Index): Index
Convenience method for intersection using Rule.Left
def intersectMany(ixs: Iterable[Index], rule: Rule): Index
Intersect this index with a series of indexes using the given reducer type.
def intersectMax(other: Index): Index
Convenience method for intersection using Rule.Max
def intersectMin(other: Index): Index
Convenience method for intersection using Rule.Min
def intersectRight(other: Index): Index
Convenience method for intersection using Rule.Right
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def lookup(query: Index): Index
Look up the given k-mers in this index, if they exist. Convenience method. This is equivalent to intersect(query, Rule.Left).
def lookup(sequences: Seq[String]): Index
Look up the given NT sequences (strings) in this index, if they exist. Convenience method. This is equivalent to intersect(Index.fromNTSeqs(sequences, params), Rule.Left). This method is not intended for large amounts of data, as everything has to go through the Spark driver.
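The result of lookup(sequences) can be sketched with the same simplifications. The query strings are broken into k-mers, and those present in the index are returned with the index's own counts (the Rule.Left side). `LookupModel` is hypothetical and single-strand. Unlike the real method, it builds no query Index and does not go through Spark.

```scala
// Hypothetical sketch of lookup(sequences) on an in-memory Map from k-mer to count.
object LookupModel {
  def kmers(seq: String, k: Int): Seq[String] =
    seq.sliding(k).filter(_.length == k).toSeq

  /** Counts from `index` for every query k-mer it contains; absent k-mers are omitted. */
  def lookup(index: Map[String, Int], sequences: Seq[String], k: Int): Map[String, Int] = {
    val query = sequences.flatMap(kmers(_, k)).toSet
    // Rule.Left semantics: keep this index's counts, not the query's.
    index.filter { case (kmer, _) => query.contains(kmer) }
  }
}

val index = Map("ACG" -> 4, "CGT" -> 2, "TTT" -> 9)
val hits = LookupModel.lookup(index, Seq("ACGTA"), k = 3)
```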
def mapTags(f: (Tag) ⇒ Tag): Index
Transform the tags of this index, returning a copy with the changes applied
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def newCompatible(discount: Discount, inFiles: String*): Index
Construct a compatible index (suitable for operations like intersection and union) from the given sequence files. Settings will be copied from this index. The count method in the Discount object (pregrouped/simple) will be used, defaulting to Simple if none was specified. The minimizer scheme (splitter) used for this index will be reused.
discount
Source of settings such as count method and input format. k must be the same as in this index.
inFiles
Input files (fasta/fastq etc)
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
val params: IndexParams
def repartition(partitions: Int): Index
Repartition this index into a different number of partitions (and buckets when written to disk as parquet)
def sample(fraction: Double): Index
Sample k-mers from this index. Sampling is done on the level of distinct k-mers. K-mers will either be included with the same count as before, or omitted.
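Sampling at the level of distinct k-mers can be sketched as an independent keep-or-drop decision per k-mer. The Bernoulli scheme with a fixed seed in the hypothetical `SamplingModel` is an assumption. The sketch does show the documented property that surviving k-mers keep their original counts.

```scala
import scala.util.Random

// Hypothetical sketch: each distinct k-mer is kept with probability `fraction`
// (Bernoulli sampling, an assumption); kept k-mers retain their original counts.
object SamplingModel {
  def sample(counts: Map[String, Int], fraction: Double, seed: Long): Map[String, Int] = {
    val rng = new Random(seed)
    counts.filter(_ => rng.nextDouble() < fraction)
  }
}

val counts = (1 to 1000).map(i => s"K$i" -> i).toMap
val sampled = SamplingModel.sample(counts, fraction = 0.3, seed = 42L)
```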
def showStats(outputLocation: Option[String] = None): Unit
Show summary stats for this index. This action triggers a computation.
def stats(min: Option[Int] = None, max: Option[Int] = None): Dataset[BucketStats]
Obtain per-bucket (bin) statistics.
def subtract(other: Index, rule: Rule): Index
Subtract another index from this one, using e.g. Rule.KmersSubtract or Rule.CountersSubtract. Subtraction is implemented as a union, but because of how the subtraction rules are implemented it is not commutative: A - B is in general different from B - A.
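Small maps are enough to show how the two subtraction rules differ and why the operation is not commutative. `SubtractModel` is hypothetical. It reproduces only the results described on this page: Rule.KmersSubtract removes shared k-mers, and Rule.CountersSubtract subtracts counts and keeps positive results. It does not reproduce the union-based implementation.

```scala
// Hypothetical model of A - B. K-mers found only in B never appear in the result.
object SubtractModel {
  sealed trait Rule
  object Rule {
    case object KmersSubtract    extends Rule // drop every k-mer that also occurs in B
    case object CountersSubtract extends Rule // subtract counts, keep only positive results
  }

  def subtract(a: Map[String, Int], b: Map[String, Int], rule: Rule): Map[String, Int] =
    a.toSeq.flatMap { case (kmer, ca) =>
      (b.get(kmer), rule) match {
        case (None, _)                         => Some(kmer -> ca) // only in A: unchanged
        case (Some(_), Rule.KmersSubtract)     => None             // present in B: removed
        case (Some(cb), Rule.CountersSubtract) => Some(kmer -> (ca - cb)).filter(_._2 > 0)
      }
    }.toMap
}

import SubtractModel.{subtract, Rule}
val a = Map("ACG" -> 5, "CGT" -> 2, "GTA" -> 1)
val b = Map("ACG" -> 3, "CGT" -> 2, "TTT" -> 9)
```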
def subtractCounts(other: Index): Index
Convenience method for subtraction using Rule.CountersSubtract
def subtractKmers(other: Index): Index
Convenience method for subtraction using Rule.KmersSubtract
def subtractMany(ixs: Iterable[Index], rule: Rule): Index
Subtract a series of indexes B1, B2, ..., Bn from this index (A), computing ((A - B1) - B2) - ... - Bn, using Rule.KmersSubtract or Rule.CountersSubtract.
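subtractMany is a left fold over the indexes being subtracted. The hypothetical sketch below applies count subtraction once per index, in order, keeping only positive results (our reading of Rule.CountersSubtract).

```scala
// Hypothetical model: ((A - B1) - B2) - ... as a left fold over Maps from k-mer to count.
object SubtractManyModel {
  /** A - B on counts; k-mers whose count drops to zero or below are removed. */
  def countersSubtract(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    a.map { case (kmer, ca) => kmer -> (ca - b.getOrElse(kmer, 0)) }.filter(_._2 > 0)

  def subtractMany(a: Map[String, Int], bs: Iterable[Map[String, Int]]): Map[String, Int] =
    bs.foldLeft(a)(countersSubtract)
}

val a  = Map("ACG" -> 10, "CGT" -> 4, "GTA" -> 1)
val bs = Seq(Map("ACG" -> 3), Map("ACG" -> 4, "CGT" -> 4))
val result = SubtractManyModel.subtractMany(a, bs)
```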
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
def union(other: Index, rule: Rule): Index
Union this index with another one, combining the k-mers using the given reducer type. A k-mer is kept after a union operation if it is present in either of the input indexes, and passes any other rules that the reducer implements.
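The following is a map-based sketch of union under the five combining rules named on this page. `UnionModel` is hypothetical. It assumes a rule is applied only to k-mers present in both inputs, while k-mers found on just one side keep their single count. Under that assumption, add corresponds to `union(a, b, Rule.Sum)`.

```scala
// Hypothetical model, not Discount's API: union of two Maps from k-mer to count.
object UnionModel {
  sealed trait Rule { def combine(a: Int, b: Int): Int }
  object Rule {
    case object Sum   extends Rule { def combine(a: Int, b: Int): Int = a + b }
    case object Max   extends Rule { def combine(a: Int, b: Int): Int = a max b }
    case object Min   extends Rule { def combine(a: Int, b: Int): Int = a min b }
    case object Left  extends Rule { def combine(a: Int, b: Int): Int = a }
    case object Right extends Rule { def combine(a: Int, b: Int): Int = b }
  }

  /** Every k-mer from either input; `rule` decides the count only when both contain it. */
  def union(a: Map[String, Int], b: Map[String, Int], rule: Rule): Map[String, Int] =
    b.foldLeft(a) { case (acc, (kmer, cb)) =>
      // Start from a's entries and merge b's entries in one at a time.
      acc.updated(kmer, acc.get(kmer).fold(cb)(ca => rule.combine(ca, cb)))
    }
}

import UnionModel.{union, Rule}
val a = Map("ACG" -> 2, "CGT" -> 5)
val b = Map("CGT" -> 3, "GTA" -> 1)
```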
def unionLeft(other: Index): Index
Convenience method for union using Rule.Left
def unionMany(ixs: Iterable[Index], rule: Rule): Index
Union this index with a series of indexes using the given reducer type.
def unionMax(other: Index): Index
Convenience method for union using Rule.Max
def unionMin(other: Index): Index
Convenience method for union using Rule.Min
def unionRight(other: Index): Index
Convenience method for union using Rule.Right
def unpersist(): Unit
Unpersist this index, undoing the effect of caching. See Dataset.unpersist.
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def write(location: String)(implicit spark: SparkSession): Unit
Write this index to a location. This action triggers a computation.
def writeBucketStats(location: String): Unit
Write per-bucket statistics to HDFS. This action triggers a computation.
location
Directory (prefix name) to write data to
def writeHistogram(output: String): Unit
Write the histogram of this data to HDFS. This action triggers a computation.
output
Directory to write to (prefix name)
