class Index extends AnyRef
A bucketed k-mer index. An Index stores super-mers in a Dataset of ReducibleBucket, where each k-mer is associated with a tag. Tags are typically k-mer counts, in which case the Index is a multiset of counted k-mers. Like other Spark data structures, indexes are immutable: operations such as filtering return a new Index rather than changing the existing one in place. Indexes can be combined using operations like union, intersect, and subtract, and can be written to disk in various formats. The default format used by the write() and read() methods is bucketed parquet files, which gives good data compression and avoids shuffling when the same Index is used repeatedly.
Instance Constructors
-
new
Index(params: IndexParams, buckets: Dataset[ReducibleBucket])(implicit spark: SparkSession)
- params
Index parameters, which define the minimizer scheme, the lengths of k and m, and the number of buckets. Two indexes must have compatible parameters to be combined.
- buckets
K-mer buckets. Buckets contain super-mers and tags for each k-mer. Tags can be, for example, k-mer counts.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
add(other: Index): Index
Convenience method for union using Rule.Sum
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
- def bcSplit: Broadcast[AnyMinSplitter]
- val buckets: Dataset[ReducibleBucket]
-
def
cache(): Index.this.type
Cache this index by caching the underlying dataset. This will persist it in memory and on disk (if needed), which means that it does not have to be recomputed again if used repeatedly. See Dataset.cache.
-
def
changeMinimizerOrdering(spl: Broadcast[AnyMinSplitter]): Index
Split the super-mers according to a new minimizer ordering, generating an index with the same k-mers that respects the new ordering.
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
counted(normalize: Boolean = false): CountedKmers
Obtain counts for these k-mers.
- normalize
Whether to filter k-mers by orientation
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
filterCounts(min: Option[Int] = None, max: Option[Int] = None): Index
Filter the k-mers in this index by count, using a lower and/or upper bound.
- def filterCounts(min: Int, max: Int): Index
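To illustrate how the optional bounds combine, the sketch below models an index as a plain Scala `Map` from k-mer to count. `FilterCountsSketch` and the `Map` representation are assumptions made for the example, and so is the choice to treat both bounds as inclusive; the text above does not say how the library treats the boundary values.

```scala
object FilterCountsSketch {
  // Toy stand-in for an Index: each k-mer maps to its count.
  type Counts = Map[String, Int]

  // A missing bound (None) places no restriction on that side.
  // Assumption: both bounds are inclusive.
  def filterCounts(counts: Counts,
                   min: Option[Int] = None,
                   max: Option[Int] = None): Counts =
    counts.filter { case (_, c) =>
      min.forall(lower => c >= lower) && max.forall(upper => c <= upper)
    }
}
```

In this model, `filterCounts(ix, min = Some(2))` plays the role of `filterMin(2)` and `filterCounts(ix, max = Some(5))` the role of `filterMax(5)`; in both cases a new map is returned and the input is left unchanged.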
-
def
filterMax(max: Int): Index
Convenience method to filter counts by maximum
-
def
filterMin(min: Int): Index
Convenience method to filter counts by minimum
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
histogram: Dataset[(Tag, Long)]
Obtain these counts as a histogram.
- returns
Pairs of abundances and their frequencies in the dataset.
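As a rough illustration of what the histogram contains, the sketch below models an index as a plain Scala `Map` from k-mer to count and tallies how many distinct k-mers share each abundance. `HistogramSketch` and the `Map` representation are assumptions made for the example, as is treating `Tag` as `Int`; the real method returns a Spark `Dataset`.

```scala
object HistogramSketch {
  // Toy stand-in for an Index: each k-mer maps to its count (abundance).
  type Counts = Map[String, Int]

  // For every abundance value, count how many distinct k-mers have it.
  def histogram(counts: Counts): Map[Int, Long] =
    counts.values
      .groupBy(identity)
      .map { case (abundance, group) => abundance -> group.size.toLong }
}
```

For an input with two k-mers seen once and one k-mer seen three times, the result pairs abundance 1 with frequency 2 and abundance 3 with frequency 1.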
-
def
intersect(other: Index, rule: Rule): Index
Intersect this index with another one, combining the k-mers using the given reducer type. A k-mer is kept after an intersection operation if it is present in both of the input indexes, and passes any other rules that the reducer implements.
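The intersection semantics can be sketched on plain Scala collections. Here an index is modelled as a `Map` from k-mer to count and a rule as a function that combines two counts; `IntersectSketch` and this representation are assumptions made for illustration, not the library's bucketed implementation.

```scala
object IntersectSketch {
  // Toy stand-in for an Index: each k-mer maps to its count.
  type Counts = Map[String, Int]

  // Keep only k-mers present in both inputs and combine their counts with `rule`.
  def intersect(a: Counts, b: Counts)(rule: (Int, Int) => Int): Counts =
    a.collect { case (kmer, left) if b.contains(kmer) => kmer -> rule(left, b(kmer)) }

  // Stand-ins for the reducer types named in this page.
  val min: (Int, Int) => Int = (l, r) => math.min(l, r)
  val max: (Int, Int) => Int = (l, r) => math.max(l, r)
  val left: (Int, Int) => Int = (l, _) => l
  val right: (Int, Int) => Int = (_, r) => r
}
```

With `a = Map("ACGT" -> 3, "CGTA" -> 1)` and `b = Map("ACGT" -> 5, "TTTT" -> 2)`, only `ACGT` survives; the `left` stand-in keeps the count 3 and `right` keeps 5, mirroring what `intersectLeft` and `intersectRight` are described as doing.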
-
def
intersectLeft(other: Index): Index
Convenience method for intersection using Rule.Left
-
def
intersectMany(ixs: Iterable[Index], rule: Rule): Index
Intersect this index with a series of indexes using the given reducer type.
-
def
intersectMax(other: Index): Index
Convenience method for intersection using Rule.Max
-
def
intersectMin(other: Index): Index
Convenience method for intersection using Rule.Min
-
def
intersectRight(other: Index): Index
Convenience method for intersection using Rule.Right
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
lookup(query: Index): Index
Look up the given k-mers in this index, if they exist. Convenience method. This is equivalent to intersect(query, Rule.Left).
-
def
lookup(sequences: Seq[String]): Index
Look up the given NT sequences (strings) in this index, if they exist. Convenience method. This is equivalent to intersect(Index.fromNTSeqs(sequences, params), Rule.Left). This method is not intended for large amounts of data, as everything has to go through the Spark driver.
-
def
mapTags(f: (Tag) ⇒ Tag): Index
Transform the tags of this index, returning a copy with the changes applied
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
newCompatible(discount: Discount, inFiles: String*): Index
Construct a compatible index (suitable for operations like intersection and union) from the given sequence files. Settings will be copied from this index. The count method in the Discount object (pregrouped/simple) will be used, defaulting to Simple if none was specified. The minimizer scheme (splitter) used for this index will be reused.
- discount
Source of settings such as count method and input format. k must be the same as in this index.
- inFiles
Input files (fasta/fastq etc)
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val params: IndexParams
-
def
repartition(partitions: Int): Index
Repartition this index into a different number of partitions (and buckets when written to disk as parquet)
-
def
sample(fraction: Double): Index
Sample k-mers from this index. Sampling is done on the level of distinct k-mers. K-mers will either be included with the same count as before, or omitted.
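The idea of sampling at the level of distinct k-mers can be sketched as follows. The index is modelled as a plain `Map` from k-mer to count; `SampleSketch`, the `seed` parameter, and the per-k-mer coin flip are assumptions made for illustration and say nothing about how the library draws its sample.

```scala
import scala.util.Random

object SampleSketch {
  // Toy stand-in for an Index: each k-mer maps to its count.
  type Counts = Map[String, Int]

  // Each distinct k-mer is kept with probability `fraction`.
  // A kept k-mer retains its original count; a dropped one disappears entirely.
  def sample(counts: Counts, fraction: Double, seed: Long): Counts = {
    val rng = new Random(seed)
    counts.filter { _ => rng.nextDouble() < fraction }
  }
}
```

Because the decision is made once per distinct k-mer, a k-mer with count 1000 is no more likely to be kept than a k-mer with count 1, and counts are never scaled down.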
-
def
showStats(outputLocation: Option[String] = None): Unit
Show summary stats for this index. This action triggers a computation.
-
def
stats(min: Option[Int] = None, max: Option[Int] = None): Dataset[BucketStats]
Obtain per-bucket (bin) statistics.
-
def
subtract(other: Index, rule: Rule): Index
Subtract another index from this one, using e.g. Rule.KmersSubtract or Rule.CountersSubtract. Subtraction is implemented as a union, but this is not a commutative operation due to how the rules are implemented.
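A small model on plain Scala maps shows why the order of the operands matters. The two functions below encode one plausible reading of the two rules: counter subtraction keeps a k-mer only while its remaining count is positive, and k-mer subtraction drops every k-mer that occurs in the other index. Both readings, and the name `SubtractSketch`, are assumptions for illustration; this page does not define the rules.

```scala
object SubtractSketch {
  // Toy stand-in for an Index: each k-mer maps to its count.
  type Counts = Map[String, Int]

  // Assumed reading of Rule.CountersSubtract:
  // subtract counts and keep only k-mers whose result is positive.
  def subtractCounts(a: Counts, b: Counts): Counts =
    a.map { case (kmer, c) => kmer -> (c - b.getOrElse(kmer, 0)) }
      .filter { case (_, c) => c > 0 }

  // Assumed reading of Rule.KmersSubtract:
  // drop every k-mer of `a` that also occurs in `b`, whatever its count.
  def subtractKmers(a: Counts, b: Counts): Counts =
    a.filter { case (kmer, _) => !b.contains(kmer) }
}
```

Swapping the arguments generally gives a different result in this model, which matches the remark that subtraction is not commutative.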
-
def
subtractCounts(other: Index): Index
Convenience method for subtraction using Rule.CountersSubtract
-
def
subtractKmers(other: Index): Index
Convenience method for subtraction using Rule.KmersSubtract
-
def
subtractMany(ixs: Iterable[Index], rule: Rule): Index
Subtract a series of indexes B1, B2... Bn from this index (A): ((A - B1) - B2) - ... using Rule.KmersSubtract or Rule.CountersSubtract.
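The left-to-right grouping ((A - B1) - B2) - ... corresponds to a left fold. The sketch below shows this on plain Scala maps; `SubtractManySketch`, the `Map` model of an index, and the particular reading of k-mer subtraction are assumptions made for the example.

```scala
object SubtractManySketch {
  // Toy stand-in for an Index: each k-mer maps to its count.
  type Counts = Map[String, Int]

  // Assumed reading of Rule.KmersSubtract: drop k-mers that also occur in `b`.
  def subtractKmers(a: Counts, b: Counts): Counts =
    a.filter { case (kmer, _) => !b.contains(kmer) }

  // ((a - b1) - b2) - ... : apply the pairwise subtraction `sub` from left to right.
  def subtractMany(a: Counts, bs: Iterable[Counts])(sub: (Counts, Counts) => Counts): Counts =
    bs.foldLeft(a)(sub)
}
```

An empty series leaves the starting map unchanged, since the fold simply returns its initial value.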
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
union(other: Index, rule: Rule): Index
Union this index with another one, combining the k-mers using the given reducer type. A k-mer is kept after a union operation if it is present in either of the input indexes, and passes any other rules that the reducer implements.
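The union semantics can be sketched in the same toy model, with an index as a `Map` from k-mer to count and a rule as a function combining two counts. `UnionSketch` is a hypothetical name, and treating an absent k-mer as count 0 is an assumption of the sketch; it is harmless for the sum and max stand-ins shown here, but this page does not say how other rules handle a k-mer found on one side only.

```scala
object UnionSketch {
  // Toy stand-in for an Index: each k-mer maps to its count.
  type Counts = Map[String, Int]

  // Keep every k-mer present in either input and combine counts with `rule`.
  // Assumption: a k-mer missing from one side contributes a count of 0.
  def union(a: Counts, b: Counts)(rule: (Int, Int) => Int): Counts =
    (a.keySet ++ b.keySet).map { kmer =>
      kmer -> rule(a.getOrElse(kmer, 0), b.getOrElse(kmer, 0))
    }.toMap

  // Stand-ins for Rule.Sum (as used by add) and Rule.Max (as used by unionMax).
  val sum: (Int, Int) => Int = (l, r) => l + r
  val max: (Int, Int) => Int = (l, r) => math.max(l, r)
}
```

With `a = Map("ACGT" -> 3, "CGTA" -> 1)` and `b = Map("ACGT" -> 5, "TTTT" -> 2)`, all three k-mers are kept; the shared k-mer `ACGT` gets count 8 under `sum` and 5 under `max`.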
-
def
unionLeft(other: Index): Index
Convenience method for union using Rule.Left
-
def
unionMany(ixs: Iterable[Index], rule: Rule): Index
Union this index with a series of indexes using the given reducer type.
-
def
unionMax(other: Index): Index
Convenience method for union using Rule.Max
-
def
unionMin(other: Index): Index
Convenience method for union using Rule.Min
-
def
unionRight(other: Index): Index
Convenience method for union using Rule.Right
-
def
unpersist(): Unit
Unpersist this index, undoing the effect of caching. See Dataset.unpersist.
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
write(location: String)(implicit spark: SparkSession): Unit
Write this index to a location. This action triggers a computation.
-
def
writeBucketStats(location: String): Unit
Write per-bucket statistics to HDFS. This action triggers a computation.
- location
Directory (prefix name) to write data to
-
def
writeHistogram(output: String): Unit
Write the histogram of this data to HDFS. This action triggers a computation.
- output
Directory to write to (prefix name)