object GroupedSegments
Linear supertypes: AnyRef, Any
Value Members
- final def !=(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def ##(): Int
  Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  Definition Classes: Any
- def clone(): AnyRef
  Attributes: protected[lang]
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- def finalize(): Unit
  Attributes: protected[lang]
  Definition Classes: AnyRef
  Annotations: @throws( classOf[java.lang.Throwable] )
- def fromReads(input: Dataset[NTSeq], method: CountMethod, normalize: Boolean, spl: Broadcast[AnyMinSplitter])(implicit spark: SparkSession): GroupedSegments
  Construct GroupedSegments from a set of reads/sequences.
  - input: The raw sequence data
  - method: Counting method/pipeline type
  - normalize: Whether to normalize k-mer orientation
  - spl: Splitter for breaking the sequences into super-mers
- final def getClass(): Class[_]
  Definition Classes: AnyRef → Any
  Annotations: @native()
- def hashCode(): Int
  Definition Classes: AnyRef → Any
  Annotations: @native()
- def hashSegments(input: NTSeq, splitter: AnyMinSplitter): Iterator[HashSegment]
  Construct HashSegments from a single read.
  - input: The raw sequence
  - splitter: Splitter for breaking the sequences into super-mers
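To make the super-mer idea concrete, here is a self-contained sketch, not the library's implementation: the real splitter is driven by configurable MinimizerPriorities, while this toy takes the lexicographically smallest m-mer of each k-mer as its minimizer. Consecutive k-mers that share a minimizer are merged into one super-mer, so each read yields a short list of (minimizer, super-mer) pairs:

```scala
object SupermerSketch {
  /** Toy minimizer: the lexicographically smallest m-mer of a k-mer. */
  def minimizer(kmer: String, m: Int): String =
    kmer.sliding(m).min

  /** Split a read into (minimizer, super-mer) pairs: maximal runs of
    * consecutive k-mers sharing a minimizer become one super-mer. */
  def superMers(read: String, k: Int, m: Int): List[(String, String)] = {
    val kmers = read.sliding(k).toList
    if (kmers.isEmpty || kmers.head.length < k) return Nil
    val mins = kmers.map(minimizer(_, m))
    val runs = scala.collection.mutable.ListBuffer.empty[(String, String)]
    var start = 0
    for (i <- 1 to kmers.length) {
      if (i == kmers.length || mins(i) != mins(start)) {
        // k-mers start..i-1 share a minimizer; their union is one super-mer
        runs += ((mins(start), read.substring(start, (i - 1) + k)))
        start = i
      }
    }
    runs.toList
  }
}
```

A super-mer of length L contains L - k + 1 k-mers, so the super-mer lengths always account for every k-mer in the read exactly once.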
- def hashSegments(input: Dataset[NTSeq], spl: Broadcast[AnyMinSplitter])(implicit spark: SparkSession): Dataset[HashSegment]
  Construct HashSegments from a set of reads/sequences.
  - input: The raw sequence data
  - spl: Splitter for breaking the sequences into super-mers
- final def isInstanceOf[T0]: Boolean
  Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- final def notify(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- final def notifyAll(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- def segmentsByHash(segments: DataFrame)(implicit spark: SparkSession): DataFrame
  Group segments by hash/minimizer, without pre-counting. This straightforward method is more efficient when super-mers are not highly repeated in the data (low redundancy), or when the data is moderately sized. Its outputs are compatible with those of segmentsByHashPregroup.
  - segments: Supermers to group
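As a rough illustration of what the non-precounted strategy does, here is a sketch with plain Scala collections standing in for Spark DataFrames (the names and types here are hypothetical, not the library's API). Every super-mer carrying the same minimizer hash lands in the same bucket after a single grouping pass:

```scala
object GroupByHashSketch {
  // A segment pairs a minimizer hash with a super-mer sequence
  type Segment = (Long, String)

  /** Bucket super-mers by their minimizer hash in one grouping pass,
    * mirroring a single DataFrame groupBy with no pre-counting. */
  def segmentsByHash(segments: Seq[Segment]): Map[Long, Seq[String]] =
    segments.groupBy(_._1).map { case (h, ss) => (h, ss.map(_._2)) }
}
```

With low redundancy, few segments are duplicates, so the single shuffle implied by this one grouping is all the work that is needed.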
- def segmentsByHashPregroup[S <: MinSplitter[MinimizerPriorities]](segments: DataFrame, addRC: Boolean, spl: Broadcast[S])(implicit spark: SparkSession): DataFrame
  Group segments by hash/minimizer, pre-grouping and counting identical super-mers at an early stage, before assigning them to buckets. This helps with highly redundant datasets and can greatly reduce the data volume that later stages must process. However, it adds one extra shuffle, so it may not be the best choice for moderately sized datasets. Reverse complements are optionally added after pre-grouping (when k-mer orientation needs to be normalized).
  - segments: Supermers to group
  - addRC: Whether to add reverse complements
  - spl: Splitter broadcast
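A minimal sketch of the pre-grouping idea, again with plain collections standing in for DataFrames (hypothetical names; the real method also handles reverse complements and runs distributed). Identical super-mers are first collapsed into (segment, count) pairs, and only then bucketed by hash, which is why redundant data shrinks before the expensive grouping:

```scala
object PregroupSketch {
  // (minimizer hash, super-mer sequence)
  type Segment = (Long, String)

  /** Stage 1: collapse identical super-mers into (segment, count) pairs.
    * On highly redundant data this shrinks the volume later stages see. */
  def pregroup(segments: Seq[Segment]): Seq[(Segment, Long)] =
    segments.groupBy(identity).map { case (s, ss) => (s, ss.size.toLong) }.toSeq

  /** Stage 2: bucket the counted super-mers by minimizer hash (the extra
    * shuffle in the Spark version corresponds to this second grouping). */
  def byHash(counted: Seq[(Segment, Long)]): Map[Long, Seq[(String, Long)]] =
    counted.groupBy(_._1._1).map { case (h, xs) =>
      (h, xs.map { case ((_, sm), n) => (sm, n) })
    }
}
```

The two stages make the trade-off visible: stage 1 costs one extra grouping, but when many segments are identical, stage 2 sees far fewer rows.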
- final def synchronized[T0](arg0: ⇒ T0): T0
  Definition Classes: AnyRef
- def toString(): String
  Definition Classes: AnyRef → Any
- final def wait(): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()