diff --git a/README.md b/README.md
index 4ba957f..fcafdc8 100644
--- a/README.md
+++ b/README.md
@@ -19,756 +19,3 @@ implement it using the package and send a pull request along with the paper anal
This code has been tested on data sets of tens of millions of points in a 700+ dimensional space
using a variety of distance functions. Thanks to the excellent core Spark implementation, it rocks!
-
-Table of Contents
-=================
-
-* [Generalized K-Means Clustering](#generalized-k-means-clustering)
- * [Getting Started](#getting-started)
- * [Introduction](#introduction)
- * [Bregman Divergences](#bregman-divergences)
- * [Compute Bregman Distances Efficiently using BregmanPoints and BregmanCenters](#compute-bregman-distances-efficiently-using-bregmanpoints--and-bregmancenters)
- * [Representing K-Means Models](#representing-k-means-models)
- * [Constructing K-Means Models using Clusterers](#constructing-k-means-models-using-clusterers)
- * [Constructing K-Means Models via Lloyd's Algorithm](#constructing-k-means-models-via-lloyds-algorithm)
- * [Constructing K-Means Models on WeightedVectors](#constructing-k-means-models-on-weightedvectors)
- * [Constructing K-Means Models Iteratively](#constructing-k-means-models-iteratively)
- * [Seeding the Set of Cluster Centers](#seeding-the-set-of-cluster-centers)
- * [Iterative Clustering](#iterative-clustering)
- * [Creating a Custom K-means Clusterer](#creating-a-custom-k-means-clusterer)
- * [Custom BregmanDivergence](#custom-bregmandivergence)
- * [Custom BregmanPointOps](#custom-bregmanpointops)
- * [Custom Embedding](#custom-embedding)
- * [Creating K-Means Models using the KMeansModel Helper Object](#creating-k-means-models-using-the-kmeansmodel-helper-object)
-
-### Getting Started
-
-The massivedatascience-clusterer project is built for Spark 3.4, Scala 2.12, and Java 17.
-
-
-### Introduction
-
-The goal of K-Means clustering is to partition a set of points into clusters that satisfy
-certain optimality constraints. The result is called a K-Means model. It is fundamentally a set
-of cluster centers and a function that defines the distance from an arbitrary point to a cluster center.
-
-The K-Means algorithm computes a K-Means model using an iterative algorithm known as
-[Lloyd's algorithm](http://en.wikipedia.org/wiki/Lloyd%27s_algorithm).
-Each iteration of Lloyd's algorithm assigns a set of points to clusters, then updates the cluster
-centers to reflect the new assignment of points.
-
-The update of clusters is a form of averaging. Newly added points are averaged into the cluster
-while (optionally) reassigned points are removed from their prior clusters.
-
-
-#### Bregman Divergences
-
-While one can assign a point to a cluster using any distance function, Lloyd's algorithm only
-converges for a certain set of distance functions called [Bregman divergences](http://www.cs.utexas.edu/users/inderjit/public_papers/bregmanclustering_jmlr.pdf). Bregman divergences
-must define two methods: ```convex```, to evaluate the underlying convex function at a point, and
-```gradientOfConvex```, to evaluate the gradient of that function at a point.
-
-```scala
-package com.massivedatascience.divergence
-
-trait BregmanDivergence {
- def convex(v: Vector): Double
-
- def gradientOfConvex(v: Vector): Vector
-}
-
-```
-
-For example, by defining ```convex``` to be the squared vector norm (i.e. the sum of the squares of
-the coordinates), one gets a distance function that equals the square of the well-known Euclidean
-distance. We name this divergence the ```SquaredEuclideanDistanceDivergence```.
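-
-As a concrete sketch against the trait above, this divergence is generated by
-$F(\mathbf{v}) = \|\mathbf{v}\|^2$ with gradient $\nabla F(\mathbf{v}) = 2\mathbf{v}$. The object
-below is illustrative only; the package ships its own implementation.
-
-```scala
-import org.apache.spark.mllib.linalg.{ Vector, Vectors }
-import com.massivedatascience.divergence.BregmanDivergence
-
-// Sketch: F(v) = sum of squared coordinates, grad F(v) = 2v.
-object SquaredNormDivergence extends BregmanDivergence {
-  def convex(v: Vector): Double = v.toArray.map(x => x * x).sum
-
-  def gradientOfConvex(v: Vector): Vector = Vectors.dense(v.toArray.map(_ * 2.0))
-}
-```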
-
-In addition to the squared Euclidean distance function, this implementation provides several
-other very useful distance functions. The provided ```BregmanDivergence```s may be accessed by
-supplying the name of the desired object to the apply method of the companion object.
-
-
-| Name | Space | Divergence | Input |
-|--------|-------|-------------------------|---------|
-| ```SquaredEuclideanDistanceDivergence``` | $\mathbb{R}^d$ |Squared Euclidean | |
-| ```RealKullbackLeiblerSimplexDivergence``` | $\mathbb{R}^d_{>0}$ |[Kullback-Leibler](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) | Dense |
-| ```NaturalKLSimplexDivergence``` | $\mathbb{N}^d_{>0}$ |Kullback-Leibler | Dense |
-| ```RealKLDivergence``` | $\mathbb{R}^d$ |Kullback-Leibler | Dense |
-| ```NaturalKLDivergence``` | $\mathbb{N}^d$ |Kullback-Leibler | Dense |
-| ```ItakuraSaitoDivergence``` | $\mathbb{R}^d_{>0}$ |[Itakura-Saito](http://en.wikipedia.org/wiki/Itakura%E2%80%93Saito_distance) | Sparse |
-| ```LogisticLossDivergence``` | $\mathbb{R}$ |Logistic Loss | |
-| ```GeneralizedIDivergence``` | $\mathbb{R}$ |Generalized I | |
-
-When selecting a distance function, consider the domain of the input data. For example, frequency
-data is integral, and similarity between frequencies or distributions is best measured using the
-Kullback-Leibler divergence.
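-
-For example, a lookup by name might look like the following sketch (the exact name string
-accepted by ```apply``` is an assumption):
-
-```scala
-import com.massivedatascience.divergence.BregmanDivergence
-
-// Obtain a provided divergence by the name of its object.
-val divergence: BregmanDivergence = BregmanDivergence("RealKLDivergence")
-```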
-
-
-#### Compute Bregman Distances Efficiently using ```BregmanPoint```s and ```BregmanCenter```s
-
-For efficient repeated computation of distances between a fixed set of points and varying cluster
-centers, it is convenient to pre-compute certain information and associate that information with
-the point or the cluster center. The class that represents an enriched point is ```BregmanPoint```.
-The class that represents an enriched cluster center is ```BregmanCenter```. Users
-of this package do not construct instances of these objects directly.
-
-```scala
-package com.massivedatascience.divergence
-
-trait BregmanPoint
-
-trait BregmanCenter
-```
-
-
-We enrich a Bregman divergence with a set of commonly used operations, including the factory
-methods ```toPoint``` and ```toCenter``` that construct instances of the aforementioned ```BregmanPoint```
-and ```BregmanCenter```.
-
-The enriched trait is ```BregmanPointOps```.
-
-```scala
-package com.massivedatascience.clusterer
-
-trait BregmanPointOps {
- type P = BregmanPoint
- type C = BregmanCenter
-
- val divergence: BregmanDivergence
-
- def toPoint(v: WeightedVector): P
-
- def toCenter(v: WeightedVector): C
-
- def centerMoved(v: P, w: C): Boolean
-
- def findClosest(centers: IndexedSeq[C], point: P): (Int, Double)
-
- def findClosestCluster(centers: IndexedSeq[C], point: P): Int
-
- def distortion(data: RDD[P], centers: IndexedSeq[C]): Double
-
- def pointCost(centers: IndexedSeq[C], point: P): Double
-
- def distance(p: BregmanPoint, c: BregmanCenter): Double
-}
-
-object BregmanPointOps {
-
- def apply(distanceFunction: String): BregmanPointOps = ???
-
-}
-```
-
-
-One may construct instances of ```BregmanPointOps``` using the companion object.
-
-| Name | Divergence |
-|--------|----------------|
-| ```EUCLIDEAN``` |Squared Euclidean |
-| ```RELATIVE_ENTROPY``` |[Kullback-Leibler](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) |
-| ```DISCRETE_KL``` |Kullback-Leibler |
-| ```DISCRETE_SMOOTHED_KL``` |Kullback-Leibler |
-| ```SPARSE_SMOOTHED_KL``` |Kullback-Leibler |
-| ```LOGISTIC_LOSS``` |Logistic Loss |
-| ```GENERALIZED_I``` |Generalized I |
-| ```ITAKURA_SAITO``` |[Itakura-Saito](http://en.wikipedia.org/wiki/Itakura%E2%80%93Saito_distance) |
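-
-For example, a sketch of obtaining the squared-Euclidean point ops by name, using the
-```EUCLIDEAN``` constant that also appears in the defaults of ```KMeans.train``` below:
-
-```scala
-import com.massivedatascience.clusterer.BregmanPointOps
-
-// Construct point ops for the squared Euclidean divergence by name.
-val ops: BregmanPointOps = BregmanPointOps(BregmanPointOps.EUCLIDEAN)
-```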
-
-#### Representing K-Means Models
-
-With these definitions, we define our realization of a k-means model, ```KMeansModel```, which
-we enrich with operations to find the closest cluster to a point and to compute distances:
-
-```scala
-package com.massivedatascience.clusterer
-
-trait KMeansModel {
-
- val pointOps: BregmanPointOps
-
- def centers: IndexedSeq[BregmanCenter]
-
-
- def predict(point: Vector): Int
-
- def predictClusterAndDistance(point: Vector): (Int, Double)
-
- def predict(points: RDD[Vector]): RDD[Int]
-
- def predict(points: JavaRDD[Vector]): JavaRDD[java.lang.Integer]
-
- def computeCost(data: RDD[Vector]): Double
-
-
- def predictWeighted(point: WeightedVector): Int
-
- def predictClusterAndDistanceWeighted(point: WeightedVector): (Int, Double)
-
- def predictWeighted(points: RDD[WeightedVector]): RDD[Int]
-
- def computeCostWeighted(data: RDD[WeightedVector]): Double
-
-
- def predictBregman(point: BregmanPoint): Int
-
- def predictClusterAndDistanceBregman(point: BregmanPoint): (Int, Double)
-
- def predictBregman(points: RDD[BregmanPoint]): RDD[Int]
-
- def computeCostBregman(data: RDD[BregmanPoint]): Double
-}
-```
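-
-As a hypothetical usage sketch, assuming a trained ```model: KMeansModel``` (see the training
-methods below):
-
-```scala
-import org.apache.spark.mllib.linalg.Vectors
-
-// Assign a single point to its closest cluster, with and without the distance.
-val cluster: Int = model.predict(Vectors.dense(1.0, 2.0))
-val (closest, distance) = model.predictClusterAndDistance(Vectors.dense(1.0, 2.0))
-```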
-
-#### Constructing K-Means Models using Clusterers
-
-One may construct K-Means models using one of the provided clusterers that implement Lloyd's algorithm.
-
-```scala
-trait MultiKMeansClusterer extends Serializable with Logging {
- def cluster(
- maxIterations: Int,
- pointOps: BregmanPointOps,
- data: RDD[BregmanPoint],
- centers: Seq[IndexedSeq[BregmanCenter]]): Seq[(Double, IndexedSeq[BregmanCenter])]
-
- def best(
- maxIterations: Int,
- pointOps: BregmanPointOps,
- data: RDD[BregmanPoint],
- centers: Seq[IndexedSeq[BregmanCenter]]): (Double, IndexedSeq[BregmanCenter]) = {
- cluster(maxIterations, pointOps, data, centers).minBy(_._1)
- }
-}
-
-object MultiKMeansClusterer {
- def apply(clustererName: String): MultiKMeansClusterer = ???
-}
-```
-
-The ```COLUMN_TRACKING``` algorithm tracks the assignments of points to clusters and the distance of
-points to their assigned cluster. In later iterations of Lloyd's algorithm, this information can
-be used to reduce the number of distance calculations needed to accurately reassign points. This
-is a novel implementation.
-
-The ```MINI_BATCH_10``` algorithm implements the [mini-batch algorithm](http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf).
-This implementation should be used when the number of points is much larger than the dimension of the data and the
-number of clusters desired.
-
-The ```RESEED``` algorithm fills empty clusters with newly seeded cluster centers
-in an effort to reach the target number of desired clusters.
-
-Objects implementing these algorithms may be constructed using the ```apply``` method of the
-companion object ```MultiKMeansClusterer```.
-
-
-| Name | Algorithm |
-|----------------------------------|-----------------------------------|
-| ```COLUMN_TRACKING``` | high performance implementation that performs less work on later rounds |
-| ```MINI_BATCH_10``` | a mini-batch clusterer that samples 10% of the data each round to update centroids |
-| ```RESEED``` | a clusterer that re-seeds empty clusters |
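-
-For example, a sketch of selecting the default clusterer by name:
-
-```scala
-import com.massivedatascience.clusterer.MultiKMeansClusterer
-
-// COLUMN_TRACKING is also the default clusterer in KMeans.train.
-val clusterer: MultiKMeansClusterer = MultiKMeansClusterer(MultiKMeansClusterer.COLUMN_TRACKING)
-```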
-
-
-### Constructing K-Means Models via Lloyd's Algorithm
-
-A ```KMeansModel``` can be constructed from any set of cluster centers and distance function.
-However, the more interesting models satisfy an optimality constraint. If we sum the distances
-from the points in a given set to their closest cluster centers, we get a number called the
-"distortion" or "cost". A K-Means Model is locally optimal with respect to a set of points
-if each cluster center is determined by the mean of the points assigned to that cluster.
-Computing such a ```KMeansModel``` given a set of points is called "training" the model on those
-points.
-
-The simplest way to train a ```KMeansModel``` on a fixed set of points is to use the ```KMeans.train```
-method. This method is most similar in style to the one provided by the Spark MLlib K-Means clusterer.
-
-For dense data in a low dimension space using the squared Euclidean distance function,
-one may simply call ```KMeans.train``` with the data and the desired number of clusters:
-
-```scala
-import com.massivedatascience.clusterer._
-import org.apache.spark.mllib.linalg.Vector
-
-val model : KMeansModel = KMeans.train(data: RDD[Vector], k: Int)
-```
-
-The full signature of the ```KMeans.train``` method is:
-
-```scala
-package com.massivedatascience.clusterer
-
-object KMeans {
- /**
- *
- * Train a K-Means model using Lloyd's algorithm.
- *
- * @param data input data
- * @param k number of clusters desired
- * @param maxIterations maximum number of iterations of Lloyd's algorithm
- * @param runs number of parallel clusterings to run
- * @param mode initialization algorithm to use
- * @param distanceFunctionNames the distance functions to use
- * @param clustererName which k-means implementation to use
- * @param embeddingNames sequence of embeddings to use, from lowest dimension to greatest
- * @return K-Means model
- */
- def train(
- data: RDD[Vector],
- k: Int,
- maxIterations: Int = KMeans.defaultMaxIterations,
- runs: Int = KMeans.defaultNumRuns,
- mode: String = KMeansSelector.K_MEANS_PARALLEL,
- distanceFunctionNames: Seq[String] = Seq(BregmanPointOps.EUCLIDEAN),
- clustererName: String = MultiKMeansClusterer.COLUMN_TRACKING,
- embeddingNames: List[String] = List(Embedding.IDENTITY_EMBEDDING)): KMeansModel = ???
-}
-```
-
-Many of these parameters will be familiar to users of the Spark MLlib clusterer.
-
-Similar to the Spark clusterer, we support data provided as ```Vectors```, a request for a number
-```k``` of clusters desired, a limit ```maxIterations``` on the number of iterations of Lloyd's
-algorithm, and the number of parallel ```runs``` of the clusterer.
-
-We also offer different initialization ```mode```s. Unlike the Spark clusterer, we do not support
-setting the number of initialization steps for the mode at this level of the interface.
-
-The ```KMeans.train``` helper method allows one to name a sequence of embeddings.
-Several embeddings are provided that may be constructed using the ```apply``` method
-of the companion object ```Embedding```.
-
-
-| Name | Algorithm |
-|-------------------------------|-------------------------------------------------------------|
-| ```IDENTITY_EMBEDDING``` | Identity |
-| ```HAAR_EMBEDDING``` | [Haar Transform](http://www.cs.gmu.edu/~jessica/publications/ikmeans_sdm_workshop03.pdf) |
-| ```LOW_DIMENSIONAL_RI``` | [Random Indexing](https://en.wikipedia.org/wiki/Random_indexing) with dimension 64 and epsilon = 0.1 |
-| ```MEDIUM_DIMENSIONAL_RI``` | Random Indexing with dimension 256 and epsilon = 0.1 |
-| ```HIGH_DIMENSIONAL_RI``` | Random Indexing with dimension 1024 and epsilon = 0.1 |
-| ```SYMMETRIZING_KL_EMBEDDING``` | [Symmetrizing KL Embedding](http://www-users.cs.umn.edu/~banerjee/papers/13/bregman-metric.pdf) |
-
-A different distance function may be used with each embedding; exactly one
-distance function must be provided per embedding.
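-
-For example, a sketch of a two-stage clustering of time series data, pairing one distance
-function with each embedding. Here ```data: RDD[Vector]``` is a placeholder, and the import
-path of ```Embedding``` is an assumption:
-
-```scala
-import com.massivedatascience.clusterer.{ BregmanPointOps, KMeans }
-import com.massivedatascience.transforms.Embedding
-
-// Cluster at the coarse Haar resolution first, then refine at full resolution.
-val model = KMeans.train(data, k = 10,
-  distanceFunctionNames = Seq(BregmanPointOps.EUCLIDEAN, BregmanPointOps.EUCLIDEAN),
-  embeddingNames = List(Embedding.HAAR_EMBEDDING, Embedding.IDENTITY_EMBEDDING))
-```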
-
-#### Constructing K-Means Models on ```WeightedVector```s
-
-Often, data points that are clustered have varying significance, i.e. they are weighted.
-This clusterer operates on weighted vectors. Use the ```WeightedVector``` companion object to construct weighted vectors.
-
-```scala
-package com.massivedatascience.linalg
-
-trait WeightedVector extends Serializable {
- def weight: Double
-
- def inhomogeneous: Vector
-
- def homogeneous: Vector
-
- def size: Int = homogeneous.size
-}
-
-object WeightedVector {
-
- def apply(v: Vector): WeightedVector = ???
-
- def apply(v: Array[Double]): WeightedVector = ???
-
- def apply(v: Vector, weight: Double): WeightedVector = ???
-
- def apply(v: Array[Double], weight: Double): WeightedVector = ???
-
- def fromInhomogeneousWeighted(v: Array[Double], weight: Double): WeightedVector = ???
-
- def fromInhomogeneousWeighted(v: Vector, weight: Double): WeightedVector = ???
-}
-```
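-
-For example, a sketch of constructing a weighted vector: a point observed three times can be
-represented once with weight 3.0.
-
-```scala
-import org.apache.spark.mllib.linalg.Vectors
-import com.massivedatascience.linalg.WeightedVector
-
-// A dense vector paired with an explicit weight.
-val wv: WeightedVector = WeightedVector(Vectors.dense(1.0, 2.0, 3.0), 3.0)
-```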
-
-Indeed, the ```KMeans.train``` helper translates the parameters into a call to the underlying
-```KMeans.trainWeighted``` method.
-
-```scala
-package com.massivedatascience.clusterer
-
-object KMeans {
- /**
- *
- * Train a K-Means model using Lloyd's algorithm on WeightedVectors
- *
- * @param runConfig run configuration
- * @param data input data
- * @param initializer initialization algorithm to use
- * @param pointOps the distance functions to use
- * @param embeddings sequence of embeddings to use, from lowest dimension to greatest
- * @param clusterer which k-means implementation to use
- * @return K-Means model
- */
-
- def trainWeighted(
- runConfig: RunConfig,
- data: RDD[WeightedVector],
- initializer: KMeansSelector,
- pointOps: Seq[BregmanPointOps],
- embeddings: Seq[Embedding],
- clusterer: MultiKMeansClusterer): KMeansModel = ???
-}
-```
-
-The ```KMeans.trainWeighted``` method ultimately makes various calls to the underlying
-```KMeans.simpleTrain``` method, which clusters the provided ```BregmanPoint```s using
-the given ```BregmanPointOps```, ```KMeansSelector```, and ```MultiKMeansClusterer```.
-
-
-```scala
-package com.massivedatascience.clusterer
-
-object KMeans {
- /**
- *
- * @param runConfig run configuration
- * @param data input data
- * @param pointOps the distance functions to use
- * @param initializer initialization algorithm to use
- * @param clusterer which k-means implementation to use
- * @return K-Means model
- */
- def simpleTrain(
- runConfig: RunConfig,
- data: RDD[BregmanPoint],
- pointOps: BregmanPointOps,
- initializer: KMeansSelector,
- clusterer: MultiKMeansClusterer): KMeansModel = ???
-}
-```
-
-#### Constructing K-Means Models Iteratively
-
-If multiple embeddings are provided, the ```KMeans.train``` method actually performs the embeddings
-and trains on the embedded data sets iteratively.
-
-For example, for high dimensional data, one may wish to embed the data into a lower dimension before clustering to
-reduce running time.
-
-For time series data,
-[the Haar Transform](http://www.cs.gmu.edu/~jessica/publications/ikmeans_sdm_workshop03.pdf)
-has been used successfully to reduce running time while maintaining or improving quality.
-
-For high-dimensional sparse data,
-[random indexing](http://en.wikipedia.org/wiki/Random_indexing)
-can be used to map the data into a low dimensional dense space.
-
-One may also perform clustering recursively, using lower dimensional clustering to derive initial
-conditions for higher dimensional clustering.
-
-Should you wish to train a model iteratively on data sets derived via maps of a shared original data
-set, you may use ```KMeans.iterativelyTrain```.
-
-
-```scala
-package com.massivedatascience.clusterer
-
-object KMeans {
- /**
- * Train on a series of data sets, where the data sets were derived from the same
- * original data set via embeddings. Use the cluster assignments of one stage to
- * initialize the clusters of the next stage.
- *
- * @param runConfig run configuration
- * @param pointOps distance functions to use, one per data set
- * @param dataSets input data sets to use
- * @param initializer initialization algorithm to use
- * @param clusterer clustering implementation to use
- * @return K-Means model
- */
- def iterativelyTrain(
- runConfig: RunConfig,
- pointOps: Seq[BregmanPointOps],
- dataSets: Seq[RDD[BregmanPoint]],
- initializer: KMeansSelector,
- clusterer: MultiKMeansClusterer): KMeansModel = ???
-}
-```
-
-#### Seeding the Set of Cluster Centers
-
-Any K-Means model may be used as a seed value to Lloyd's algorithm. In fact, our clusterers accept
-multiple seed sets. The ```KMeans.train``` helper method allows one to name an initialization
-method.
-
-Two algorithms are implemented that produce viable seed sets.
-They may be constructed using the ```apply``` method
-of the companion object ```KMeansSelector```.
-
-| Name | Algorithm |
-|----------------------------------|-----------------------------------|
-| ```RANDOM``` | Random selection of initial k centers |
-| ```K_MEANS_PARALLEL``` | a 5 step [K-Means Parallel implementation](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf) |
-
-Under the covers, these initializers implement the ```KMeansSelector``` trait:
-
-```scala
-package com.massivedatascience.clusterer
-
-trait KMeansSelector extends Serializable {
- def init(
- ops: BregmanPointOps,
- d: RDD[BregmanPoint],
- numClusters: Int,
- initialInfo: Option[(Seq[IndexedSeq[BregmanCenter]], Seq[RDD[Double]])] = None,
- runs: Int,
- seed: Long): Seq[IndexedSeq[BregmanCenter]]
-}
-
-object KMeansSelector {
- def apply(name: String): KMeansSelector = ???
-}
-```
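-
-For example, a sketch of selecting the default initializer by name:
-
-```scala
-import com.massivedatascience.clusterer.KMeansSelector
-
-// K_MEANS_PARALLEL is also the default mode in KMeans.train.
-val initializer: KMeansSelector = KMeansSelector(KMeansSelector.K_MEANS_PARALLEL)
-```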
-
-#### Iterative Clustering
-
-K-means clustering can be performed iteratively using different embeddings of the data. For example,
-with high-dimensional time series data, it may be advantageous to:
-
-* Down-sample the data via the Haar transform (a.k.a. averaging)
-* Solve the K-Means clustering problem on the down-sampled data
-* Assign the down-sampled points to clusters
-* Create a new KMeansModel using those assignments on the original data
-* Solve the K-Means clustering problem on the original data, starting from the KMeansModel so constructed
-
-This technique has been named the ["Anytime" Algorithm](http://www.cs.gmu.edu/~jessica/publications/ikmeans_sdm_workshop03.pdf).
-
-The ```com.massivedatascience.clusterer.KMeans``` helper object provides a method, ```timeSeriesTrain```,
-that embeds the data iteratively.
-
-```scala
-package com.massivedatascience.clusterer
-
-object KMeans {
-
- def timeSeriesTrain(
- runConfig: RunConfig,
- data: RDD[WeightedVector],
- initializer: KMeansSelector,
- pointOps: BregmanPointOps,
- clusterer: MultiKMeansClusterer,
- embedding: Embedding = Embedding(Embedding.HAAR_EMBEDDING)): KMeansModel = ???
-}
-```
-
-High-dimensional data can be clustered directly, but the cost is proportional to the dimension. If
-the divergence of interest is squared Euclidean distance, one can use
-[Random Indexing](http://en.wikipedia.org/wiki/Random_indexing) to
-down-sample the data while preserving distances between clusters, with high probability.
-
-The ```com.massivedatascience.clusterer.KMeans``` helper object provides a method, ```sparseTrain```,
-that embeds the data into various dimensions using random indexing.
-
-```scala
-package com.massivedatascience.clusterer
-
-object KMeans {
-
- def sparseTrain(raw: RDD[Vector], k: Int): KMeansModel = {
- train(raw, k,
- embeddingNames = List(Embedding.LOW_DIMENSIONAL_RI, Embedding.MEDIUM_DIMENSIONAL_RI,
- Embedding.HIGH_DIMENSIONAL_RI))
- }
-}
-```
-
-### Creating a Custom K-means Clusterer
-
-There are many ways to create your own custom K-means clusterer from these components.
-
-
-#### Custom ```BregmanDivergence```
-
-You may create your own custom ```BregmanDivergence``` given a suitable continuously-differentiable,
-real-valued, strictly convex function defined on a closed convex set in $\mathbb{R}^N$ using the
-```apply``` method of the companion object. Send a pull request to have it added
-to the package.
-
-```scala
-package com.massivedatascience.divergence
-
-object BregmanDivergence {
-
- /**
- * Create a Bregman Divergence from
- * @param f any continuously-differentiable real-valued and strictly
- * convex function defined on a closed convex set in R^N
- * @param gradientF the gradient of f
- * @return a Bregman Divergence on that function
- */
- def apply(f: (Vector) => Double, gradientF: (Vector) => Vector): BregmanDivergence = ???
-}
-```
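-
-For example, a sketch (ours, not a provided object) using the negative entropy
-$f(\mathbf{v}) = \sum_i v_i \log v_i$, whose gradient has coordinates $1 + \log v_i$ and whose
-Bregman divergence is the generalized Kullback-Leibler divergence:
-
-```scala
-import org.apache.spark.mllib.linalg.{ Vector, Vectors }
-import com.massivedatascience.divergence.BregmanDivergence
-
-// Yields D(p, q) = sum_i ( p_i log(p_i / q_i) - p_i + q_i ) for positive coordinates.
-val klDivergence: BregmanDivergence = BregmanDivergence(
-  (v: Vector) => v.toArray.map(x => x * math.log(x)).sum,
-  (v: Vector) => Vectors.dense(v.toArray.map(x => 1.0 + math.log(x))))
-```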
-
-#### Custom ```BregmanPointOps```
-
-You may create your own custom ```BregmanPointOps``` from any implementation of the
-```BregmanDivergence``` trait using the ```apply``` method of the companion object.
-Send a pull request to have it added to the package.
-
-
-```scala
-package com.massivedatascience.clusterer
-
-object BregmanPointOps {
-
- def apply(d: BregmanDivergence): BregmanPointOps = ???
-
- def apply(d: BregmanDivergence, factor: Double): BregmanPointOps = ???
-}
-```
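-
-Continuing the sketch above, the custom divergence can then be wrapped for use by the clusterers:
-
-```scala
-// Wrap the custom divergence in point ops.
-val customOps: BregmanPointOps = BregmanPointOps(klDivergence)
-```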
-
-#### Custom ```Embedding```
-
-Perhaps you have a dimensionality reduction method that is not provided by one of the standard
-embeddings. You may create your own embedding.
-
-For example, if the number of clusters desired is small but the dimension is high, one may use the method
-of [Random Projections](http://www.cs.toronto.edu/~zouzias/downloads/papers/NIPS2010_kmeans.pdf).
-At present, no embedding is provided for random projections, but, hey, I have to leave something for
-you to do! Send a pull request!!!
-
-
-### Creating K-Means Models using the ```KMeansModel``` Helper Object
-
-Training a K-Means model from a set of points using ```KMeans.train``` is one way to create a
-```KMeansModel```. However, there are many other useful ways. The ```KMeansModel``` companion object
-provides a number of these constructors.
-
-
-```scala
-package com.massivedatascience.clusterer
-
-object KMeansModel {
-
- /**
- * Create a K-means model from given cluster centers and weights
- *
- * @param ops distance function
- * @param centers initial cluster centers in homogeneous coordinates
- * @param weights initial cluster weights
- * @return k-means model
- */
- def fromVectorsAndWeights(
- ops: BregmanPointOps,
- centers: IndexedSeq[Vector],
- weights: IndexedSeq[Double]): KMeansModel = ???
-
- /**
- * Create a K-means model from given weighted vectors
- *
- * @param ops distance function
- * @param centers initial cluster centers as weighted vectors
- * @return k-means model
- */
- def fromWeightedVectors[T <: WeightedVector : ClassTag](
- ops: BregmanPointOps,
- centers: IndexedSeq[T]): KMeansModel = ???
-
- /**
- * Create a K-means model by selecting a set of k points at random
- *
- * @param ops distance function
- * @param k number of centers desired
- * @param dim dimension of space
- * @param weight initial weight of points
- * @param seed random number seed
- * @return k-means model
- */
- def usingRandomGenerator(ops: BregmanPointOps,
- k: Int,
- dim: Int,
- weight: Double,
- seed: Long = XORShiftRandom.random.nextLong()): KMeansModel = ???
-
- /**
- * Create a K-Means model using the KMeans++ algorithm on an initial set of candidate centers
- *
- * @param ops distance function
- * @param data initial candidate centers
- * @param weights initial weights
- * @param k number of clusters desired
- * @param perRound number of candidates to add per round
- * @param numPreselected initial sub-sequence of candidates to always select
- * @param seed random number seed
- * @return k-means model
- */
- def fromCenters[T <: WeightedVector : ClassTag](
- ops: BregmanPointOps,
- data: IndexedSeq[T],
- weights: IndexedSeq[Double],
- k: Int,
- perRound: Int,
- numPreselected: Int,
- seed: Long = XORShiftRandom.random.nextLong()): KMeansModel = ???
-
- /**
- * Create a K-Means Model from a streaming k-means model.
- *
- * @param streamingKMeansModel mutable streaming model
- * @return immutable k-means model
- */
- def fromStreamingModel(streamingKMeansModel: StreamingKMeansModel): KMeansModel = ???
-
- /**
- * Create a K-Means Model from a set of assignments of points to clusters
- *
- * @param ops distance function
- * @param points initial bregman points
- * @param assignments assignments of points to clusters
- * @return k-means model
- */
- def fromAssignments[T <: WeightedVector : ClassTag](
- ops: BregmanPointOps,
- points: RDD[T],
- assignments: RDD[Int]): KMeansModel = ???
-
- /**
- * Create a K-Means Model using K-Means || algorithm from an RDD of Bregman points.
- *
- * @param ops distance function
- * @param data initial points
- * @param k number of cluster centers desired
- * @param numSteps number of iterations of k-Means ||
- * @param sampleRate fraction of points to use in weighting clusters
- * @param seed random number seed
- * @return k-means model
- */
- def usingKMeansParallel[T <: WeightedVector : ClassTag](
- ops: BregmanPointOps,
- data: RDD[T],
- k: Int,
- numSteps: Int = 2,
- sampleRate: Double = 1.0,
- seed: Long = XORShiftRandom.random.nextLong()): KMeansModel = ???
-
- /**
- * Construct a K-Means model using Lloyd's algorithm, given a set of initial
- * K-Means models.
- *
- * @param ops distance function
- * @param data points to fit
- * @param initialModels initial k-means models
- * @param clusterer k-means clusterer to use
- * @param seed random number seed
- * @return the best K-means model found
- */
- def usingLloyds[T <: WeightedVector : ClassTag](
- ops: BregmanPointOps,
- data: RDD[T],
- initialModels: Seq[KMeansModel],
- clusterer: MultiKMeansClusterer = new ColumnTrackingKMeans(),
- seed: Long = XORShiftRandom.random.nextLong()): KMeansModel = ???
-}
-
-```
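-
-For instance, a hypothetical sketch that seeds a model at random and then refines it with
-Lloyd's algorithm (```ops``` and ```data``` are placeholders of the types shown above):
-
-```scala
-// Start from k random centers in a dim-dimensional space, then refine on the data.
-val seedModel = KMeansModel.usingRandomGenerator(ops, k = 10, dim = 20, weight = 1.0)
-val refined = KMeansModel.usingLloyds(ops, data, Seq(seedModel))
-```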