Clustering Algorithms

A variety of models and algorithms for clustering.

Classes

class shark::AbstractClustering< InputT >
 Base class for clustering.
 
class shark::Centroids
 Clusters defined by centroids.
 
class shark::ClusteringModel< InputT, OutputT >
 Abstract model with associated clustering object.
 
class shark::HardClusteringModel< InputT >
 Model for "hard" clustering.
 
class shark::HierarchicalClustering< InputT >
 Clusters defined by a binary space partitioning tree.
 
class shark::SoftClusteringModel< InputT >
 Model for "soft" clustering.
 

Functions

SHARK_EXPORT_SYMBOL std::size_t shark::kMeans (Data< RealVector > const &data, std::size_t k, Centroids &centroids, std::size_t maxIterations=0)
 The k-means clustering algorithm.
 

Function Documentation

kMeans()

SHARK_EXPORT_SYMBOL std::size_t shark::kMeans (Data< RealVector > const & data,
    std::size_t k,
    Centroids & centroids,
    std::size_t maxIterations = 0
)

The k-means clustering algorithm.

The k-means algorithm takes vector-valued data \( \{x_1, \dots, x_n\} \subset \mathbb R^d \) and splits it into k clusters, based on centroids \( \{c_1, \dots, c_k\} \). The result is stored in a Centroids object that can be used to construct clustering models.
If the provided centroids object (third parameter) already contains a set of k centroids, this implementation starts the search from those centroids. Otherwise the search starts from the first k data points.
Note that the data set needs to include at least k data points for k-means to work. This is because the current implementation does not allow for empty clusters.
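For reference, the objective that standard k-means (Lloyd's algorithm) locally minimizes can be written in the notation used above. This is the textbook formulation and is an assumption here, since this page does not state the objective explicitly:

\[ J(c_1, \dots, c_k) = \sum_{i=1}^{n} \min_{j \in \{1, \dots, k\}} \| x_i - c_j \|^2 \]

In the textbook algorithm, each iteration assigns every \( x_i \) to its nearest centroid and then moves each centroid to the mean of its assigned points. Neither step increases \( J \).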
Parameters
data            vector-valued data to be clustered
k               number of clusters
centroids       centroids (input/output)
maxIterations   maximum number of k-means iterations; 0 means unlimited
Returns
number of k-means iterations
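
The behaviour described above can be illustrated with a minimal, standard-library-only sketch of Lloyd's algorithm. It is not Shark's implementation: kMeansSketch, Point, squaredDistance and nearestCentroid are hypothetical names, plain std::vector stands in for Data< RealVector > and Centroids, and the way the sketch counts iterations and handles an empty cluster are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a real-valued vector (not shark::RealVector).
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(Point const& a, Point const& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Index of the centroid closest to x; ties go to the lowest index.
std::size_t nearestCentroid(Point const& x, std::vector<Point> const& centroids) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        double dist = squaredDistance(x, centroids[c]);
        if (dist < bestDist) {
            bestDist = dist;
            best = c;
        }
    }
    return best;
}

// Sketch of k-means with the documented conventions: the centroids argument
// is input/output, maxIterations == 0 means unlimited, and the return value
// is the number of iterations performed (here: assignment passes, including
// the final pass that detects convergence; this counting is an assumption).
std::size_t kMeansSketch(std::vector<Point> const& data, std::size_t k,
                         std::vector<Point>& centroids,
                         std::size_t maxIterations = 0) {
    if (k == 0 || data.size() < k)
        throw std::invalid_argument("k-means needs k > 0 and at least k data points");

    // Start from the given centroids if there are exactly k of them,
    // otherwise from the first k data points.
    if (centroids.size() != k)
        centroids.assign(data.begin(), data.begin() + static_cast<std::ptrdiff_t>(k));

    std::size_t const dim = data[0].size();
    std::vector<std::size_t> assignment(data.size(), k);  // k marks "unassigned"
    std::size_t iterations = 0;

    while (maxIterations == 0 || iterations < maxIterations) {
        ++iterations;

        // Assignment step: attach every point to its nearest centroid.
        bool changed = false;
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::size_t c = nearestCentroid(data[i], centroids);
            if (c != assignment[i]) {
                assignment[i] = c;
                changed = true;
            }
        }
        if (!changed) break;  // converged: no point switched cluster

        // Update step: move each centroid to the mean of its points.
        std::vector<Point> sums(k, Point(dim, 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            for (std::size_t d = 0; d < dim; ++d)
                sums[assignment[i]][d] += data[i][d];
            ++counts[assignment[i]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;  // sketch only: keep the old centroid
            for (std::size_t d = 0; d < dim; ++d)
                centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
        }
    }
    return iterations;
}
```

Calling kMeansSketch(data, 2, centroids) with an empty centroids vector seeds the search from the first two data points, while passing a vector that already holds two centroids continues from those. With fewer than k data points the sketch throws, which mirrors the requirement stated above.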

Referenced by main().