k-means
UnsupervisedClustering.Kmeans
— TypeKmeans(
metric::SemiMetric = SqEuclidean()
verbose::Bool = DEFAULT_VERBOSE
rng::AbstractRNG = Random.GLOBAL_RNG
tolerance::Real = DEFAULT_TOLERANCE
max_iterations::Integer = DEFAULT_MAX_ITERATIONS
)
The k-means is a clustering algorithm that aims to partition data into clusters by minimizing the distances between data points and their cluster centroids.
Fields
metric
: defines the distance metric used to compute the distances between data points and cluster centroids.verbose
: controls whether the algorithm should display additional information during execution.rng
: represents the random number generator to be used by the algorithm.tolerance
: represents the convergence criterion for the algorithm. It determines the maximum change allowed in the centroid positions between consecutive iterations.max_iterations
: represents the maximum number of iterations the algorithm will perform before stopping, even if convergence has not been reached.
References
- Hartigan, John A., and Manchek A. Wong. Algorithm AS 136: A k-means clustering algorithm. Journal of the royal statistical society. series c (applied statistics) 28.1 (1979): 100-108.
- Lloyd, Stuart. Least squares quantization in PCM. IEEE transactions on information theory 28.2 (1982): 129-137.
UnsupervisedClustering.KmeansResult
— TypeKmeansResult(
assignments::AbstractVector{<:Integer}
clusters::AbstractMatrix{<:Real}
objective::Real
objective_per_cluster::AbstractVector{<:Real}
iterations::Integer
elapsed::Real
converged::Bool
k::Integer
)
KmeansResult struct represents the result of the k-means clustering algorithm.
Fields
assignments
: an integer vector that stores the cluster assignment for each data point.clusters
: a floating-point matrix representing the cluster's centroid.objective
: a floating-point number representing the objective function after running the algorithm. The objective function measures the quality of the clustering solution.objective_per_cluster
: a floating-point vector that stores the objective function of each clusteriterations
: an integer value indicating the number of iterations performed until the algorithm has converged or reached the maximum number of iterationselapsed
: a floating-point number representing the time in seconds for the algorithm to complete.converged
: indicates whether the algorithm has converged to a solution.k
: the number of clusters.
UnsupervisedClustering.fit!
— Methodfit!(
kmeans::Kmeans,
data::AbstractMatrix{<:Real},
result::KmeansResult
)
The fit!
function performs the k-means clustering algorithm on the given result as the initial point and updates the provided object with the clustering result.
Parameters:
kmeans
: an instance representing the clustering settings and parameters.data
: a floating-point matrix, where each row represents a data point, and each column represents a feature.result
: a result object that will be updated with the clustering result.
Example
n = 100
d = 2
k = 2
data = rand(n, d)
kmeans = Kmeans()
result = KmeansResult(n, [1.0 2.0; 1.0 2.0])
fit!(kmeans, data, result)
UnsupervisedClustering.fit
— Methodfit(
kmeans::Kmeans,
data::AbstractMatrix{<:Real},
initial_clusters::AbstractVector{<:Integer}
)
The fit
function performs the k-means clustering algorithm on the given data points as the initial point and returns a result object representing the clustering result.
Parameters:
kmeans
: an instance representing the clustering settings and parameters.data
: a floating-point matrix, where each row represents a data point, and each column represents a feature.initial_clusters
: an integer vector where each element is the initial data point for each cluster.
Example
n = 100
d = 2
k = 2
data = rand(n, d)
kmeans = Kmeans()
result = fit(kmeans, data, [4, 12])
UnsupervisedClustering.fit
— Methodfit(
kmeans::Kmeans,
data::AbstractMatrix{<:Real},
k::Integer
)
The fit
function performs the k-means clustering algorithm and returns a result object representing the clustering result.
Parameters:
kmeans
: an instance representing the clustering settings and parameters.data
: a floating-point matrix, where each row represents a data point, and each column represents a feature.k
: an integer representing the number of clusters.
Example
n = 100
d = 2
k = 2
data = rand(n, d)
kmeans = Kmeans()
result = fit(kmeans, data, k)