K-medoids Clustering
K-medoids is a robust clustering algorithm that uses actual data points (medoids) as cluster centers instead of computed centroids. This makes it more resistant to outliers and suitable for non-Euclidean distance metrics.
Overview
K-medoids iteratively:
- Initialization: Select k data points as initial medoids
- Assignment: Assign each point to the nearest medoid
- Update: Find the optimal medoid for each cluster (point that minimizes total distance)
- Repeat: Continue until no improvement in objective
Basic Usage
using UnsupervisedClustering, Distances, Random
# Generate sample data
Random.seed!(42);
data = rand(100, 2);
k = 3;
# Compute pairwise distances
distances = pairwise(SqEuclidean(), data, dims = 1);
# Create and run k-medoids
kmedoids = Kmedoids();
result = fit(kmedoids, distances, k);
result.objective
# output
7.221968512868637API Reference
UnsupervisedClustering.BalancedKmedoids — TypeBalancedKmedoids(
verbose::Bool = DEFAULT_VERBOSE
rng::AbstractRNG = Random.GLOBAL_RNG
tolerance::Float64 = DEFAULT_TOLERANCE
max_iterations::Int = DEFAULT_MAX_ITERATIONS
)The balanced k-medoids is a variation of the k-medoids algorithm that balances the number of data points assigned to each cluster. It uses the same parameters as the k-medoids algorithm.
Fields
verbose: controls whether the algorithm should display additional information during execution.rng: represents the random number generator to be used by the algorithm.tolerance: represents the convergence criterion for the algorithm. It determines the maximum change allowed in the centroid positions between consecutive iterations.max_iterations: represents the maximum number of iterations the algorithm will perform before stopping, even if convergence has not been reached.
UnsupervisedClustering.Kmedoids — TypeKmedoids(
verbose::Bool = DEFAULT_VERBOSE
rng::AbstractRNG = Random.GLOBAL_RNG
tolerance::Float64 = DEFAULT_TOLERANCE
max_iterations::Int = DEFAULT_MAX_ITERATIONS
)The k-medoids is a variation of k-means clustering algorithm that uses actual data points (medoids) as representatives of each cluster instead of the mean.
Fields
verbose: controls whether the algorithm should display additional information during execution.rng: represents the random number generator to be used by the algorithm.tolerance: represents the convergence criterion for the algorithm. It determines the maximum change allowed in the centroid positions between consecutive iterations.max_iterations: represents the maximum number of iterations the algorithm will perform before stopping, even if convergence has not been reached.
References
UnsupervisedClustering.KmedoidsResult — TypeKmedoidsResult(
assignments::AbstractVector{<:Integer}
clusters::AbstractVector{<:Integer}
objective::Real
objective_per_cluster::AbstractVector{<:Real}
iterations::Integer
elapsed::Real
converged::Bool
k::Integer
)KmedoidsResult struct represents the result of the k-medoids clustering algorithm.
Fields
assignments: an integer vector that stores the cluster assignment for each data point.clusters: an integer vector representing each cluster's centroid.objective: a floating-point number representing the objective function after running the algorithm. The objective function measures the quality of the clustering solution.objective_per_cluster: a floating-point vector that stores the objective function of each clusteriterations: an integer value indicating the number of iterations performed until the algorithm has converged or reached the maximum number of iterationselapsed: a floating-point number representing the time in seconds for the algorithm to complete.converged: indicates whether the algorithm has converged to a solution.k: the number of clusters.
UnsupervisedClustering.fit! — Methodfit!(
kmedoids::AbstractKmedoids,
distances::AbstractMatrix{<:Real},
result::KmedoidsResult
)The fit! function performs the k-medoids clustering algorithm on the given result as the initial point and updates the provided object with the clustering result.
Parameters:
kmedoids: an instance representing the clustering settings and parameters.distances: a floating-point matrix representing the pairwise distances between the data points.result: a result object that will be updated with the clustering result.
Example
n = 100
d = 2
k = 2
data = rand(n, d)
distances = pairwise(SqEuclidean(), data, dims = 1)
kmedoids = Kmedoids()
result = KmedoidsResult(n, [1.0 2.0; 1.0 2.0])
fit!(kmedoids, distances, result)UnsupervisedClustering.fit — Methodfit(
kmedoids::AbstractKmedoids,
distances::AbstractMatrix{<:Real},
initial_clusters::AbstractVector{<:Integer}
)The fit function performs the k-medoids clustering algorithm on the given data points as the initial point and returns a result object representing the clustering result.
Parameters:
kmedoids: an instance representing the clustering settings and parameters.distances: a floating-point matrix representing the pairwise distances between the data points.initial_clusters: an integer vector where each element is the initial data point for each cluster.
Example
n = 100
d = 2
k = 2
data = rand(n, d)
distances = pairwise(SqEuclidean(), data, dims = 1)
kmedoids = Kmedoids()
result = fit(kmedoids, distances, [4, 12])UnsupervisedClustering.fit — Methodfit(
kmedoids::AbstractKmedoids,
distances::AbstractMatrix{<:Real},
k::Integer
)The fit function performs the k-medoids clustering algorithm and returns a result object representing the clustering result.
Parameters:
kmedoids: an instance representing the clustering settings and parameters.distances: a floating-point matrix representing the pairwise distances between the data points.k: an integer representing the number of clusters.
Example
n = 100
d = 2
k = 2
data = rand(n, d)
distances = pairwise(SqEuclidean(), data, dims = 1)
kmedoids = Kmedoids()
result = fit(kmedoids, distances, k)