k-medoids

UnsupervisedClustering.Kmedoids - Type
Kmedoids(
    verbose::Bool = DEFAULT_VERBOSE
    rng::AbstractRNG = Random.GLOBAL_RNG
    tolerance::Real = DEFAULT_TOLERANCE
    max_iterations::Integer = DEFAULT_MAX_ITERATIONS
)

k-medoids is a variation of the k-means clustering algorithm that represents each cluster by an actual data point (its medoid) instead of by the mean of its members.

Fields

  • verbose: controls whether the algorithm should display additional information during execution.
  • rng: represents the random number generator to be used by the algorithm.
  • tolerance: represents the convergence criterion for the algorithm. It determines the maximum change allowed between consecutive iterations for the algorithm to be considered converged.
  • max_iterations: represents the maximum number of iterations the algorithm will perform before stopping, even if convergence has not been reached.
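
To make the roles of tolerance and max_iterations concrete, the following is a minimal sketch of an alternating k-medoids loop in plain Julia. It is not the package's implementation: the function name kmedoids_sketch, its default values, and the choice to test convergence on the change in the objective are all assumptions made for illustration.

```julia
# Minimal alternating k-medoids sketch (illustrative, not the package's code).
# `distances` is an n×n pairwise distance matrix; `medoids` holds point indices.
function kmedoids_sketch(
    distances::AbstractMatrix{<:Real},
    medoids::Vector{Int};
    tolerance::Real = 1e-3,          # assumed default, for illustration only
    max_iterations::Integer = 1_000, # assumed default, for illustration only
)
    n = size(distances, 1)
    medoids = copy(medoids)
    assignments = zeros(Int, n)
    previous = Inf
    objective = Inf
    for iteration in 1:max_iterations
        # Assignment step: each point joins the cluster of its closest medoid.
        for i in 1:n
            assignments[i] = argmin([distances[i, m] for m in medoids])
        end
        # Update step: the new medoid minimizes the total distance to its members.
        for c in eachindex(medoids)
            members = findall(==(c), assignments)
            isempty(members) && continue
            costs = [sum(distances[j, members]) for j in members]
            medoids[c] = members[argmin(costs)]
        end
        objective = sum(distances[i, medoids[assignments[i]]] for i in 1:n)
        # Assumed stopping rule: the objective changed by less than `tolerance`.
        if abs(previous - objective) < tolerance
            return (medoids = medoids, assignments = assignments,
                    objective = objective, iterations = iteration, converged = true)
        end
        previous = objective
    end
    # The iteration budget ran out before the stopping rule was met.
    return (medoids = medoids, assignments = assignments,
            objective = objective, iterations = max_iterations, converged = false)
end

# Six points on a line, forming two obvious groups.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
distances = [abs(a - b) for a in x, b in x]
sketch = kmedoids_sketch(distances, [1, 2])
```

With a small max_iterations the loop can return with converged = false, which is the situation the max_iterations field describes.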

UnsupervisedClustering.KmedoidsResult - Type
KmedoidsResult(
    assignments::AbstractVector{<:Integer}
    clusters::AbstractVector{<:Integer}
    objective::Real
    objective_per_cluster::AbstractVector{<:Real}
    iterations::Integer
    elapsed::Real
    converged::Bool
    k::Integer
)

The KmedoidsResult struct represents the result of the k-medoids clustering algorithm.

Fields

  • assignments: an integer vector that stores the cluster assignment for each data point.
  • clusters: an integer vector that stores, for each cluster, the index of the data point acting as its medoid.
  • objective: a floating-point number representing the value of the objective function after running the algorithm. The objective function measures the quality of the clustering solution.
  • objective_per_cluster: a floating-point vector that stores the value of the objective function for each cluster.
  • iterations: an integer value indicating the number of iterations performed until the algorithm converged or reached the maximum number of iterations.
  • elapsed: a floating-point number representing the time in seconds for the algorithm to complete.
  • converged: indicates whether the algorithm has converged to a solution.
  • k: the number of clusters.
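
As an illustration of how objective and objective_per_cluster relate to assignments and clusters, here is a small sketch in plain Julia. It assumes the usual k-medoids objective, the sum of distances from each point to the medoid of its cluster; the helper name objective_per_cluster_sketch is hypothetical and the package may compute these values differently.

```julia
# Hypothetical helper: the cost of a cluster is the sum of the distances
# from its members to its medoid (the standard k-medoids objective).
function objective_per_cluster_sketch(
    distances::AbstractMatrix{<:Real},
    assignments::AbstractVector{<:Integer},
    clusters::AbstractVector{<:Integer},
)
    costs = zeros(Float64, length(clusters))
    for (i, c) in enumerate(assignments)
        # clusters[c] is the index of the data point acting as medoid of cluster c.
        costs[c] += distances[i, clusters[c]]
    end
    return costs
end

# Six points on a line, split into two clusters with medoids at points 2 and 5.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
distances = [abs(a - b) for a in x, b in x]
assignments = [1, 1, 1, 2, 2, 2]
clusters = [2, 5]

per_cluster = objective_per_cluster_sketch(distances, assignments, clusters)
objective = sum(per_cluster)
```

Under this assumption the total objective is simply the sum of the per-cluster values.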
UnsupervisedClustering.fit! - Method
fit!(
    kmedoids::Kmedoids,
    distances::AbstractMatrix{<:Real},
    result::KmedoidsResult
)

The fit! function runs the k-medoids clustering algorithm using the given result as the starting point and updates that object in place with the clustering result.

Parameters:

  • kmedoids: an instance representing the clustering settings and parameters.
  • distances: a floating-point matrix representing the pairwise distances between the data points.
  • result: a result object that will be updated with the clustering result.
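
The distances argument is an n×n matrix with one row and one column per data point. As a sketch of what such a matrix contains, the following plain-Julia function computes squared Euclidean distances between the rows of a data matrix. The name pairwise_sqeuclidean is hypothetical and used only for illustration; in practice a library routine such as pairwise from Distances.jl is the more convenient choice.

```julia
# Squared Euclidean distance between every pair of rows of `data` (n×d),
# where each row is one data point.
function pairwise_sqeuclidean(data::AbstractMatrix{<:Real})
    n = size(data, 1)
    distances = zeros(Float64, n, n)
    for i in 1:n, j in (i + 1):n
        d = sum(abs2, view(data, i, :) .- view(data, j, :))
        # The matrix is symmetric and its diagonal stays at zero.
        distances[i, j] = d
        distances[j, i] = d
    end
    return distances
end

# Three points in the plane, one per row.
data = [0.0 0.0; 3.0 4.0; 6.0 8.0]
distances = pairwise_sqeuclidean(data)
```

Any symmetric matrix of pairwise dissimilarities with a zero diagonal has the same shape, so other distance measures can be substituted.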

Example

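using Distances
using UnsupervisedClustering

n = 100
d = 2
k = 2

# n random points in d dimensions, one point per row
data = rand(n, d)
distances = pairwise(SqEuclidean(), data, dims = 1)

kmedoids = Kmedoids()
# start from data points 1 and 2 as the initial medoids
result = KmedoidsResult(n, [1, 2])
fit!(kmedoids, distances, result)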
UnsupervisedClustering.fit - Method
fit(
    kmedoids::Kmedoids,
    distances::AbstractMatrix{<:Real},
    initial_clusters::AbstractVector{<:Integer}
)

The fit function runs the k-medoids clustering algorithm using the given data points as the initial medoids and returns a result object representing the clustering result.

Parameters:

  • kmedoids: an instance representing the clustering settings and parameters.
  • distances: a floating-point matrix representing the pairwise distances between the data points.
  • initial_clusters: an integer vector where each element is the index of the data point used as the initial medoid of one cluster.

Example

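using Distances
using UnsupervisedClustering

n = 100
d = 2
k = 2

# n random points in d dimensions, one point per row
data = rand(n, d)
distances = pairwise(SqEuclidean(), data, dims = 1)

kmedoids = Kmedoids()
# data points 4 and 12 are the initial medoids of the k = 2 clusters
result = fit(kmedoids, distances, [4, 12])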
UnsupervisedClustering.fit - Method
fit(
    kmedoids::Kmedoids,
    distances::AbstractMatrix{<:Real},
    k::Integer
)

The fit function performs the k-medoids clustering algorithm and returns a result object representing the clustering result.

Parameters:

  • kmedoids: an instance representing the clustering settings and parameters.
  • distances: a floating-point matrix representing the pairwise distances between the data points.
  • k: an integer representing the number of clusters.

Example

using Distances
using UnsupervisedClustering

n = 100
d = 2
k = 2

# n random points in d dimensions, one point per row
data = rand(n, d)
distances = pairwise(SqEuclidean(), data, dims = 1)

kmedoids = Kmedoids()
result = fit(kmedoids, distances, k)