Sciact
  • EN
  • RU

Parallel Clustering Algorithm for the k-medoids Problem in High-dimensional Space for Large-scale Datasets Full article

Conference Optimization Problems of Complex Systems : International Asian School-Seminar
14-22 Aug 2023 , Новосибирск
Source 2023 19th International Asian School-Seminar on Optimization Problems of Complex Systems (OPCS)
Compilation, IEEE. 2023. 6 c. ISBN 9798350331134.
Output data Year: 2023, Pages: 119-124 Pages count : 6 DOI: 10.1109/opcs59592.2023.10275752
Tags k-medoids, clustering, facility location, Lagrangian relaxation, machine learning, p-median
Authors Vandanov Sergey 1 , Plyasunov Aleksandr V. 2 , Ushakov Anton V. 3
Affiliations
1 Department of Physics, Novosibirsk State University, Novosibirsk, Russia
2 Lab. of Mathematical Models of Decision Making, Sobolev Institute of Mathematics of SB RAS, Novosibirsk, Russia
3 Matrosov Institute for System Dynamics and Control Theory of SB RAS, Irkutsk, Russia

Funding (1)

1 Sobolev Institute of Mathematics FWNF-2022-0019

Abstract: We present a robust, parallel primal-dual heuristic algorithm for the k-medoids clustering problem, a widely utilized method in data mining and machine learning. Our approach surpasses current algorithms by effectively addressing their limitations, such as time-consuming distance matrix calculations, inefficient nearest-neighbor searches, and difficulties in handling large-scale datasets. To overcome these challenges, we employ an efficient parallel implementation, combined with a pioneering subgradient search algorithm and a parallel column generation method. We evaluate our algorithm on the BIRCH and Stanford Dog datasets and demonstrate its superiority over existing kmedoids clustering algorithms in terms of solution quality and run time. Additionally, we introduce a novel vectorization technique that enables our algorithm to handle various types of data, such as images, text, and point data. Overall, our work contributes to the field of data mining and machine learning by providing an efficient and effective solution for the k-medoids clustering problem. The proposed algorithm offers improved performance, scalability, and versatility, making it a valuable tool for a wide range of applications.
Cite: Vandanov S. , Plyasunov A.V. , Ushakov A.V.
Parallel Clustering Algorithm for the k-medoids Problem in High-dimensional Space for Large-scale Datasets
In compilation 2023 19th International Asian School-Seminar on Optimization Problems of Complex Systems (OPCS). – IEEE., 2023. – C.119-124. – ISBN 9798350331134. DOI: 10.1109/opcs59592.2023.10275752 Scopus OpenAlex
Dates:
Published print: Oct 13, 2023
Published online: Oct 13, 2023
Identifiers:
Scopus: 2-s2.0-85175470838
OpenAlex: W4387620806
Citing:
DB Citing
OpenAlex 1
Scopus 1
Altmetrics: