Secure K-Means Clustering Scheme for Confidential Data Based on Paillier Cryptosystem

Bibliographic Details
Main Authors: Zhengqi Zhang, Zixin Xiong, Jun Ye
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/12/6918
Description
Summary: This paper proposes a secure homomorphic K-means clustering protocol based on the Paillier cryptosystem, addressing the need for privacy-preserving clustering in sensitive domains such as healthcare and finance. The protocol uses the additive homomorphic property of the Paillier cryptosystem to run K-means clustering on encrypted data, which keeps the data confidential throughout the computation. It consists of three components: the secure computation distance (SCD) protocol, the secure cluster assignment (SCA) protocol, and the secure cluster center update (SUCC) protocol. The SCD protocol securely computes the squared Euclidean distance between an encrypted data point and an encrypted cluster center. The SCA protocol securely assigns data points to clusters based on these encrypted distances. Finally, the SUCC protocol securely updates the cluster centers without leaking the actual data points or the number of intermediate sums. Security analysis and experimental evaluation demonstrate the effectiveness and practicality of the protocol. The work offers a practical solution for secure clustering based on homomorphic encryption and contributes to research on privacy-preserving data mining. Although the protocol addresses secure distance computation, cluster assignment, and centroid update, several directions remain open: optimizing its computational efficiency, exploring other homomorphic encryption schemes that may perform better, and extending it to more complex clustering algorithms.
ISSN: 2076-3417
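
Illustration: the summary relies on the additive homomorphism of the Paillier cryptosystem and on squared Euclidean distances computed over ciphertexts. The sketch below is not the paper's SCD protocol. It is a toy Paillier implementation with insecure key sizes, and it makes a simplifying assumption the summary does not: the cluster center is known in plaintext to the evaluator, and the data owner supplies encryptions of both each coordinate and its square. Paillier on its own cannot multiply two ciphertexts, so a protocol in which the center is also encrypted needs extra machinery that is not shown here. All function names (keygen, encrypt, decrypt, add, scale, enc_sq_dist) are hypothetical.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key pair from two small primes (demo sizes, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)

def encrypt(n, m):
    """Enc(m) = (1 + m*n) * r^n mod n^2, using generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add(n, c1, c2):
    """Homomorphic addition: Enc(a) * Enc(b) mod n^2 decrypts to a + b mod n."""
    return c1 * c2 % (n * n)

def scale(n, c, k):
    """Multiply the plaintext by a known integer k: Enc(a)^k decrypts to k*a mod n."""
    return pow(c, k % n, n * n)

def enc_sq_dist(n, enc_x, enc_x2, center):
    """Encrypted sum over i of (x_i - c_i)^2 = x_i^2 - 2*c_i*x_i + c_i^2.

    Assumption: `center` is plaintext; enc_x and enc_x2 hold Enc(x_i), Enc(x_i^2).
    """
    acc = encrypt(n, 0)
    for cx, cx2, c in zip(enc_x, enc_x2, center):
        acc = add(n, acc, cx2)                   # + x_i^2
        acc = add(n, acc, scale(n, cx, -2 * c))  # - 2*c_i*x_i
        acc = add(n, acc, encrypt(n, c * c))     # + c_i^2
    return acc

n, priv = keygen(1009, 1013)
point, center = [3, 7], [1, 2]
enc_x = [encrypt(n, v) for v in point]
enc_x2 = [encrypt(n, v * v) for v in point]
dist = decrypt(n, priv, enc_sq_dist(n, enc_x, enc_x2, center))
print(dist)  # (3-1)^2 + (7-2)^2 = 29
```

The evaluator only ever handles ciphertexts of the data point; only the holder of the private key can recover the distance. Plaintexts live modulo n, so values must stay below n for the decrypted result to be meaningful.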