DPC-SMOTE Over-sampling Algorithm for Imbalanced Data Classification
Main Authors: | , , |
---|---|
Format: | Article |
Language: | Chinese |
Published: | Harbin University of Science and Technology Publications, 2024-12-01 |
Series: | Journal of Harbin University of Science and Technology |
Subjects: | |
Online Access: | https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2382 |
Summary: | An oversampling algorithm based on density peak clustering is proposed to address noise and between-class imbalance in imbalanced data sets. First, the majority-class samples are preprocessed, and noisy samples are identified and deleted. Second, density peak clustering is applied to all minority-class samples, and noise points are removed. Sampling weights are then assigned according to the sparsity of each cluster, and the number of new samples to synthesize in each cluster is calculated. Finally, SMOTE oversampling is performed within each cluster to synthesize the new samples. The proposed algorithm and five common oversampling algorithms are each combined with five base classifiers and compared in experiments on six imbalanced data sets. The results show that the method improves F1, G-mean and AUC by at least 1.21%, 0.94% and 5.14%, respectively, and by up to 15.90%, 14.99% and 11.26%. These results indicate that the method reduces sample overlap, effectively avoids generating noise in imbalanced data sets, and improves classification accuracy. |
---|---|
ISSN: | 1007-2683 |
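The summary describes the pipeline only as a sequence of steps. The Python sketch below is a minimal, hypothetical reconstruction of those steps for illustration; it is not the authors' code. Several details are assumptions not stated in the record: the majority-noise rule (a majority sample whose k nearest neighbours are all minority is deleted), a cutoff-kernel density with threshold `dc`, treating minority points with zero local density as noise, choosing cluster centres as the top `n_clusters` values of density times distance-to-denser-point, measuring sparsity as the mean pairwise distance inside a cluster (sparser clusters receive more synthetic samples), and oversampling until the two classes are balanced. All function names (`filter_majority_noise`, `dpc_cluster`, `allocate`, `smote_in_cluster`, `dpc_smote`) are hypothetical.

```python
import math
import random


def local_density(points, dc):
    """Cutoff-kernel density: number of other points closer than dc."""
    return [sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) < dc)
            for i, p in enumerate(points)]


def filter_majority_noise(majority, minority, k=5):
    """ASSUMED rule: drop a majority sample whose k nearest neighbours are all minority."""
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    kept = []
    for i, p in enumerate(majority):
        # majority samples occupy indices 0..len(majority)-1, so j != i skips p itself
        nearest = sorted((math.dist(p, q), y) for j, (q, y) in enumerate(labelled) if j != i)[:k]
        if not all(y == 1 for _, y in nearest):
            kept.append(p)
    return kept


def dpc_cluster(points, dc, n_clusters):
    """Density peak clustering; zero-density points are ASSUMED to be noise and dropped."""
    rho = local_density(points, dc)
    points = [p for p, r in zip(points, rho) if r > 0]
    rho = [r for r in rho if r > 0]
    n = len(points)
    if n == 0:
        return [], []
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta, parent = [0.0] * n, [None] * n
    for pos, i in enumerate(order):
        if pos == 0:  # densest point: delta is its largest distance to any point
            delta[i] = max(math.dist(points[i], q) for q in points)
        else:  # delta is the distance to the nearest denser point
            parent[i] = min(order[:pos], key=lambda j: math.dist(points[i], points[j]))
            delta[i] = math.dist(points[i], points[parent[i]])
    gamma = [rho[i] * delta[i] for i in range(n)]
    centres = set(sorted(range(n), key=lambda i: -gamma[i])[:n_clusters])
    centres.add(order[0])  # the densest point always starts a cluster
    labels = [-1] * n
    next_label = 0
    for i in order:  # each non-centre inherits the label of its nearest denser point
        if i in centres:
            labels[i], next_label = next_label, next_label + 1
        else:
            labels[i] = labels[parent[i]]
    return points, labels


def sparsity(cluster):
    """ASSUMED sparsity measure: mean pairwise distance inside the cluster."""
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def allocate(clusters, n_new):
    """Split n_new samples in proportion to sparsity (ASSUMED: sparser gets more)."""
    s = [sparsity(c) if len(c) >= 2 else 0.0 for c in clusters]  # SMOTE needs 2+ points
    total = sum(s)
    if total == 0:
        return [0] * len(clusters)
    raw = [n_new * x / total for x in s]
    counts = [int(r) for r in raw]
    # largest-remainder rounding so the counts sum exactly to n_new
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[: n_new - sum(counts)]:
        counts[i] += 1
    return counts


def smote_in_cluster(cluster, n, k=5, rng=random):
    """Standard SMOTE interpolation, restricted to neighbours from the same cluster."""
    out = []
    for _ in range(n):
        i = rng.randrange(len(cluster))
        x = cluster[i]
        nearest = sorted((q for j, q in enumerate(cluster) if j != i),
                         key=lambda q: math.dist(x, q))[:k]
        nn = rng.choice(nearest)
        u = rng.random()  # interpolation factor in [0, 1)
        out.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return out


def dpc_smote(majority, minority, dc, n_clusters, k=5, seed=0):
    """Hypothetical end-to-end pipeline; ASSUMES full balancing as the target."""
    rng = random.Random(seed)
    majority = filter_majority_noise(majority, minority, k)
    points, labels = dpc_cluster(minority, dc, n_clusters)
    clusters = [[p for p, lab in zip(points, labels) if lab == c] for c in sorted(set(labels))]
    counts = allocate(clusters, max(0, len(majority) - len(points)))
    synthetic = [s for c, m in zip(clusters, counts) for s in smote_in_cluster(c, m, k, rng)]
    return majority, points + synthetic
```

For example, `dpc_smote(majority, minority, dc=1.0, n_clusters=2)` returns the filtered majority list and a minority list enlarged with synthetic points until both lists have the same length, provided at least one cluster holds two or more distinct points. Because every synthetic point is interpolated between two members of the same cluster, it stays inside that cluster's region; this is the mechanism by which per-cluster SMOTE limits overlap with other clusters. The original density peak clustering paper suggests choosing `dc` so that each point has, on average, about 1 to 2% of the data set as neighbours.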