DPC-SMOTE Over-sampling Algorithm for Imbalanced Data Classification

Bibliographic Details
Main Authors: LIU Zhihan, ZHANG Zhonglin, ZHAO Lei
Format: Article
Language: Chinese
Published: Harbin University of Science and Technology Publications 2024-12-01
Series: Journal of Harbin University of Science and Technology
Online Access: https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2382
Description
Summary: An oversampling algorithm based on density peak clustering is proposed to address noise and between-class imbalance in imbalanced data sets. First, most of the samples are preprocessed, and noise samples are screened out and deleted. Second, the algorithm applies density peak clustering to all minority samples and removes noise points. Sampling weights are then assigned according to the sparsity of each cluster, and the number of new samples to synthesize for each cluster is calculated. SMOTE oversampling is performed within each cluster to synthesize the new samples. The proposed algorithm is compared with five common oversampling algorithms, each combined with five base classifiers, in experiments on six imbalanced data sets. The experimental results show that the F1, G-mean, and AUC of this method increase by at least 1.21%, 0.94%, and 5.14%, respectively, with maximum increases of 15.90%, 14.99%, and 11.26%. The results indicate that this method reduces sample overlap, effectively avoids generating noise in imbalanced data sets, and improves classification accuracy.
ISSN: 1007-2683
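
The minority-class steps outlined in the summary (cluster, drop noise, weight clusters by sparsity, run SMOTE inside each cluster) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, and the record gives no formulas, so several details are assumptions made only for illustration: the function names `density_peak_labels` and `dpc_smote`, the parameters `n_clusters`, `dc`, `k`, and `seed`, the rule that a point with no neighbour within `dc` counts as noise, the use of mean pairwise distance as the sparsity measure, and the choice to give sparser clusters proportionally more synthetic samples. The preprocessing of the remaining samples is not covered.

```python
import math
import random


def density_peak_labels(points, n_clusters, dc):
    """Simplified density peak clustering. Returns (labels, rho)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points within the cutoff distance dc.
    rho = [sum(d < dc for d in row) - 1 for row in dist]
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # densest first
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        # Nearest point among those ranked denser than i.
        parent[i] = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][parent[i]]
    # Cluster centres are the points with the largest rho * delta.
    centres = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    if order[0] not in centres:
        centres[-1] = order[0]  # the densest point must anchor a cluster
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    for i in order:  # a parent is ranked denser, so it is labelled before i
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho


def dpc_smote(minority, n_new, n_clusters=2, dc=1.0, k=3, seed=0):
    """Cluster minority points, drop isolated ones, oversample per cluster."""
    rng = random.Random(seed)
    labels, rho = density_peak_labels(minority, n_clusters, dc)
    clusters = {}
    for p, lab, r in zip(minority, labels, rho):
        if r > 0:  # assumption: no neighbour within dc means noise
            clusters.setdefault(lab, []).append(p)
    groups = [g for g in clusters.values() if len(g) >= 2]
    # Assumed sparsity measure: mean pairwise distance inside the cluster.
    sparsity = [
        sum(math.dist(p, q) for p in g for q in g) / (len(g) * (len(g) - 1))
        for g in groups
    ]
    total = sum(sparsity)
    if total == 0:
        return []
    # Assumed weighting: sparser clusters receive proportionally more samples.
    quotas = [n_new * s / total for s in sparsity]
    counts = [int(q) for q in quotas]
    # Hand out samples lost to rounding, largest fractional part first.
    by_fraction = sorted(range(len(groups)), key=lambda i: counts[i] - quotas[i])
    for i in by_fraction[: n_new - sum(counts)]:
        counts[i] += 1
    synthetic = []
    for g, count in zip(groups, counts):
        for _ in range(count):
            b = rng.randrange(len(g))
            # k nearest neighbours of the base point within the same cluster.
            near = sorted((j for j in range(len(g)) if j != b),
                          key=lambda j: math.dist(g[b], g[j]))[:k]
            other = g[rng.choice(near)]
            gap = rng.random()  # SMOTE interpolation factor in [0, 1)
            synthetic.append(
                tuple(x + gap * (y - x) for x, y in zip(g[b], other)))
    return synthetic
```

Because interpolation only happens between points of the same cluster, synthetic samples stay inside the region a cluster already occupies, which is one plausible reading of how the method limits overlap and avoids generating new noise.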