Outlier detection method based on K-means

Bibliographic Details
Main Authors: Liu Daojun, Liu Shuai, Zhang Yusong, Ou Sicheng
Format: Article
Language: Chinese
Published: National Computer System Engineering Research Institute of China, 2025-05-01
Series: Dianzi Jishu Yingyong
Online Access: http://www.chinaaet.com/article/3000171650
Description
Summary: In industry, electric power, transportation and other fields, anomalies are often precursors of problems or failures in a system. Anomaly identification techniques can detect abnormal system behavior in time, making it possible to prevent potential failures or respond to them quickly and thereby improve system reliability and stability. Current anomaly identification algorithms usually require expert information (e.g., suitable parameter values), but in many identification scenarios both the data distribution and the causes of anomalies are unknown, which makes such expert information unreliable. It is therefore important to design an anomaly identification algorithm that does not depend on expert information. This paper designs an adaptive anomaly identification algorithm. It first uses K-means to partition the data into many small clusters, then computes the probability distribution of cluster sizes (the number of objects in each cluster) and plots it as a probability distribution graph. The graph makes it easy to see which clusters contain significantly fewer objects than the others; these are identified as anomalous clusters, and the objects they contain are identified as anomalies. In other words, when the data distribution and the causes of anomalies are unknown, the probability distribution graph takes the place of expert information and helps the user identify genuine anomalies.
ISSN: 0258-7998
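The procedure described in the summary can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the K-means variant (plain Lloyd's iterations with deterministic farthest-point seeding), the cluster count `k=20`, the synthetic data from the hypothetical helper `make_demo_data`, and above all the hypothetical `ratio_gap_cutoff` rule are assumptions. In the paper the user reads the probability distribution graph; `ratio_gap_cutoff` merely imitates that step by treating the largest multiplicative jump between consecutive observed cluster sizes as the boundary between "small" and "normal" clusters.

```python
# Sketch of cluster-size-based outlier flagging with K-means.
# Assumptions (not from the paper): farthest-point seeding, k=20,
# synthetic demo data, and the automatic ratio-gap cutoff rule.
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic maximin seeding: start at points[0], then keep adding
    the point farthest from every center chosen so far."""
    centers = [points[0]]
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[idx])
        nearest = [min(d, sq_dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers


def kmeans(points, k, iters=100):
    """Lloyd's K-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: every point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members;
        # a center that lost all of its members stays where it was.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def size_distribution(labels):
    """Map each observed cluster size s to the fraction of non-empty clusters with s members."""
    sizes = Counter(labels)            # cluster label -> number of objects
    counts = Counter(sizes.values())   # cluster size -> number of clusters of that size
    return {s: c / len(sizes) for s, c in sorted(counts.items())}


def ratio_gap_cutoff(distribution):
    """HYPOTHETICAL stand-in for reading the graph by eye: return the size just
    below the largest multiplicative jump between consecutive observed sizes."""
    sizes = sorted(distribution)
    if len(sizes) < 2:
        return 0  # every cluster has the same size, so nothing stands out
    return max((b / a, a) for a, b in zip(sizes, sizes[1:]))[1]


def flag_outliers(points, k):
    """Return the size distribution and the indices of points in 'small' clusters."""
    labels = kmeans(points, k)
    dist = size_distribution(labels)
    cutoff = ratio_gap_cutoff(dist)
    sizes = Counter(labels)
    return dist, [i for i, lab in enumerate(labels) if sizes[lab] <= cutoff]


def make_demo_data(seed=0):
    """HYPOTHETICAL data: three dense uniform blobs of 100 points each,
    followed by four isolated points (indices 300-303)."""
    rng = random.Random(seed)
    points = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
              for cx, cy in [(0, 0), (6, 0), (3, 5)] for _ in range(100)]
    return points + [(100.0, 100.0), (-100.0, 80.0), (90.0, -120.0), (-110.0, -90.0)]


dist, flagged = flag_outliers(make_demo_data(), k=20)
# Text rendering of the "probability distribution graph" of cluster sizes.
for size, prob in dist.items():
    print(f"{size:4d} objects | {'#' * round(40 * prob)} {prob:.2f}")
print("flagged point indices:", flagged)
```

The printed histogram plays the role of the probability distribution graph. Farthest-point seeding was chosen for this sketch because, with purely random seeding, an isolated point may never receive its own center and would then be absorbed into a large cluster instead of forming a small one. Depending on how K-means splits the dense blobs, a few small blob clusters may also fall below the cutoff.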