Outlier Detection and Explanation Method Based on FOLOF Algorithm

Bibliographic details
Main authors: Lei Bai, Jiasheng Wang, Yu Zhou
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Entropy
Online access: https://www.mdpi.com/1099-4300/27/6/582
Description
Summary: Outlier mining constitutes an essential aspect of modern data analytics, focusing on the identification and interpretation of anomalous observations. Conventional density-based local outlier detection methodologies frequently exhibit limitations due to their inherent lack of data preprocessing capabilities, consequently demonstrating degraded performance when applied to novel or heterogeneous datasets. Moreover, because these algorithms compute an outlier factor for every sample, they incur a considerably higher computational cost, especially on large datasets. This paper introduces a local outlier detection method named FOLOF (FCM Objective Function-based LOF) through an examination of existing algorithms. The approach starts by applying the elbow rule to determine the optimal number of clusters in the dataset. Subsequently, the FCM objective function is employed to prune the dataset and extract a candidate set of outliers. Finally, a weighted local outlier factor detection algorithm computes the degree of anomaly for each sample in the candidate set. For the analysis, the Golden Section method is used to classify the outliers. The underlying causes of these outliers can be revealed by examining the outlier factor of each dimension of an outlier data point to identify its anomalous properties. This approach has been validated on artificial datasets, the UCI dataset, and an NBA player dataset to demonstrate its effectiveness.
ISSN: 1099-4300
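
As a rough illustration of the final scoring step named in the summary, the sketch below computes the classic, unweighted local outlier factor (LOF) for a given candidate set only. It is not the FOLOF implementation: this record does not specify the FCM-based pruning rule or the weighting scheme, so the candidate indices are simply passed in. The function names `knn` and `lof_scores`, the parameter `k`, and the toy data are assumptions made for illustration.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs.

    Ties are broken by index, a simplification of the classic definition,
    which keeps every point tied at the k-distance.
    """
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return dists[:k]


def lof_scores(points, candidates, k=3):
    """Classic (unweighted) LOF, computed only for the candidate indices.

    Neighbours are still searched in the full dataset; restricting the
    scoring loop to the candidates is where a pruning step saves work.
    """
    cache = {}

    def neighbours(i):
        if i not in cache:
            cache[i] = knn(points, i, k)
        return cache[i]

    def k_distance(i):
        return neighbours(i)[-1][0]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_distance(j), d) for d, j in neighbours(i)]
        mean_reach = sum(reach) / len(reach)
        return 1.0 / max(mean_reach, 1e-12)  # guard against duplicate points

    scores = {}
    for i in candidates:
        ratios = [lrd(j) / lrd(i) for _, j in neighbours(i)]
        scores[i] = sum(ratios) / len(ratios)
    return scores


# Toy data (assumed): a tight cluster of five points plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
# Hypothetical candidate set, standing in for the output of a pruning step.
scores = lof_scores(points, candidates=[4, 5], k=3)
for idx, score in sorted(scores.items()):
    print(idx, round(score, 2))
```

In this toy example the isolated point receives a score well above 1, while the point inside the cluster scores close to 1. The weighting, the per-dimension outlier factors, and the Golden Section classification described in the summary would sit on top of a routine of this kind.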