Unraveling Meteorological Dynamics: A Two-Level Clustering Algorithm for Time Series Pattern Recognition with Missing Data Handling

Identifying regions with similar meteorological features is of both socioeconomic and ecological importance. Towards that direction, useful information can be drawn from meteorological stations, and spread in a broader area. In this work, a time series clustering procedure composed of two levels is...

Full description

Saved in:
Bibliographic Details
Main Authors: Ekaterini Skamnia, Eleni S. Bekri, Polychronis Economou
Format: Article
Language:English
Published: MDPI AG 2025-05-01
Series:Stats
Subjects:
Online Access:https://www.mdpi.com/2571-905X/8/2/36
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Identifying regions with similar meteorological features is of both socioeconomic and ecological importance. Towards that direction, useful information can be drawn from meteorological stations, and spread in a broader area. In this work, a time series clustering procedure composed of two levels is proposed, focusing on clustering spatial units (meteorological stations) based on their temporal patterns, rather than clustering time periods. It is capable of handling univariate or multivariate time series, with missing data or different lengths but with a common seasonal time period. The first level involves the clustering of the dominant features of the time series (e.g., similar seasonal patterns) by employing K-means, while the second one produces clusters based on secondary features. Hierarchical clustering with Dynamic Time Warping for the univariate case and multivariate Dynamic Time Warping for the multivariate scenario are employed for the second level. Principal component analysis or Classic Multidimensional Scaling is applied before the first level, while an imputation technique is applied to the raw data in the second level to address missing values in the dataset. This step is particularly important given that missing data is a frequent issue in measurements obtained from meteorological stations. The method is subsequently applied to the available precipitation time series and then also to a time series of mean temperature obtained by the automated weather stations network in Greece. Further, both of the characteristics are employed to cover the multivariate scenario.
ISSN:2571-905X