PCMINN: A GPU-Accelerated Conditional Mutual Information-Based Feature Selection Method

In feature selection, it is crucial to identify features that are not only relevant to the target variable but also non-redundant. Conditional Mutual Information Nearest-Neighbor (CMINN) is an algorithm developed to address this challenge by using Conditional Mutual Information (CMI) to assess the r...
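As a rough illustration of the idea in the abstract, the sketch below runs a greedy forward search that scores each candidate feature by its conditional mutual information with the target, given the features already selected. It is not the authors' code. It assumes discrete features and uses a simple plug-in (frequency-count) CMI estimate in place of the nearest-neighbor estimate that CMINN uses. The names `cmi`, `forward_select`, `features`, and `target`, and the toy data, are hypothetical.

```python
from collections import Counter
from math import log


def cmi(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in nats for discrete sequences.

    Assumption: a frequency-count estimator stands in for the
    nearest-neighbor estimator described in the abstract.
    """
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    return sum(
        (c / n) * log(c * n_z[zv] / (n_xz[(xv, zv)] * n_yz[(yv, zv)]))
        for (xv, yv, zv), c in n_xyz.items()
    )


def forward_select(features, target, k):
    """Greedily pick k feature names, maximizing I(feature; target | selected)."""
    selected = []
    # Joint state of the already-selected features, one tuple per sample.
    cond = [()] * len(target)
    for _ in range(k):
        candidates = [name for name in features if name not in selected]
        best = max(candidates, key=lambda name: cmi(features[name], target, cond))
        selected.append(best)
        cond = [c + (v,) for c, v in zip(cond, features[best])]
    return selected


# Hypothetical toy data: the target is fully determined by (a, b);
# a_coarse is a function of a and therefore redundant once a is selected.
features = {
    "a_coarse": [0, 0, 1, 1, 1, 1],
    "b": [0, 1, 0, 1, 0, 1],
    "a": [0, 0, 1, 1, 2, 2],
}
target = [0, 1, 2, 3, 4, 5]

print(forward_select(features, target, 2))  # ['a', 'b']
```

On this toy data the redundant feature `a_coarse` scores zero once `a` has been selected, so `b` is picked second. The candidate scores within one step do not depend on each other, which makes that step a natural place for parallel evaluation. The record does not say how PCMINN distributes this work on the GPU.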

Bibliographic Details
Main Authors: Nikolaos Papaioannou, Georgios Myllis, Alkiviadis Tsimpiris, Stamatis Aggelopoulos, Vasiliki Vrana
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Information
Subjects: feature selection, conditional mutual information, nearest-neighbor estimate, parallel, GPU
Online Access: https://www.mdpi.com/2078-2489/16/6/445
author Nikolaos Papaioannou
Georgios Myllis
Alkiviadis Tsimpiris
Stamatis Aggelopoulos
Vasiliki Vrana
collection DOAJ
description In feature selection, it is crucial to identify features that are not only relevant to the target variable but also non-redundant. Conditional Mutual Information Nearest-Neighbor (CMINN) is an algorithm developed to address this challenge by using Conditional Mutual Information (CMI) to assess the relevance of individual features to the target variable, while identifying redundancy among similar features. Although effective, the original CMINN algorithm can be computationally intensive, particularly with large and high-dimensional datasets. In this study, we extend the CMINN algorithm by parallelizing it for execution on Graphics Processing Units (GPUs), significantly enhancing its efficiency and scalability for high-dimensional datasets. The parallelized CMINN (PCMINN) leverages the massive parallelism of modern GPUs to handle the computational complexity inherent in sequential feature selection, particularly when dealing with large-scale data. To evaluate the performance of PCMINN across various scenarios, we conduct both an extensive simulation study using datasets with combined feature effects and a case study using financial data. Our results show that PCMINN not only maintains the effectiveness of the original CMINN in selecting the optimal feature subset, but also achieves faster execution times. The parallelized approach allows for the efficient processing of large datasets, making PCMINN a valuable tool for high-dimensional feature selection tasks. We also provide a package that includes two Python implementations to support integration into future research workflows: a sequential version of CMINN and a parallel GPU-based version of PCMINN.
format Article
id doaj-art-eb98e0ebb31a43faa6a4a9d7c15a6e3a
institution Matheson Library
issn 2078-2489
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Information
spelling doaj-art-eb98e0ebb31a43faa6a4a9d7c15a6e3a; 2025-06-25T13:57:31Z; eng; MDPI AG; Information; ISSN 2078-2489; 2025-05-01; vol. 16, no. 6, art. 445; DOI 10.3390/info16060445
Nikolaos Papaioannou, Georgios Myllis, Alkiviadis Tsimpiris: Department of Computer Informatics and Telecommunications Engineering, International Hellenic University, 621 24 Serres, Greece
Stamatis Aggelopoulos: Department of Agriculture, International Hellenic University, 570 01 Thermi, Greece
Vasiliki Vrana: Department of Business Administration, International Hellenic University, 621 24 Serres, Greece
title PCMINN: A GPU-Accelerated Conditional Mutual Information-Based Feature Selection Method
topic feature selection
conditional mutual information
nearest-neighbor estimate
parallel
GPU
url https://www.mdpi.com/2078-2489/16/6/445