PCMINN: A GPU-Accelerated Conditional Mutual Information-Based Feature Selection Method
Main Authors: | Nikolaos Papaioannou, Georgios Myllis, Alkiviadis Tsimpiris, Stamatis Aggelopoulos, Vasiliki Vrana |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-05-01 |
Series: | Information |
Subjects: | feature selection; conditional mutual information; nearest-neighbor estimate; parallel; GPU |
Online Access: | https://www.mdpi.com/2078-2489/16/6/445 |
author | Nikolaos Papaioannou, Georgios Myllis, Alkiviadis Tsimpiris, Stamatis Aggelopoulos, Vasiliki Vrana |
collection | DOAJ |
description | In feature selection, it is crucial to identify features that are not only relevant to the target variable but also non-redundant. Conditional Mutual Information Nearest-Neighbor (CMINN) is an algorithm developed to address this challenge by using Conditional Mutual Information (CMI) to assess the relevance of individual features to the target variable, while identifying redundancy among similar features. Although effective, the original CMINN algorithm can be computationally intensive, particularly with large and high-dimensional datasets. In this study, we extend the CMINN algorithm by parallelizing it for execution on Graphics Processing Units (GPUs), significantly enhancing its efficiency and scalability for high-dimensional datasets. The parallelized CMINN (PCMINN) leverages the massive parallelism of modern GPUs to handle the computational complexity inherent in sequential feature selection, particularly when dealing with large-scale data. To evaluate the performance of PCMINN across various scenarios, we conduct both an extensive simulation study using datasets with combined feature effects and a case study using financial data. Our results show that PCMINN not only maintains the effectiveness of the original CMINN in selecting the optimal feature subset, but also achieves faster execution times than the sequential version. The parallelized approach allows for the efficient processing of large datasets, making PCMINN a valuable tool for high-dimensional feature selection tasks. We also provide a package that includes two Python implementations to support integration into future research workflows: a sequential implementation of CMINN and a parallel, GPU-based implementation, PCMINN. |
id | doaj-art-eb98e0ebb31a43faa6a4a9d7c15a6e3a |
institution | Matheson Library |
issn | 2078-2489 |
doi | 10.3390/info16060445 |
citation | Information, vol. 16, no. 6, art. 445 (2025) |
affiliations | Nikolaos Papaioannou, Georgios Myllis, Alkiviadis Tsimpiris: Department of Computer Informatics and Telecommunications Engineering, International Hellenic University, 621 24 Serres, Greece; Stamatis Aggelopoulos: Department of Agriculture, International Hellenic University, 570 01 Thermi, Greece; Vasiliki Vrana: Department of Business Administration, International Hellenic University, 621 24 Serres, Greece |
topic | feature selection; conditional mutual information; nearest-neighbor estimate; parallel; GPU |
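The abstract describes CMINN only at a high level: features are added one at a time, each candidate scored by its conditional mutual information with the target given the features already selected, with CMI estimated from nearest-neighbor statistics. The following is a minimal sequential sketch of that idea, not the authors' package. It assumes the Frenzel-Pompe k-nearest-neighbor CMI estimator (a conditional extension of the KSG mutual-information estimator), a greedy forward search, and a fixed threshold `min_gain` as the stopping rule. All three are illustrative choices, and the names `cmi_knn`, `forward_select`, and `demo_data` are hypothetical. The per-sample neighbor counts in `cmi_knn` do not depend on one another, which is the kind of work that maps naturally onto GPU threads. Here that work runs serially in pure Python.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_table(n_max):
    """psi[n] for n = 1..n_max, using psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    table = [0.0] * (n_max + 1)
    table[1] = -EULER_GAMMA
    for n in range(2, n_max + 1):
        table[n] = table[n - 1] + 1.0 / (n - 1)
    return table


def cmi_knn(x, y, z, k=5):
    """Frenzel-Pompe k-NN estimate of I(X; Y | Z) in nats, using max-norm distances.

    x, y: one float per sample; z: one tuple per sample. With empty tuples and
    continuous data, n_z = N - 1 for every sample, so this reduces to the KSG
    estimate of I(X; Y).
    """
    n = len(x)
    psi = digamma_table(n)
    total = 0.0
    for i in range(n):  # each sample's neighbour counts are independent of the others
        dz = [max((abs(a - b) for a, b in zip(z[i], z[j])), default=0.0) for j in range(n)]
        dxz = [max(abs(x[i] - x[j]), dz[j]) for j in range(n)]
        dyz = [max(abs(y[i] - y[j]), dz[j]) for j in range(n)]
        dxyz = [max(dxz[j], abs(y[i] - y[j])) for j in range(n)]
        # eps: distance to the k-th nearest neighbour in the joint (X, Y, Z) space.
        eps = sorted(dxyz[j] for j in range(n) if j != i)[k - 1]
        # Count other samples strictly closer than eps in each marginal subspace.
        n_xz = sum(1 for j in range(n) if j != i and dxz[j] < eps)
        n_yz = sum(1 for j in range(n) if j != i and dyz[j] < eps)
        n_z = sum(1 for j in range(n) if j != i and dz[j] < eps)
        total += psi[n_xz + 1] + psi[n_yz + 1] - psi[n_z + 1]
    return psi[k] - total / n


def forward_select(features, target, k=5, max_features=None, min_gain=0.0):
    """Greedy forward selection: repeatedly add the candidate f that maximises
    I(f; target | selected), stopping when the best estimated gain <= min_gain."""
    n = len(target)
    remaining = list(range(len(features)))
    selected = []
    while remaining and (max_features is None or len(selected) < max_features):
        z = [tuple(features[s][i] for s in selected) for i in range(n)]
        gains = {f: cmi_knn(features[f], target, z, k) for f in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= min_gain:  # placeholder stopping rule (assumption)
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def demo_data(n=200, seed=0):
    """Hypothetical toy data: y = x0 + x1 + noise; x2 is irrelevant; x3 is a noisy copy of x0."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x3 = [a + rng.gauss(0, 0.05) for a in x0]
    y = [a + b + rng.gauss(0, 0.1) for a, b in zip(x0, x1)]
    return [x0, x1, x2, x3], y


if __name__ == "__main__":
    features, y = demo_data()
    print(forward_select(features, y, k=5, max_features=2))
```

On the toy data, the search should pick `x1` together with one of the two near-duplicates `x0` and `x3`. Once either of those is selected, the other contributes almost no conditional information, which is the redundancy effect the abstract refers to. A threshold of zero is sensitive to estimator noise, so the demo also caps the subset size with `max_features`.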