Detection of LUAD-Associated Genes Using Wasserstein Distance in Multiomics Feature Selection
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-06-01 |
Series: | Bioengineering |
Subjects: | |
Online Access: | https://www.mdpi.com/2306-5354/12/7/694 |
Summary: | Lung adenocarcinoma (LUAD) is characterized by substantial genetic heterogeneity, making it challenging to identify reliable biomarkers for diagnosis and treatment. Tumor mutational burden (TMB) is widely recognized as a predictive biomarker due to its association with immune response and treatment efficacy. In this study, we take a different approach by treating TMB as a response variable to uncover its genetic drivers using multiomics data. We evaluated recent feature selection methods through extensive simulations and identified three top-performing approaches: projection correlation screening (PC-Screen), distance correlation sure independence screening (DC-SIS), and Wasserstein distance-based screening (WD-Screen). Unlike traditional approaches that rely on simple statistical tests or dataset splitting for validation, we adopt a method-based validation strategy: we select the top-ranked features from each method and retain only the genes selected consistently by all three. Using The Cancer Genome Atlas (TCGA) dataset, we integrated copy number alteration (CNA), mRNA expression, and DNA methylation data as predictors and applied our selected methods. In the two-platform analysis (mRNA + CNA), we identified 13 key genes, including both previously reported LUAD-associated genes (<i>CCNG1, CKAP2L, HSD17B4, SHROOM1, TIGD6</i>, and <i>TMEM173</i>) and novel candidates (<i>DTWD2, FLJ33630, NME5, NUDT12, PCBD2, REEP5</i>, and <i>SLC22A5</i>). Expanding to a three-platform analysis (mRNA + CNA + methylation) further refined our findings, with <i>PCBD2</i> and <i>TMEM173</i> emerging as the most robust candidates. These results highlight the complexity of multiomics integration and the need for advanced feature selection techniques to uncover biologically meaningful patterns. Our multiomics strategy and robust selection approach provide insights into the genetic determinants of TMB, offering potential biomarkers for targeted LUAD therapies and demonstrating the power of Wasserstein distance-based feature selection in complex genomic analysis. |
---|---|
ISSN: | 2306-5354 |
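
The method-based validation step described in the summary (take the top-ranked features from each screening method, then keep only those selected by all of them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `top_k` and `consensus_features`, the cutoff `k`, and the toy scores are all hypothetical, and the sketch assumes each method has already produced one score per feature, with larger scores meaning stronger association with TMB.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k features with the largest screening scores."""
    # Sort by descending score; break ties by name so the result is deterministic.
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ranked[:k]


def consensus_features(score_tables: Dict[str, Dict[str, float]], k: int) -> Set[str]:
    """Return the features that appear in the top-k list of every method."""
    if not score_tables:
        return set()
    top_sets = [set(top_k(scores, k)) for scores in score_tables.values()]
    return set.intersection(*top_sets)


# Hypothetical scores, made up for illustration only (not results from the study).
toy_scores = {
    "PC-Screen": {"geneA": 0.9, "geneB": 0.7, "geneC": 0.2, "geneD": 0.6},
    "DC-SIS": {"geneA": 0.8, "geneB": 0.1, "geneC": 0.5, "geneD": 0.7},
    "WD-Screen": {"geneA": 0.6, "geneB": 0.3, "geneC": 0.4, "geneD": 0.9},
}

selected = consensus_features(toy_scores, k=3)
print(sorted(selected))  # ['geneA', 'geneD']
```

Intersecting the per-method lists is a conservative choice: a gene survives only if every method ranks it highly, which is consistent with the shorter candidate list the summary reports once a third platform is added.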