Scalable and Efficient Protein Secondary Structure Prediction Using Autoencoder-Reduced ProtBERT Embeddings
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/13/7112
Summary: This study proposes a deep learning framework for Protein Secondary Structure Prediction (PSSP) that prioritizes computational efficiency while preserving classification accuracy. Leveraging ProtBERT-derived embeddings, we apply autoencoder-based dimensionality reduction to compress high-dimensional sequence representations. The compressed representations are then segmented into fixed-length subsequences, enabling efficient input formatting for a Bi-LSTM-based classifier. Our experiments, conducted on a curated PISCES-based dataset, reveal that reducing input dimensions from 1024 to 256 preserves over 99% of predictive performance (Q3 F1 score: 0.8049 → 0.8023) while reducing GPU memory usage by 67% and training time by 43%. Moreover, subsequence lengths of 50 residues provide an optimal trade-off between contextual learning and training stability. Compared to baseline configurations, the proposed framework substantially reduces training overhead without compromising structural accuracy in either the Q3 or Q8 classification scheme. These findings offer a practical pathway for scalable protein structure prediction, particularly in resource-constrained environments.
ISSN: 2076-3417
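
The pipeline described in the summary (ProtBERT embeddings → autoencoder compression from 1024 to 256 dimensions → 50-residue subsequences → Bi-LSTM classification) can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the layer widths, LSTM hidden size, zero-padding scheme, and Q3 class labels are assumptions, since the record does not specify the paper's exact architecture; only the 1024→256 reduction, the 50-residue window, and the Bi-LSTM classifier come from the summary.

```python
import torch
import torch.nn as nn

# Dimensions taken from the summary: 1024-d ProtBERT embeddings, a 256-d
# autoencoder bottleneck, and 50-residue subsequences. Intermediate layer
# widths below are illustrative assumptions, not the paper's values.
EMBED_DIM, REDUCED_DIM, WINDOW = 1024, 256, 50
NUM_CLASSES_Q3 = 3  # helix / strand / coil (use 8 for the Q8 scheme)

class EmbeddingAutoencoder(nn.Module):
    """Compresses per-residue embeddings (1024 -> 256) and reconstructs them."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(EMBED_DIM, 512), nn.ReLU(),
            nn.Linear(512, REDUCED_DIM),
        )
        self.decoder = nn.Sequential(
            nn.Linear(REDUCED_DIM, 512), nn.ReLU(),
            nn.Linear(512, EMBED_DIM),
        )

    def forward(self, x):
        z = self.encoder(x)            # reduced representation
        return self.decoder(z), z      # reconstruction + bottleneck

class BiLSTMTagger(nn.Module):
    """Per-residue secondary-structure classifier over reduced embeddings."""
    def __init__(self, num_classes=NUM_CLASSES_Q3, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(REDUCED_DIM, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):              # x: (batch, WINDOW, REDUCED_DIM)
        out, _ = self.lstm(x)
        return self.head(out)          # (batch, WINDOW, num_classes)

def segment(embeddings, window=WINDOW):
    """Split a (seq_len, dim) matrix into fixed-length windows,
    zero-padding the tail so every chunk has the same shape."""
    seq_len, dim = embeddings.shape
    pad = (-seq_len) % window
    if pad:
        embeddings = torch.cat([embeddings, torch.zeros(pad, dim)], dim=0)
    return embeddings.view(-1, window, dim)

# Toy end-to-end pass on a random stand-in for a 120-residue protein.
protbert_out = torch.randn(120, EMBED_DIM)   # placeholder for ProtBERT output
_, reduced = EmbeddingAutoencoder()(protbert_out)   # (120, 256)
chunks = segment(reduced.detach())                  # (3, 50, 256)
logits = BiLSTMTagger()(chunks)                     # (3, 50, 3) Q3 logits
print(logits.shape)
```

In practice the autoencoder would presumably be trained first on a reconstruction loss (e.g., MSE), after which the frozen encoder's 256-d outputs feed the Bi-LSTM, trained with per-residue cross-entropy; setting num_classes to 8 would cover the Q8 scheme mentioned in the summary.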