Multi-Faceted Adaptive Token Pruning for Efficient Remote Sensing Image Segmentation

Bibliographic Details
Main Authors: Chuge Zhang, Jian Yao
Format: Article
Language:English
Published: MDPI AG 2025-07-01
Series:Remote Sensing
Subjects:
Online Access:https://www.mdpi.com/2072-4292/17/14/2508
Description
Summary:Global context information is essential for semantic segmentation of remote sensing (RS) images. Owing to their remarkable capability to capture global context and model long-range dependencies, vision transformers have demonstrated strong performance on semantic segmentation. However, the high computational complexity of vision transformers impedes their broad application to RS image segmentation in resource-constrained environments. To address this challenge, we propose multi-faceted adaptive token pruning (MATP), which reduces computational cost while maintaining relatively high accuracy. MATP is designed to prune tokens that are already well learned and are not closely related to other tokens. To quantify these two properties, MATP employs multi-faceted scores: entropy, which evaluates the learning progression of a token, and attention weight, which assesses its correlation with other tokens. Specifically, MATP uses an adaptive criterion for each score that is automatically adjusted according to the input features; a token is pruned only when both criteria are satisfied. Overall, MATP facilitates the use of vision transformers in resource-constrained environments. Experiments on three widely used datasets show that MATP reduces computational cost by about 67–70% with about 3–6% accuracy degradation, achieving a superior trade-off between accuracy and computational cost compared with the state of the art.
ISSN:2072-4292
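
As a rough illustration of the pruning rule summarized above, the following Python/NumPy sketch scores each token by the entropy of its attention distribution (as a proxy for learning progression) and by the attention it receives from other tokens (as a proxy for correlation), then prunes only tokens that satisfy both input-dependent criteria. The function name, the choice of entropy source, and the specific threshold rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def matp_prune(tokens, attn, eps=1e-9):
    """Hypothetical sketch of multi-faceted adaptive token pruning (MATP).

    tokens : (N, D) array of token embeddings.
    attn   : (N, N) array of attention weights; attn[i, j] is the attention
             token i pays to token j, and each row sums to 1.

    Returns the kept token embeddings and the boolean keep mask.
    """
    # Facet 1: entropy of each token's outgoing attention distribution.
    # Low entropy is read here as the token having "settled" (well learned).
    entropy = -np.sum(attn * np.log(attn + eps), axis=1)

    # Facet 2: total attention each token receives from the other tokens,
    # used as a proxy for how strongly it is correlated with them.
    received = attn.sum(axis=0) - np.diag(attn)

    # Adaptive, input-dependent criteria: thresholds derived from the
    # statistics of the current input rather than fixed constants
    # (an illustrative choice; the paper's exact rule may differ).
    entropy_thr = entropy.mean() - 0.5 * entropy.std()
    received_thr = received.mean() - 0.5 * received.std()

    # Prune a token only when BOTH conditions hold: it is well learned
    # (low entropy) AND weakly related to the other tokens (low received
    # attention). Otherwise it is kept.
    prune = (entropy < entropy_thr) & (received < received_thr)
    keep = ~prune
    return tokens[keep], keep


# Toy usage: random embeddings and a row-normalized attention matrix.
rng = np.random.default_rng(0)
N, D = 16, 8
tokens = rng.normal(size=(N, D))
logits = rng.normal(size=(N, N))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

kept, mask = matp_prune(tokens, attn)
print(f"kept {kept.shape[0]} of {N} tokens")
```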