An Improved Backbone Fusion Neural Network for Orchard Extraction

Bibliographic Details
Main Authors: Baiyu Dong, Ziqi Wang, Chongzhi Chen, Ke Wang, Jing Zhang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11072220/
Description
Summary: Monitoring and extracting orchard planting information is of significant importance. Semantic segmentation models based on convolutional neural networks (CNNs) or vision transformers have become the cornerstone of such tasks. However, different backbone networks differ in their feature extraction capabilities and characteristics, which limits the performance of any single-backbone model. Traditional backbone fusion networks lack strategies for dynamically adjusting weights and therefore fail to fully leverage the strengths of the individual backbones. This study therefore proposes a novel backbone fusion strategy (FSC) built on a dual information-interactive fusion architecture and an attention gate mechanism. Based on the FSC strategy, we integrated two powerful backbones, ResNet50 and Swin Transformer, to develop a backbone fusion network (FSRNet) for orchard extraction. We further incorporated a skip-connection attention block, optimized the loss function, and enriched the dataset with additional spectral and texture information to enhance model performance. The results indicate that FSRNet performs well in orchard extraction, with overall classification accuracy and F1 score both exceeding those of classical and state-of-the-art models. In addition, the FSC strategy yields the best results among the backbone fusion strategies compared, with an F1 score of 0.864 and an overall accuracy (OA) of 0.897. Compared with single-backbone networks, the F1 score and OA obtained with the FSC strategy increased by an average of 8.5% and 4.2%, respectively; compared with traditional backbone fusion strategies, they rose by an average of 4.5% and 2.3%, respectively. These results demonstrate the effectiveness and robustness of the proposed fusion strategy. FSRNet combines the local feature extraction capability of ResNet50 with the global feature extraction ability of the Swin Transformer, showcasing the potential of fused backbone models in semantic segmentation tasks.
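The summary does not specify how the attention gate weights the two backbones, so the following is only a minimal sketch of the general idea of dynamically weighted fusion, not the authors' FSC architecture. It assumes a simple per-channel sigmoid gate over two equal-length feature vectors; the function name gated_fusion and the parameters w_cnn, w_trans and bias are hypothetical and stand in for weights that a real network would learn.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(cnn_feat, trans_feat, w_cnn, w_trans, bias):
    """Blend two feature vectors with a per-channel gate (illustrative only).

    For each channel, a gate g in (0, 1) is computed from both inputs, and
    the fused value is g * cnn + (1 - g) * transformer. The weights are
    hypothetical placeholders for learned parameters.
    """
    lengths = {len(cnn_feat), len(trans_feat), len(w_cnn), len(w_trans), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused, gates = [], []
    for c, t, wc, wt, b in zip(cnn_feat, trans_feat, w_cnn, w_trans, bias):
        g = sigmoid(wc * c + wt * t + b)      # gate depends on both branches
        gates.append(g)
        fused.append(g * c + (1.0 - g) * t)   # convex combination per channel
    return fused, gates


if __name__ == "__main__":
    local_feat = [1.0, 2.0]    # stand-in for CNN (local) features
    global_feat = [3.0, 4.0]   # stand-in for transformer (global) features
    zeros = [0.0, 0.0]
    fused, gates = gated_fusion(local_feat, global_feat, zeros, zeros, zeros)
    print(fused, gates)
```

With all gate parameters at zero, every gate equals 0.5 and the fused vector is the plain average of the two branches, which corresponds to a fixed-weight fusion. Non-zero parameters let the gate shift toward one branch or the other depending on the input, which is the kind of dynamic weighting the summary contrasts with traditional fusion.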
ISSN: 1939-1404, 2151-1535