Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion
Natural scene text detection is a fundamental research work in the field of image processing and has a wide range of applications. Currently, natural scene text detection usually adopts single-scale convolution and multi-scale feature fusion to capture the semantic features of scene text. Howeve...
Main Authors: | LIAN Zhe, YIN Yanjun, ZHI Min, XU Qiaozhi |
---|---|
Format: | Article |
Language: | Chinese |
Published: | Harbin University of Science and Technology Publications, 2024-08-01 |
Series: | Journal of Harbin University of Science and Technology |
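Subjects: | text detection, multi-scale feature extraction, bidirectional feature fusion, coordinate attention, differentiable binarization |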
Online Access: | https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2346 |
_version_ | 1839636364125011968 |
---|---|
author | LIAN Zhe YIN Yanjun ZHI Min XU Qiaozhi |
author_facet | LIAN Zhe YIN Yanjun ZHI Min XU Qiaozhi |
author_sort | LIAN Zhe |
collection | DOAJ |
description |
Natural scene text detection is a fundamental research topic in image processing with a wide range of applications. Current methods usually rely on single-scale convolution and multi-scale feature fusion to capture the semantic features of scene text. However, single-scale convolution struggles to represent text targets of different shapes and scales. Meanwhile, simple multi-scale feature fusion based on upsampling focuses only on making feature scales consistent and ignores the relative importance of features at different scales. To address these problems, a scene text detection algorithm based on multi-scale feature extraction and bidirectional feature fusion is proposed. The algorithm builds a multi-scale feature extraction module from convolutional kernels of different sizes, which handles text targets of different scales and shapes while capturing contextual dependencies at different distances. In the feature fusion stage, a bidirectional feature fusion module is built by adding bottom-up fusion paths so that information at different scales can interact. Coordinate attention is introduced after feature fusion to enhance high-level detail information and compensate for the detail lost during fusion. Extensive experiments on the ICDAR2015, MSRA-TD500, and CTW1500 datasets yield F1 values of 87.8%, 87.1%, and 83.2%, with detection speeds of 17.2 frames/s, 31.1 frames/s, and 22.3 frames/s, respectively, showing good robustness compared with other advanced detection methods. |
format | Article |
id | doaj-art-a8d851cf41f84c55b2268ec7a9f1148e |
institution | Matheson Library |
issn | 1007-2683 |
language | zho |
publishDate | 2024-08-01 |
publisher | Harbin University of Science and Technology Publications |
record_format | Article |
series | Journal of Harbin University of Science and Technology |
spelling | doaj-art-a8d851cf41f84c55b2268ec7a9f1148e 2025-07-08T01:39:39Z zho Harbin University of Science and Technology Publications Journal of Harbin University of Science and Technology 1007-2683 2024-08-01 29 04 29 39 10.15938/j.jhust.2024.04.004 Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion LIAN Zhe 0 YIN Yanjun 1 ZHI Min 2 XU Qiaozhi 3 College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China Natural scene text detection is a fundamental research topic in image processing with a wide range of applications. Current methods usually rely on single-scale convolution and multi-scale feature fusion to capture the semantic features of scene text. However, single-scale convolution struggles to represent text targets of different shapes and scales. Meanwhile, simple multi-scale feature fusion based on upsampling focuses only on making feature scales consistent and ignores the relative importance of features at different scales. To address these problems, a scene text detection algorithm based on multi-scale feature extraction and bidirectional feature fusion is proposed. The algorithm builds a multi-scale feature extraction module from convolutional kernels of different sizes, which handles text targets of different scales and shapes while capturing contextual dependencies at different distances. In the feature fusion stage, a bidirectional feature fusion module is built by adding bottom-up fusion paths so that information at different scales can interact. Coordinate attention is introduced after feature fusion to enhance high-level detail information and compensate for the detail lost during fusion. Extensive experiments on the ICDAR2015, MSRA-TD500, and CTW1500 datasets yield F1 values of 87.8%, 87.1%, and 83.2%, with detection speeds of 17.2 frames/s, 31.1 frames/s, and 22.3 frames/s, respectively, showing good robustness compared with other advanced detection methods. https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2346 text detection multi-scale feature extraction bidirectional feature fusion coordinate attention differentiable binarization |
spellingShingle | LIAN Zhe YIN Yanjun ZHI Min XU Qiaozhi Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion Journal of Harbin University of Science and Technology text detection multi-scale feature extraction bidirectional feature fusion coordinate attention differentiable binarization |
title | Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion |
title_full | Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion |
title_fullStr | Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion |
title_full_unstemmed | Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion |
title_short | Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion |
title_sort | scene text detection based on multi scale feature extraction and bidirectional feature fusion |
topic | text detection multi-scale feature extraction bidirectional feature fusion coordinate attention differentiable binarization |
url | https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2346 |
work_keys_str_mv | AT lianzhe scenetextdetectionbasedonmultiscalefeatureextractionandbidirectionalfeaturefusion AT yinyanjun scenetextdetectionbasedonmultiscalefeatureextractionandbidirectionalfeaturefusion AT zhimin scenetextdetectionbasedonmultiscalefeatureextractionandbidirectionalfeaturefusion AT xuqiaozhi scenetextdetectionbasedonmultiscalefeatureextractionandbidirectionalfeaturefusion |
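
The description field says the bidirectional feature fusion module adds bottom-up fusion paths to the usual upsampling-based fusion. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' module. The function names (`upsample2x`, `downsample2x`, `bidirectional_fuse`), nearest-neighbour upsampling, 2x2 average pooling, and element-wise addition as the fusion operator are all assumptions made for illustration; feature maps are single-channel nested lists so the example needs only the standard library.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (hypothetical stand-in)


def upsample2x(g: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    out: Grid = []
    for row in g:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def downsample2x(g: Grid) -> Grid:
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    return [
        [(g[r][c] + g[r][c + 1] + g[r + 1][c] + g[r + 1][c + 1]) / 4.0
         for c in range(0, len(g[0]), 2)]
        for r in range(0, len(g), 2)
    ]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps of identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_fuse(pyramid: List[Grid]) -> List[Grid]:
    """Fuse a pyramid whose level 0 is finest and each later level is half the size."""
    # Top-down pass: push coarse, semantically strong features into finer levels.
    top_down = list(pyramid)
    for i in range(len(top_down) - 2, -1, -1):
        top_down[i] = add(top_down[i], upsample2x(top_down[i + 1]))
    # Bottom-up pass (the added path): push fine detail back into coarser levels.
    fused = list(top_down)
    for i in range(1, len(fused)):
        fused[i] = add(fused[i], downsample2x(fused[i - 1]))
    return fused


if __name__ == "__main__":
    # Three levels of all-ones maps: 4x4, 2x2, 1x1.
    levels = [[[1.0] * n for _ in range(n)] for n in (4, 2, 1)]
    for level in bidirectional_fuse(levels):
        print(level)
```

With three all-ones levels, the finest map receives contributions only from the top-down pass, while the coarser maps also accumulate the pooled fine-level values from the bottom-up pass. A real detector would use learned convolutions and multi-channel tensors in place of these plain sums.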