Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion

Bibliographic Details
Main Authors: LIAN Zhe, YIN Yanjun, ZHI Min, XU Qiaozhi
Format: Article
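Language: Chinese
Published: Harbin University of Science and Technology Publications 2024-08-01
Series: Journal of Harbin University of Science and Technology
Subjects: text detection; multi-scale feature extraction; bidirectional feature fusion; coordinate attention; differentiable binarization
Online Access: https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2346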
author LIAN Zhe
YIN Yanjun
ZHI Min
XU Qiaozhi
collection DOAJ
description Natural scene text detection is a fundamental task in image processing with a wide range of applications. Current methods usually combine single-scale convolution with multi-scale feature fusion to capture the semantic features of scene text. However, single-scale convolution struggles to represent text targets of different shapes and scales. Meanwhile, simple upsampling-based multi-scale feature fusion only aligns feature-map sizes and ignores the differing importance of features at each scale. To address these problems, a scene text detection algorithm based on multi-scale feature extraction and bidirectional feature fusion is proposed. The algorithm builds a multi-scale feature extraction module from convolution kernels of different sizes, so that it extracts features for text targets of different scales and shapes while capturing contextual dependencies at different distances. For feature fusion, a bidirectional feature fusion module adds bottom-up fusion paths so that information is exchanged across scales. Coordinate attention is applied after feature fusion to enhance high-level detail information and compensate for the detail lost during fusion. Extensive experiments on the ICDAR2015, MSRA-TD500, and CTW1500 datasets yield F1 values of 87.8%, 87.1%, and 83.2%, with detection speeds of 17.2, 31.1, and 22.3 frames/s, respectively, showing good robustness compared with other advanced detection methods.
format Article
id doaj-art-a8d851cf41f84c55b2268ec7a9f1148e
institution Matheson Library
issn 1007-2683
language zho
publishDate 2024-08-01
publisher Harbin University of Science and Technology Publications
record_format Article
series Journal of Harbin University of Science and Technology
spelling doaj-art-a8d851cf41f84c55b2268ec7a9f1148e
record updated 2025-07-08T01:39:39Z
volume 29, issue 04, pages 29-39
doi 10.15938/j.jhust.2024.04.004
author affiliation (all four authors) College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
title Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion
topic text detection
multi-scale feature extraction
bidirectional feature fusion
coordinate attention
differentiable binarization
url https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2346
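
The description above says that the bidirectional feature fusion module adds bottom-up fusion paths to the usual upsampling-based (top-down) fusion. The following is a minimal sketch of that idea on a toy one-dimensional feature pyramid, not the authors' implementation. The record does not state the fusion operators, so element-wise addition, nearest-neighbour upsampling, and max-pool downsampling are all assumptions, and the function names are hypothetical.

```python
from typing import List

Feature = List[float]  # a toy 1-D "feature map"


def upsample2(x: Feature) -> Feature:
    """Nearest-neighbour upsampling by a factor of 2 (assumed operator)."""
    return [v for v in x for _ in range(2)]


def downsample2(x: Feature) -> Feature:
    """Max pooling over adjacent pairs, halving the length (assumed operator)."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise addition, used here as the (assumed) fusion operator."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same length")
    return [p + q for p, q in zip(a, b)]


def bidirectional_fuse(pyramid: List[Feature]) -> List[Feature]:
    """Fuse a pyramid whose level 0 is the finest; each level halves in length."""
    n = len(pyramid)

    # Top-down path: carry coarse, high-level features to finer levels.
    top_down: List[Feature] = [[] for _ in range(n)]
    top_down[-1] = pyramid[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = add(pyramid[i], upsample2(top_down[i + 1]))

    # Bottom-up path: carry fine detail back to the coarser levels.
    fused: List[Feature] = [[] for _ in range(n)]
    fused[0] = top_down[0]
    for i in range(1, n):
        fused[i] = add(top_down[i], downsample2(fused[i - 1]))
    return fused


if __name__ == "__main__":
    levels = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0], [100.0]]
    for level in bidirectional_fuse(levels):
        print(level)
```

With only the top-down path, the coarsest level would leave the fusion stage unchanged. The second loop is what lets every level receive information from both finer and coarser scales.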