Leveraging text semantics for enhanced scene text image super-resolution


Bibliographic Details
Main Authors: Li Chen, Jinsong Wu, Yicheng Liu
Format: Article
Language: English
Published: Tsinghua University Press 2025-06-01
Series: Intelligent and Converged Networks
Subjects:
Online Access: https://www.sciopen.com/article/10.23919/ICN.2025.0009
Description
Summary: In recent years, driven by advances in neural networks, super-resolution technology has made unprecedented progress. However, most existing super-resolution methods treat scene text images as ordinary images, ignoring the text information within them. This paper proposes to incorporate text-specific categorical priors into the training of a scene text image super-resolution model, in a framework called two-stage text prior super-resolution (TTPSR). The TTPSR framework first employs a parallel context attention network to restore the low-resolution image without text priors. The restored image is then passed to a text recognizer to obtain the text prior, and an attention mechanism fuses the text prior with the image features to guide the generation of the final high-resolution text image. Experiments show that the TTPSR model outperforms relevant state-of-the-art models in peak signal-to-noise ratio, structural similarity index, and text recognition accuracy on the TextZoom dataset.
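The fusion step the abstract describes, attention combining a recognized text prior with image features, can be illustrated with a minimal sketch. This is not the paper's TTPSR code; the cross-attention formulation, function names, and feature shapes here are illustrative assumptions, with image positions as queries and text-prior embeddings as keys/values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_text_prior(image_feats, text_prior):
    """Cross-attention fusion sketch (hypothetical, not the paper's exact layer).

    image_feats: (N, d) array -- N spatial positions, d channels
    text_prior:  (T, d) array -- T text-prior token embeddings
    Returns fused features of shape (N, d).
    """
    d = image_feats.shape[-1]
    # Each image position attends over the text-prior tokens.
    attn = softmax(image_feats @ text_prior.T / np.sqrt(d), axis=-1)  # (N, T)
    context = attn @ text_prior                                       # (N, d)
    # Residual fusion: text-guided context added onto image features.
    return image_feats + context

img = np.random.randn(16, 8)  # 16 positions, 8-dim features (toy sizes)
txt = np.random.randn(5, 8)   # 5 prior tokens
out = fuse_text_prior(img, txt)
print(out.shape)  # (16, 8)
```

The output keeps the image-feature shape, so such a fused map could feed directly into subsequent upsampling layers of a super-resolution network.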
ISSN:2708-6240