Enhancing Real-Time Aerial Image Object Detection with High-Frequency Feature Learning and Context-Aware Fusion

Aerial image object detection faces significant challenges due to notable scale variations, numerous small objects, complex backgrounds, illumination variability, motion blur, and densely overlapping objects, placing stringent demands on both accuracy and real-time performance. Although Transformer-...

Cur síos iomlán

Sábháilte in:
Sonraí bibleagrafaíochta
Príomhchruthaitheoirí: Xin Ge, Liping Qi, Qingsen Yan, Jinqiu Sun, Yu Zhu, Yanning Zhang
Formáid: Alt
Teanga:Béarla
Foilsithe / Cruthaithe: MDPI AG 2025-06-01
Sraith:Remote Sensing
Ábhair:
Rochtain ar líne:https://www.mdpi.com/2072-4292/17/12/1994
Clibeanna: Cuir clib leis
Níl clibeanna ann, Bí ar an gcéad duine le clib a chur leis an taifead seo!
Cur síos
Achoimre:Aerial image object detection faces significant challenges due to notable scale variations, numerous small objects, complex backgrounds, illumination variability, motion blur, and densely overlapping objects, placing stringent demands on both accuracy and real-time performance. Although Transformer-based real-time detection methods have achieved remarkable performance by effectively modeling global context, they typically emphasize non-local feature interactions while insufficiently utilizing high-frequency local details, which are crucial for detecting small objects in aerial images. To address these limitations, we propose a novel VMC-DETR framework designed to enhance the extraction and utilization of high-frequency texture features in aerial images. Specifically, our approach integrates three innovative modules: (1) the VHeat C2f module, which employs a frequency-domain heat conduction mechanism to fine-tune feature representations and significantly enhance high-frequency detail extraction; (2) the Multi-scale Feature Aggregation and Distribution Module (MFADM), which utilizes large convolution kernels of different sizes to robustly capture effective high-frequency features; and (3) the Context Attention Guided Fusion Module (CAGFM), which ensures precise and effective fusion of high-frequency contextual information across scales, substantially improving the detection accuracy of small objects. Extensive experiments and ablation studies on three public aerial image datasets validate that our proposed VMC-DETR framework effectively balances accuracy and computational efficiency, consistently outperforming state-of-the-art methods.
ISSN:2072-4292