Interactive Content Retrieval in Egocentric Videos Based on Vague Semantic Queries

Retrieving specific, often instantaneous, content from hours-long egocentric video footage based on hazily remembered details is challenging. Vision–language models (VLMs) have been employed to enable zero-shot text-based content retrieval from videos. However, they fall short if the textual query co...

Bibliographic Details
Main Authors: Linda Ablaoui, Wilson Estecio Marcilio-Jr, Lai Xing Ng, Christophe Jouffrais, Christophe Hurter
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Multimodal Technologies and Interaction
Online Access: https://www.mdpi.com/2414-4088/9/7/66