Interactive Content Retrieval in Egocentric Videos Based on Vague Semantic Queries
Retrieving specific, often instantaneous, content from hours-long egocentric video footage based on hazily remembered details is challenging. Vision–language models (VLMs) have been employed to enable zero-shot text-based content retrieval from videos. However, they fall short if the textual query co...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-06-01 |
Series: | Multimodal Technologies and Interaction |
Subjects: | |
Online Access: | https://www.mdpi.com/2414-4088/9/7/66 |