LLM-as-a-Judge: automated evaluation of search query parsing using large language models

Bibliographic Details
Main Authors: Mehmet Selman Baysan, Serkan Uysal, İrem İşlek, Çağla Çığ Karaman, Tunga Güngör
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-07-01
Series: Frontiers in Big Data
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fdata.2025.1611389/full
Description
Summary:
Introduction: The adoption of Large Language Models (LLMs) in search systems necessitates new evaluation methodologies beyond traditional rule-based or manual approaches.
Methods: We propose a general framework for evaluating structured outputs using LLMs, focusing on search query parsing within an online classified platform. Our approach leverages LLMs' contextual reasoning capabilities through three evaluation methodologies: Pointwise, Pairwise, and Pass/Fail assessments. Additionally, we introduce a Contextual Evaluation Prompt Routing strategy to improve reliability and reduce hallucinations.
Results: Experiments conducted on both small- and large-scale datasets demonstrate that LLM-based evaluation achieves approximately 90% agreement with human judgments.
Discussion: These results validate LLM-driven evaluation as a scalable, interpretable, and effective alternative to traditional evaluation methods, providing robust query parsing for real-world search systems.
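To make the three judging modes named in the abstract concrete, the sketch below shows how Pointwise, Pairwise, and Pass/Fail assessments of a parsed search query could be posed to an LLM judge. It is a minimal illustration assuming a generic text-in/text-out client (the `Judge` callable); the prompts, score scale, example query, and function names are assumptions for exposition, not the authors' implementation, and the Contextual Evaluation Prompt Routing strategy is not shown.

```python
# Illustrative sketch of LLM-as-a-judge evaluation modes for query parsing.
# The prompts, 1-5 scale, and the generic `llm` callable are assumptions,
# not the authors' code or prompt templates.
from typing import Callable

# Any function that sends a prompt to an LLM and returns its text reply.
Judge = Callable[[str], str]


def pointwise(llm: Judge, query: str, parsed: dict) -> str:
    """Score a single parsed query on an absolute scale (assumed 1-5)."""
    prompt = (
        "Rate how well the structured parse captures the search query.\n"
        f"Query: {query}\nParse: {parsed}\n"
        "Answer with a single integer from 1 (wrong) to 5 (perfect)."
    )
    return llm(prompt)


def pairwise(llm: Judge, query: str, parse_a: dict, parse_b: dict) -> str:
    """Ask the judge which of two candidate parses is better."""
    prompt = (
        "Compare two structured parses of the same search query.\n"
        f"Query: {query}\nParse A: {parse_a}\nParse B: {parse_b}\n"
        "Answer 'A', 'B', or 'tie'."
    )
    return llm(prompt)


def pass_fail(llm: Judge, query: str, parsed: dict) -> str:
    """Binary judgment: is the parse an acceptable interpretation of the query?"""
    prompt = (
        "Decide whether the structured parse is an acceptable interpretation "
        f"of the search query.\nQuery: {query}\nParse: {parsed}\n"
        "Answer 'pass' or 'fail'."
    )
    return llm(prompt)


if __name__ == "__main__":
    # Stand-in judge for demonstration; replace with a real LLM client.
    fake_llm: Judge = lambda prompt: "pass"
    parse = {"category": "suv", "fuel": "diesel", "year": 2015, "city": "istanbul"}
    print(pass_fail(fake_llm, "2015 diesel suv istanbul", parse))
```

In such a setup, the judge's textual verdicts would be aggregated over a labeled dataset and compared against human annotations, which is how agreement figures like the roughly 90% reported in the abstract are typically computed.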
ISSN: 2624-909X