RVBench: Role values benchmark for role-playing LLMs


Bibliographic Details
Main Authors: Ye Wang, Tong Li, Meixuan Li, Ziyue Cheng, Ge Wang, Hanyue Kang, Yaling Deng, Hongjiang Xiao, Yuan Zhang
Format: Article
Language: English
Published: Elsevier 2025-08-01
Series: Computers in Human Behavior: Artificial Humans
Online Access: http://www.sciencedirect.com/science/article/pii/S2949882125000684
Description
Summary: With the rapid development of Large Language Models (LLMs), demand for role-playing agents has grown sharply, driving applications such as personalized digital companions and artificial-society simulation. In LLM-driven role-playing, an agent's values lay the foundation for its attitudes and behaviors, so value alignment is crucial for enhancing the realism of interactions and enriching the user experience. However, no benchmark exists for evaluating values in role-playing LLMs. In this study, we built a Role Values Dataset (RVD) containing 25 roles as the ground truth. Additionally, inspired by psychological tests in humans, we proposed a Role Values Benchmark (RVBench), comprising values-rating and values-ranking methods that evaluate the values of role-playing LLMs through subjective questionnaires and observed behavior. The values-rating method tests value orientation with the revised Portrait Values Questionnaire (PVQ-RR), providing a direct, quantitative comparison against the roles to be played. The values-ranking method assesses whether agents' behaviors are consistent with the hierarchical organization of their values when they encounter dilemmatic scenarios. Subsequent testing on a selection of both open-source and closed-source LLMs revealed that GLM-4 exhibited values most closely mirroring the roles in the RVD. Compared with the preset roles, however, LLMs still fall short in role-playing ability, including consistency, stability, and flexibility across value dimensions. These findings highlight the need for further research to refine the role-playing capacities of LLMs from a value-alignment perspective. The RVD is available at: https://github.com/northwang/RVD.
ISSN: 2949-8821