Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits

Bibliographic Details
Main Author:	Jong-Min Kim
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Mathematics
Subjects:	contextual multi-armed bandits; Gaussian process; copula
Online Access:	https://www.mdpi.com/2227-7390/13/13/2058

collection	DOAJ
description	We propose a novel contextual multi-armed bandit (CMAB) framework that integrates copula-based context generation with Gaussian Process (GP) regression for reward modeling, addressing complex dependency structures and uncertainty in sequential decision-making. Context vectors are generated using Gaussian and vine copulas to capture nonlinear dependencies, while arm-specific reward functions are modeled via GP regression with Beta-distributed targets. We evaluate three widely used bandit policies, Thompson Sampling (TS), ε-Greedy, and Upper Confidence Bound (UCB), on simulated environments informed by real-world datasets, including Boston Housing and Wine Quality. The Boston Housing dataset exemplifies heterogeneous decision boundaries relevant to housing-related marketing, while the Wine Quality dataset differentiates arms by sensory features. Our empirical results indicate that the ε-Greedy policy consistently achieves the highest cumulative reward and lowest regret across multiple runs, outperforming both GP-based TS and UCB in high-dimensional, copula-structured contexts. These findings suggest that combining copula theory with GP modeling provides a robust and flexible foundation for data-driven sequential experimentation in domains characterized by complex contextual dependencies.
id	doaj-art-b6977c1f47e34e358b2df96c644e4d12
institution	Matheson Library
issn	2227-7390
doi	10.3390/math13132058
citation	Mathematics 2025, 13(13), 2058
affiliation	Statistics Discipline, Division of Science and Mathematics, University of Minnesota, Morris, MN 56267, USA
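
The abstract describes the simulation pipeline only at a high level. The Python sketch below shows one way the pieces it names could fit together: copula-generated contexts, a GP reward model per arm, Beta-distributed rewards, and the three policies. It is not the author's implementation. Assumptions: a Gaussian copula only (no vine copula), Uniform(0, 1) context margins, a hypothetical logistic link and equicorrelation matrix generating the Beta rewards, and a hand-written RBF-kernel GP per arm with illustrative, unfitted hyperparameters. The function names (`gaussian_copula_contexts`, `gp_posterior`, `run`, and so on) and all parameter values are hypothetical, and neither the Boston Housing nor the Wine Quality dataset is used.

```python
import math

import numpy as np


def gaussian_copula_contexts(n, corr, rng):
    """Draw n context vectors with Uniform(0, 1) margins and Gaussian-copula dependence."""
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n)
    # Probability integral transform: the standard normal CDF maps each margin to U(0, 1)
    # while preserving the dependence structure (the copula).
    phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    return phi(z)


def rbf(a, b, length=0.5):
    """Squared-exponential kernel matrix between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)


def gp_posterior(X, y, x, prior_mean=0.5, signal=0.1, noise=0.01):
    """Posterior mean and variance at point x of a GP with constant prior mean.

    Hyperparameters are illustrative assumptions, not fitted values.
    """
    if len(X) == 0:
        return prior_mean, signal
    X = np.asarray(X)
    K = signal * rbf(X, X) + noise * np.eye(len(X))
    k_star = signal * rbf(X, x[None, :])[:, 0]
    mean = prior_mean + k_star @ np.linalg.solve(K, np.asarray(y) - prior_mean)
    var = signal - k_star @ np.linalg.solve(K, k_star)
    return mean, max(var, 1e-12)


def true_means(weights, x):
    """Hypothetical arm-specific mean rewards in (0, 1) via a logistic link."""
    return 1.0 / (1.0 + np.exp(-(weights @ x - weights.sum(axis=1) / 2)))


def run(policy, T=200, n_arms=3, d=4, eps=0.1, beta=2.0, kappa=20.0, seed=0):
    """Simulate one bandit run; returns (cumulative reward, cumulative regret)."""
    rng = np.random.default_rng(seed)
    corr = 0.6 * np.ones((d, d)) + 0.4 * np.eye(d)  # assumed equicorrelation matrix
    weights = rng.normal(0.0, 2.0, size=(n_arms, d))  # assumed arm parameters
    data = [([], []) for _ in range(n_arms)]  # per-arm (contexts, rewards)
    total_reward = total_regret = 0.0
    for x in gaussian_copula_contexts(T, corr, rng):
        post = [gp_posterior(Xa, ya, x) for Xa, ya in data]
        means = np.array([m for m, _ in post])
        sds = np.sqrt([v for _, v in post])
        if policy == "eps_greedy":  # explore uniformly with probability eps
            arm = int(rng.integers(n_arms)) if rng.random() < eps else int(np.argmax(means))
        elif policy == "ucb":  # optimism: mean plus beta posterior standard deviations
            arm = int(np.argmax(means + beta * sds))
        elif policy == "ts":  # Thompson Sampling: one posterior draw per arm
            arm = int(np.argmax(rng.normal(means, sds)))
        else:
            raise ValueError(f"unknown policy: {policy}")
        mu = true_means(weights, x)
        # Beta-distributed reward with mean mu[arm] and assumed precision kappa.
        r = rng.beta(mu[arm] * kappa, (1.0 - mu[arm]) * kappa)
        data[arm][0].append(x)
        data[arm][1].append(r)
        total_reward += r
        total_regret += mu.max() - mu[arm]
    return total_reward, total_regret


if __name__ == "__main__":
    for policy in ("eps_greedy", "ucb", "ts"):
        reward, regret = run(policy)
        print(f"{policy:>10}: cumulative reward {reward:.1f}, regret {regret:.1f}")
```

Running the file prints one line per policy. Which policy comes out ahead in this toy setup depends on the seed, horizon, and hyperparameters, so its output should not be read as reproducing the ranking reported in the abstract.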

Similar Items