Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits
We propose a novel contextual multi-armed bandit (CMAB) framework that integrates copula-based context generation with Gaussian Process (GP) regression for reward modeling, addressing complex dependency structures and uncertainty in sequential decision-making. Context vectors are generated using Gau...
Main Author: | Jong-Min Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-06-01 |
Series: | Mathematics |
Subjects: | contextual multi-armed bandits; Gaussian process; copula |
Online Access: | https://www.mdpi.com/2227-7390/13/13/2058 |
_version_ | 1839631752175288320 |
---|---|
author | Jong-Min Kim |
author_facet | Jong-Min Kim |
author_sort | Jong-Min Kim |
collection | DOAJ |
description | We propose a novel contextual multi-armed bandit (CMAB) framework that integrates copula-based context generation with Gaussian Process (GP) regression for reward modeling, addressing complex dependency structures and uncertainty in sequential decision-making. Context vectors are generated using Gaussian and vine copulas to capture nonlinear dependencies, while arm-specific reward functions are modeled via GP regression with Beta-distributed targets. We evaluate three widely used bandit policies, Thompson Sampling (TS), ε-Greedy, and Upper Confidence Bound (UCB), on simulated environments informed by real-world datasets, including Boston Housing and Wine Quality. The Boston Housing dataset exemplifies heterogeneous decision boundaries relevant to housing-related marketing, while the Wine Quality dataset introduces sensory feature-based arm differentiation. Our empirical results indicate that the ε-Greedy policy consistently achieves the highest cumulative reward and lowest regret across multiple runs, outperforming both GP-based TS and UCB in high-dimensional, copula-structured contexts. These findings suggest that combining copula theory with GP modeling provides a robust and flexible foundation for data-driven sequential experimentation in domains characterized by complex contextual dependencies. |
format | Article |
id | doaj-art-b6977c1f47e34e358b2df96c644e4d12 |
institution | Matheson Library |
issn | 2227-7390 |
language | English |
publishDate | 2025-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj-art-b6977c1f47e34e358b2df96c644e4d12; 2025-07-11T14:40:22Z; eng; MDPI AG; Mathematics; 2227-7390; 2025-06-01; 13; 13; 2058; 10.3390/math13132058; Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits; Jong-Min Kim (Statistics Discipline, Division of Science and Mathematics, University of Minnesota, Morris, MN 56267, USA); We propose a novel contextual multi-armed bandit (CMAB) framework that integrates copula-based context generation with Gaussian Process (GP) regression for reward modeling, addressing complex dependency structures and uncertainty in sequential decision-making. Context vectors are generated using Gaussian and vine copulas to capture nonlinear dependencies, while arm-specific reward functions are modeled via GP regression with Beta-distributed targets. We evaluate three widely used bandit policies, Thompson Sampling (TS), ε-Greedy, and Upper Confidence Bound (UCB), on simulated environments informed by real-world datasets, including Boston Housing and Wine Quality. The Boston Housing dataset exemplifies heterogeneous decision boundaries relevant to housing-related marketing, while the Wine Quality dataset introduces sensory feature-based arm differentiation. Our empirical results indicate that the ε-Greedy policy consistently achieves the highest cumulative reward and lowest regret across multiple runs, outperforming both GP-based TS and UCB in high-dimensional, copula-structured contexts. These findings suggest that combining copula theory with GP modeling provides a robust and flexible foundation for data-driven sequential experimentation in domains characterized by complex contextual dependencies.; https://www.mdpi.com/2227-7390/13/13/2058; contextual multi-armed bandits; Gaussian process; copula |
spellingShingle | Jong-Min Kim; Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits; Mathematics; contextual multi-armed bandits; Gaussian process; copula |
title | Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits |
title_full | Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits |
title_fullStr | Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits |
title_full_unstemmed | Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits |
title_short | Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits |
title_sort | gaussian process with vine copula based context modeling for contextual multi armed bandits |
topic | contextual multi-armed bandits; Gaussian process; copula |
url | https://www.mdpi.com/2227-7390/13/13/2058 |
work_keys_str_mv | AT jongminkim gaussianprocesswithvinecopulabasedcontextmodelingforcontextualmultiarmedbandits |
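
The pipeline summarized in the description (copula-generated contexts, Beta-distributed rewards, an ε-Greedy policy) can be illustrated with a minimal sketch. It is not the article's implementation: it uses a two-dimensional Gaussian copula only (no vine copula), and it swaps the GP reward model for a per-bucket running mean so that it runs on the Python standard library alone. The function names, the reward means in `beta_reward`, the concentration value, and all default parameters are hypothetical choices made for illustration.

```python
import math
import random

N_ARMS = 3  # hypothetical number of arms for this sketch


def gaussian_copula_context(rho, rng):
    """Draw a 2-D context in [0, 1]^2 with Gaussian-copula dependence rho."""
    g1 = rng.gauss(0.0, 1.0)
    g2 = rng.gauss(0.0, 1.0)
    # Correlate the two standard normals (2x2 Cholesky factor written out).
    z2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2

    def std_normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # The normal CDF maps each coordinate to a uniform marginal.
    return std_normal_cdf(g1), std_normal_cdf(z2)


def beta_reward(arm, context, rng):
    """Hypothetical environment: Beta reward whose mean depends on arm and context."""
    u1, u2 = context
    mean = (0.2 + 0.6 * u1, 0.2 + 0.6 * u2, 0.5)[arm]
    concentration = 10.0  # assumed value; larger means less noisy rewards
    return rng.betavariate(mean * concentration, (1.0 - mean) * concentration)


def run_epsilon_greedy(n_rounds=2000, epsilon=0.1, rho=0.7, seed=0):
    """Run epsilon-greedy with running-mean estimates per context bucket.

    The bucketed running mean is a crude stand-in for a GP reward model.
    """
    rng = random.Random(seed)
    estimates = {}  # bucket -> list of [count, mean] per arm
    pulls = [0] * N_ARMS
    total_reward = 0.0
    for _ in range(n_rounds):
        context = gaussian_copula_context(rho, rng)
        bucket = (context[0] >= 0.5, context[1] >= 0.5)
        stats = estimates.setdefault(bucket, [[0, 0.0] for _ in range(N_ARMS)])
        if rng.random() < epsilon:
            arm = rng.randrange(N_ARMS)  # explore uniformly at random
        else:
            arm = max(range(N_ARMS), key=lambda a: stats[a][1])  # exploit
        reward = beta_reward(arm, context, rng)
        stats[arm][0] += 1
        stats[arm][1] += (reward - stats[arm][1]) / stats[arm][0]
        pulls[arm] += 1
        total_reward += reward
    return total_reward, pulls


if __name__ == "__main__":
    total, pulls = run_epsilon_greedy()
    print("cumulative reward:", round(total, 2), "pulls per arm:", pulls)
```

Replacing the bucketed means with a GP regressor per arm, and the Gaussian copula with a fitted vine copula, would bring the sketch closer to the setup the description outlines, at the cost of third-party dependencies.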