Minimax rates and adaptivity in combining experimental and observational data
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter, 2025-07-01 |
Series: | Journal of Causal Inference |
Subjects: | |
Online Access: | https://doi.org/10.1515/jci-2024-0024 |
Summary: | Randomized controlled trials (RCTs) are the gold standard for evaluating the causal effect of a treatment; however, they often have limited sample sizes and sometimes poor generalizability. On the other hand, non-randomized, observational data derived from large administrative databases have massive sample sizes and better generalizability, but are prone to unmeasured confounding bias. It is thus of considerable interest to reconcile effect estimates obtained from RCTs and observational studies investigating the same intervention, potentially harvesting the best from both realms. In this article, we theoretically characterize the potential efficiency gain from integrating observational data into the RCT-based analysis from a minimax perspective. For estimation, we derive the minimax rate of convergence for the mean-squared error and propose adaptive estimators that attain the optimal rate up to poly-log factors. For inference, we characterize the minimax rate for the length of confidence intervals and show that adaptation (to unknown confounding bias) is in general impossible. A curious phenomenon thus emerges: for estimation, the efficiency gain from data integration can be achieved without prior knowledge of the magnitude of the confounding bias; for inference, the same task becomes information-theoretically impossible in general. We corroborate our theoretical findings using simulations and a real data example from the RCT DUPLICATE initiative. |
ISSN: | 2193-3685 |