Randomized experiments, or A/B tests, are the standard approach for evaluating the causal effects of new product features, i.e., treatments.
The validity of these tests rests on the “stable unit treatment value assumption” (SUTVA), which implies that the treatment only affects the behavior of treated users, and does not affect the behavior of their connections.
Violations of SUTVA, common in features that exhibit network effects, result in inaccurate estimates of the causal effect of treatment.
In this work, we propose a new experimental design for testing whether SUTVA holds, without making any assumptions about how treatment effects may spill over between the treatment and the control group. To achieve this, we simultaneously run both a completely randomized and a cluster-based randomized experiment, and then compare the resulting estimates. We present a statistical test for measuring the significance of the difference between them and offer theoretical bounds on the Type I error rate.
We provide practical guidelines for implementing our methodology on large-scale experimentation platforms. Importantly, the proposed methodology can be applied to settings in which a network is not necessarily observed but, if available, can be used in the analysis. Finally, we deploy this design to LinkedIn’s experimentation platform and apply it to two online experiments, highlighting the presence of network effects and bias in standard A/B testing approaches in a real-world setting.
This work is part of a two-paper series. In the first paper, we introduce the methodology and main theoretical results; in the second, we present implementation guidelines for using the methodology on large-scale experimentation platforms.
Illustration of the proposed experimental design for detecting network effects.
(A) Graph of all units and the connections between them; the dashed circles represent (equally-sized) clusters.
(B) Assigning clusters to treatment arms: completely randomized (CR) and cluster-based randomized assignment (CBR).
(C) Assigning units to treatment buckets (treatment and control) using the corresponding strategy.
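(D) Computing the treatment effect within each treatment arm: $\hat{\mu}_{cr}$ and $\hat{\mu}_{cbr}$, and variance: $\hat{\sigma}^2_{cr}$ and $\hat{\sigma}^2_{cbr}$.
(E) Computing the difference of the estimates from each treatment arm: $\Delta = \hat{\mu}_{cr} - \hat{\mu}_{cbr}$, and the total variance: $\hat{\sigma}^2 = \hat{\sigma}^2_{cr} + \hat{\sigma}^2_{cbr}$.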
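Steps (B) through (E) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation deployed at LinkedIn: the function names (`split_clusters`, `assign_units`, `diff_in_means`, `sutva_test`) and the simulated data are hypothetical, the variance estimator is a standard conservative difference-in-means estimator (applied to cluster means in the CBR arm, which relies on clusters being equally sized), and the rejection threshold is a generic Chebyshev-style bound chosen for illustration.

```python
import random
from statistics import mean, variance


def split_clusters(cluster_ids, rng):
    """Step B: randomly split the clusters between the CR arm and the CBR arm."""
    ids = list(cluster_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def assign_units(clusters, cr_ids, cbr_ids, rng):
    """Step C: map each unit to (arm, treated) using the arm's own strategy."""
    assignment = {}
    # CR arm: ignore cluster structure and treat half of the units at random.
    cr_units = [u for c in cr_ids for u in clusters[c]]
    rng.shuffle(cr_units)
    for i, u in enumerate(cr_units):
        assignment[u] = ("CR", i < len(cr_units) // 2)
    # CBR arm: treat half of the clusters; all units in a cluster share a bucket.
    cbr = list(cbr_ids)
    rng.shuffle(cbr)
    for i, c in enumerate(cbr):
        for u in clusters[c]:
            assignment[u] = ("CBR", i < len(cbr) // 2)
    return assignment


def diff_in_means(treated, control):
    """Step D: difference-in-means estimate and a conservative variance estimate."""
    estimate = mean(treated) - mean(control)
    var = variance(treated) / len(treated) + variance(control) / len(control)
    return estimate, var


def sutva_test(outcomes, clusters, assignment, alpha=0.05):
    """Step E: compare the two arms' estimates (assumed Chebyshev-style rule)."""
    cr_t = [outcomes[u] for u, (arm, z) in assignment.items() if arm == "CR" and z]
    cr_c = [outcomes[u] for u, (arm, z) in assignment.items() if arm == "CR" and not z]
    # In the CBR arm the cluster is the unit of randomization, so analyze cluster means.
    cbr_t, cbr_c = [], []
    for units in clusters.values():
        arm, z = assignment[units[0]]
        if arm == "CBR":
            (cbr_t if z else cbr_c).append(mean([outcomes[u] for u in units]))
    mu_cr, var_cr = diff_in_means(cr_t, cr_c)
    mu_cbr, var_cbr = diff_in_means(cbr_t, cbr_c)
    delta = mu_cr - mu_cbr
    total_var = var_cr + var_cbr  # the arms are disjoint, so the variances add
    # Chebyshev: P(|delta| >= k * sd) <= 1 / k^2, so k = 1 / sqrt(alpha).
    reject = abs(delta) / total_var ** 0.5 >= 1 / alpha ** 0.5
    return delta, total_var, reject


# Simulated data with NO spillover: 40 equally sized clusters of 10 units each.
rng = random.Random(0)
clusters = {c: [c * 10 + i for i in range(10)] for c in range(40)}
cr_ids, cbr_ids = split_clusters(clusters, rng)
assignment = assign_units(clusters, cr_ids, cbr_ids, rng)
outcomes = {u: 2.0 + 1.0 * z + rng.gauss(0, 1) for u, (_, z) in assignment.items()}
delta, total_var, reject = sutva_test(outcomes, clusters, assignment)
print(f"delta={delta:.3f}, total variance={total_var:.4f}, reject SUTVA={reject}")
```

Because the simulated outcomes depend only on a unit's own treatment, both arms estimate the same effect and the difference should be small relative to its standard deviation. Adding a term that depends on the treatment status of a unit's cluster neighbors would pull the two estimates apart.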
Short video describing the work. Recorded for KDD’17.