Understanding multi-armed bandit calculations

Sitecore Personalize applies the Thompson sampling heuristic technique when running the multi-armed bandit algorithm in an experiment with optimized testing.

Sitecore Personalize allocates at least 1% of traffic to each variant for the duration of the test. This 1% baseline keeps every variant in rotation, so the experiment continues to explore all variants and can respond dynamically to quick shifts in visitor behavior. The multi-armed bandit algorithm runs on the remaining percentage of traffic.
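
As a rough illustration of how the traffic floor and the algorithm's weights might combine, the sketch below reserves 1% of traffic per variant and distributes the remainder in proportion to the bandit weights. The function name `effective_traffic_split` and the way the two shares are added together are assumptions made for illustration. The article does not specify how Sitecore Personalize computes the final split.

```python
def effective_traffic_split(bandit_weights, floor=0.01):
    """Combine a per-variant traffic floor with bandit weights.

    bandit_weights: fractions that sum to 1, one per variant.
    floor: guaranteed share of total traffic per variant (1% here).
    Assumption: the floor is taken off the top and the rest is split by weight.
    """
    n = len(bandit_weights)
    remaining = 1.0 - n * floor  # traffic left for the bandit to distribute
    if remaining < 0:
        raise ValueError("floor is too large for this many variants")
    return [floor + remaining * w for w in bandit_weights]


# Three variants; hypothetical weights where the bandit favors the first one.
split = effective_traffic_split([0.90, 0.10, 0.00])
print([round(s, 4) for s in split])
```

Under these assumptions, a variant whose bandit weight has dropped to zero still receives the 1% baseline, which is what allows it to recover if visitor behavior changes.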

When an experiment with optimized testing starts to run, Sitecore Personalize applies even weights to each variant.

The following steps outline how the multi-armed bandit algorithm runs in Sitecore Personalize.

The multi-armed bandit algorithm:

1. Runs the test for the first period by showing each variant to a random percentage of traffic determined by the weights.
2. Collects and stores the number of times each variant was shown (impressions) and the number of conversions for each variant during the period.
3. Adds the conversions count (p) and impressions count (q) to running totals for conversions and impressions.
4. Calculates the posterior distribution for each variant according to the formula:

   Posterior = Beta(p, q+1), where Beta is the beta distribution
5. Runs a Monte Carlo calculation to determine the new variant weights. The calculation takes as input the number of Monte Carlo steps (N_MC) and the latest posterior distribution for each variant (as calculated in the previous step). N_MC is a fixed parameter across all tests and is determined through research. The Monte Carlo calculation:
   a. Initializes the win count for each variant to 0.
   b. Pulls a random sample from the posterior distribution of each variant, giving N random numbers, where N is the number of variants.
   c. Determines which of the sampled numbers is highest and increments the win count for that variant by 1.
   d. Repeats steps b and c N_MC times.
   e. Calculates the new weights as VAR_N_WEIGHT = VAR_N_WIN_COUNT / N_MC, expressed as a percentage.
6. Applies the updated weights from the Monte Carlo calculation and repeats steps 2-5.
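
The Monte Carlo calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sitecore's implementation: the function name `monte_carlo_weights`, the `n_mc` default, and the sample totals are all hypothetical, and the article does not disclose the actual value of N_MC. The sketch uses the posterior exactly as given in the article, Beta(p, q+1). Python's `random.betavariate` requires both parameters to be positive, so the example assumes every variant has at least one conversion.

```python
import random


def monte_carlo_weights(totals, n_mc=10_000, seed=None):
    """Estimate variant weights by sampling from Beta posteriors.

    totals: list of (conversions, impressions) running totals, one per variant.
    n_mc:   number of Monte Carlo steps (placeholder value, not Sitecore's).
    """
    rng = random.Random(seed)
    win_counts = [0] * len(totals)  # initialize the win counts to 0
    for _ in range(n_mc):           # repeat the draw-and-compare round n_mc times
        # Draw one random sample per variant from Beta(p, q + 1).
        samples = [rng.betavariate(p, q + 1) for p, q in totals]
        # The variant with the highest sampled number wins this round.
        winner = max(range(len(samples)), key=samples.__getitem__)
        win_counts[winner] += 1
    # New weight = share of rounds won (multiply by 100 for a percentage).
    return [count / n_mc for count in win_counts]


# Hypothetical running totals: (conversions, impressions) per variant.
totals = [(30, 1000), (45, 1000), (28, 1000)]
weights = monte_carlo_weights(totals, n_mc=5_000, seed=42)
print(weights)
```

Because each round has exactly one winner, the weights always sum to 1. A variant with a clearly higher conversion rate wins most rounds and so receives most of the traffic, while variants with overlapping posteriors continue to share it.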
