Ties not broken randomly #18

robinvanemden · 2019-03-03T12:43:22Z

We are currently working on an R package ("contextual") that aims to facilitate the implementation and simulation of both context-free and contextual Multi-Armed Bandit policies in R.

As "Bandit Algorithms for Website Optimization" offers a comprehensive entry-level introduction to the evaluation of context-free bandit policies, we decided to replicate the book's simulations.

In doing so, we found that the book's source code in this repository deterministically chooses the first arm (in other words, the arm with the lowest index) whenever several arms are tied for the maximum value:

def ind_max(x):
  m = max(x)
  return x.index(m)

As can be seen in our replication vignette, this introduces a bias that accumulates over time and changes the simulations' results and plots. To illustrate: on the left is our replication of Figure 4-2 without random tie-breaking, and on the right is the same replication with ties broken randomly.

A patch along the following lines would resolve this issue by breaking ties randomly:

import random

def ind_max(x):
  # Collect every index that holds the maximum, then pick one uniformly at random.
  max_value = max(x)
  max_keys = [k for k, v in enumerate(x) if v == max_value]
  return random.choice(max_keys)
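
To make the difference concrete, here is a minimal sketch (not part of the repository) that calls both variants repeatedly on a list of tied values and tallies which index comes back. The names `ind_max_first`, `ind_max_random` and `count_picks`, the five-arm all-zero input, and the seed are our own illustrative assumptions, not identifiers or settings from the book's code.

```python
import random
from collections import Counter


def ind_max_first(x):
    # Behaviour quoted above: list.index returns the first (lowest) matching index.
    return x.index(max(x))


def ind_max_random(x):
    # Proposed behaviour: choose uniformly among all indices holding the maximum.
    max_value = max(x)
    max_keys = [k for k, v in enumerate(x) if v == max_value]
    return random.choice(max_keys)


def count_picks(ind_max, values, n_draws=10_000):
    # Hypothetical helper: tally which index is returned over repeated calls.
    return Counter(ind_max(values) for _ in range(n_draws))


random.seed(0)  # arbitrary seed, only to make the sketch reproducible
tied = [0.0, 0.0, 0.0, 0.0, 0.0]  # assumed scenario: five arms with identical values

first_counts = count_picks(ind_max_first, tied)
random_counts = count_picks(ind_max_random, tied)
print(first_counts)   # every pick lands on index 0
print(random_counts)  # picks are spread across all five indices
```

With the deterministic variant every draw selects arm 0, whereas the randomized variant spreads its picks across all tied arms, which is the behaviour the patch is meant to restore.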

(I presume that the now-closed but unresolved #10 also alluded to this particular issue.)

robinvanemden mentioned this issue Mar 3, 2019

for python: prevent deterministic start for each simulation (usually one... #10

Closed
