We are currently working on an R package ("contextual") that aims to facilitate the implementation and simulation of both context-free and contextual Multi-Armed Bandit policies in R.
As "Bandit Algorithms for Website Optimization" offers a comprehensive entry-level introduction to context-free bandit policy evaluation, we decided to replicate the book's simulations.
In doing so, we found that the book's source code in this repository deterministically chooses the first arm (in other words, the arm with the lowest index) whenever the estimated values of two or more arms are tied:
```python
def ind_max(x):
    m = max(x)
    return x.index(m)
```
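To make the mechanism concrete, here is a minimal sketch (the `values` lists are hypothetical inputs, not taken from the book's simulations). Because `list.index` returns the position of the first matching element, every tie resolves to the lowest index, for example at the start of a run when all value estimates are still 0.0:

```python
def ind_max(x):
    # Same logic as the repository's version: list.index() returns the
    # position of the *first* element equal to the maximum.
    m = max(x)
    return x.index(m)

# Hypothetical input: all estimates tied, as at the start of a simulation.
values = [0.0, 0.0, 0.0, 0.0, 0.0]
choices = {ind_max(values) for _ in range(1000)}
print(choices)  # {0}: arm 0 wins every tie, every time

# A partial tie between arms 1 and 2 always goes to arm 1.
print(ind_max([0.2, 0.5, 0.5]))  # 1
```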
As can be seen in our replication vignette, this introduces a bias that accumulates over time and changes the simulation results and plots. To illustrate, we show our replication of Figure 4-2 without random tie-breaking (left) and with random tie-breaking (right).
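The following sketch illustrates how such a bias can accumulate. It is a simplified epsilon-greedy loop modelled loosely on the book's epsilon-greedy algorithm (explore with probability `epsilon`, otherwise exploit via the tie-breaking function, and update each arm's value as a running mean). The helper names `ind_max_first`, `ind_max_random` and `share_of_first_arm`, the five identical Bernoulli arms, and all parameter values are hypothetical choices for illustration, not the book's Figure 4-2 setup. Because the arms are identical, an unbiased policy should give arm 0 about 20% of all pulls.

```python
import random

def ind_max_first(values, rng):
    # Original behaviour: the lowest index wins every tie (rng is unused).
    return values.index(max(values))

def ind_max_random(values, rng):
    # Patched behaviour: pick uniformly among all indices holding the maximum.
    m = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == m])

def share_of_first_arm(tie_break, means, epsilon=0.1, horizon=50,
                       n_sims=500, seed=1):
    """Fraction of all pulls, across all simulations, that go to arm 0."""
    rng = random.Random(seed)  # seeded for reproducibility
    n_arms = len(means)
    pulls_arm0 = 0
    for _ in range(n_sims):
        counts = [0] * n_arms
        values = [0.0] * n_arms
        for _ in range(horizon):
            if rng.random() > epsilon:
                arm = tie_break(values, rng)   # exploit
            else:
                arm = rng.randrange(n_arms)    # explore
            # Bernoulli reward with success probability means[arm].
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            # Running-mean update of the arm's estimated value.
            values[arm] += (reward - values[arm]) / counts[arm]
            if arm == 0:
                pulls_arm0 += 1
    return pulls_arm0 / (n_sims * horizon)

means = [0.1] * 5  # hypothetical: five identical arms
first_share = share_of_first_arm(ind_max_first, means)
random_share = share_of_first_arm(ind_max_random, means)
print(f"deterministic ties: arm 0 share = {first_share:.2f}")
print(f"random ties:        arm 0 share = {random_share:.2f}")
```

With deterministic tie-breaking, arm 0 should win every early tie at 0.0 and collect far more than its fair share of pulls. With random tie-breaking, its share should stay close to 20%.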
A patch along the following lines would resolve this issue by breaking ties randomly:
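```python
import random

def ind_max(x):
    # Collect every index that attains the maximum, then pick one uniformly.
    max_value = max(x)
    max_keys = [k for k, v in enumerate(x) if v == max_value]
    return random.choice(max_keys)
```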
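A quick sanity check of the patched function, as a sketch (the input lists and the seed are arbitrary). On a fully tied input each index should be returned roughly equally often, while a unique maximum is still returned deterministically:

```python
import random
from collections import Counter

def ind_max(x):
    # Patched version: choose uniformly at random among all indices of the maximum.
    max_value = max(x)
    max_keys = [k for k, v in enumerate(x) if v == max_value]
    return random.choice(max_keys)

random.seed(42)  # fixed seed only to make this check reproducible
counts = Counter(ind_max([0.0, 0.0, 0.0, 0.0, 0.0]) for _ in range(10_000))
print(sorted(counts.items()))  # each index should appear roughly 2,000 times

# A unique maximum is unaffected by the change.
print(ind_max([0.1, 0.9, 0.3]))  # 1
```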
(I presume that the now closed but unresolved #10 also alluded to this particular issue)