Thompson Sampling Multi-Armed Bandit

A NumPy implementation of Thompson Sampling for a K-armed stochastic bandit with normally distributed rewards.
The TS-Normal strategy from "Further Optimal Regret Bounds for Thompson Sampling" by Shipra Agrawal and Navin Goyal (https://arxiv.org/abs/1209.3353) is also implemented (part 3 below).
Required Python libraries: numpy, matplotlib, random, stats.
A Jupyter notebook (.ipynb) with the plotted results is included in the repo.
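
For orientation, here is a minimal sketch of Thompson Sampling for a K-armed Gaussian bandit, assuming a Normal prior on each arm's mean and a known reward variance. It is an illustration of the technique, not the repo's exact code; the function and parameter names are made up for this example.

```python
# A minimal sketch (not the repo's exact code): Thompson Sampling for a
# K-armed Gaussian bandit with known reward variance and a Normal prior
# on each arm's mean. Function and parameter names are illustrative.
import numpy as np

def thompson_sampling_gaussian(true_means, sigma2=1.0, horizon=100,
                               prior_mean=0.0, prior_var=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)
    post_mean = np.full(k, prior_mean)   # posterior mean of each arm's mean
    post_var = np.full(k, prior_var)     # posterior variance of each arm's mean
    best = true_means.max()
    regret = np.zeros(horizon)

    for t in range(horizon):
        # Sample one plausible mean per arm from its posterior and play the argmax.
        theta = rng.normal(post_mean, np.sqrt(post_var))
        arm = int(np.argmax(theta))
        reward = rng.normal(true_means[arm], np.sqrt(sigma2))

        # Conjugate Normal-Normal update of the chosen arm's posterior.
        precision = 1.0 / post_var[arm] + 1.0 / sigma2
        post_mean[arm] = (post_mean[arm] / post_var[arm] + reward / sigma2) / precision
        post_var[arm] = 1.0 / precision

        regret[t] = best - true_means[arm]  # instantaneous pseudo-regret
    return np.cumsum(regret)

# Example in the spirit of part 1: a 4-armed bandit over a horizon of 100 rounds.
print(thompson_sampling_gaussian([1.0, 2.0, 3.0, 4.0], horizon=100)[-1])
```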

Part 1: 4-armed bandit, horizon 100, using Thompson Sampling


Part 2: 2-armed bandit, horizon 100, using Thompson Sampling with σ² = 0.25 and μ sampled uniformly from the interval [0, 10]

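The part 2 setting could be reproduced along these lines, reusing the hypothetical thompson_sampling_gaussian helper from the sketch above (again an illustration, not the repo's actual code):

```python
# Part-2-style setup (illustrative): two arms whose true means are drawn
# uniformly from [0, 10], reward variance sigma^2 = 0.25, horizon 100.
import numpy as np

rng = np.random.default_rng(0)
mus = rng.uniform(0.0, 10.0, size=2)   # mu_i ~ Uniform[0, 10]
cum_regret = thompson_sampling_gaussian(mus, sigma2=0.25, horizon=100, rng=rng)
print(cum_regret[-1])
```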

Part 3: TS-Normal strategy by Agrawal & Goyal, 2-armed bandit, m = 0


Part 3: TS-Normal strategy by Agrawal & Goyal, 2-armed bandit, m = 5


Part 3: TS-Normal strategy by Agrawal & Goyal, 2-armed bandit, m = 10

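For reference, a minimal sketch of the Gaussian-prior rule analyzed by Agrawal & Goyal: in each round, sample θ_i ~ N(μ̂_i, 1/(n_i + 1)), where μ̂_i is arm i's reward sum divided by n_i + 1 and n_i is the number of times arm i has been played, then pull the arm with the largest sample. This is an illustration only; the repo's m parameter (varied in the plots above) is not reproduced here.

```python
# Sketch of Thompson Sampling with Gaussian priors in the style of
# Agrawal & Goyal (arXiv:1209.3353); illustrative, not the repo's code.
import numpy as np

def ts_normal(true_means, sigma2=0.25, horizon=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)
    n_pulls = np.zeros(k)       # n_i: times arm i has been played
    reward_sum = np.zeros(k)    # running sum of arm i's observed rewards
    choices = np.zeros(horizon, dtype=int)

    for t in range(horizon):
        mu_hat = reward_sum / (n_pulls + 1.0)
        # theta_i ~ N(mu_hat_i, 1 / (n_i + 1))
        theta = rng.normal(mu_hat, 1.0 / np.sqrt(n_pulls + 1.0))
        arm = int(np.argmax(theta))
        reward = rng.normal(true_means[arm], np.sqrt(sigma2))
        n_pulls[arm] += 1
        reward_sum[arm] += reward
        choices[t] = arm
    return choices

# Example: 2-armed bandit, horizon 100; count how often each arm was pulled.
print(np.bincount(ts_normal([0.0, 1.0]), minlength=2))
```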
