Thompson Sampling Multi-Armed Bandit

A NumPy implementation of Thompson Sampling for a K-armed stochastic bandit with normally distributed rewards.
The TS-Normal strategy from "Further Optimal Regret Bounds for Thompson Sampling" by Shipra Agrawal and Navin Goyal (https://arxiv.org/abs/1209.3353) is also implemented (part 3 below).
Required Python libraries: numpy, matplotlib, random, stats.
A Jupyter notebook (.ipynb) with the plotted results is included in the repo.
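
For orientation, here is a minimal sketch of Thompson Sampling for a K-armed Gaussian bandit, assuming a Normal prior on each arm's mean and a known reward variance. It is an illustration of the technique, not the repo's exact code; the function and parameter names are made up for this example.

```python
# A minimal sketch (not the repo's exact code): Thompson Sampling for a
# K-armed Gaussian bandit with known reward variance and a Normal prior
# on each arm's mean. Function and parameter names are illustrative.
import numpy as np

def thompson_sampling_gaussian(true_means, sigma2=1.0, horizon=100,
                               prior_mean=0.0, prior_var=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)
    post_mean = np.full(k, prior_mean)   # posterior mean of each arm's mean
    post_var = np.full(k, prior_var)     # posterior variance of each arm's mean
    best = true_means.max()
    regret = np.zeros(horizon)

    for t in range(horizon):
        # Sample one plausible mean per arm from its posterior and play the argmax.
        theta = rng.normal(post_mean, np.sqrt(post_var))
        arm = int(np.argmax(theta))
        reward = rng.normal(true_means[arm], np.sqrt(sigma2))

        # Conjugate Normal-Normal update of the chosen arm's posterior.
        precision = 1.0 / post_var[arm] + 1.0 / sigma2
        post_mean[arm] = (post_mean[arm] / post_var[arm] + reward / sigma2) / precision
        post_var[arm] = 1.0 / precision

        regret[t] = best - true_means[arm]  # instantaneous pseudo-regret
    return np.cumsum(regret)

# Example in the spirit of part 1: a 4-armed bandit over a horizon of 100 rounds.
print(thompson_sampling_gaussian([1.0, 2.0, 3.0, 4.0], horizon=100)[-1])
```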

Part 1: 4-armed bandit, horizon 100, using Thompson Sampling


Part 2: 2-armed bandit, horizon 100, using Thompson Sampling with σ² = 0.25 and μ sampled uniformly from the interval [0, 10]

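The part 2 setting could be reproduced along these lines, reusing the hypothetical thompson_sampling_gaussian helper from the sketch above (again an illustration, not the repo's actual code):

```python
# Part-2-style setup (illustrative): two arms whose true means are drawn
# uniformly from [0, 10], reward variance sigma^2 = 0.25, horizon 100.
import numpy as np

rng = np.random.default_rng(0)
mus = rng.uniform(0.0, 10.0, size=2)   # mu_i ~ Uniform[0, 10]
cum_regret = thompson_sampling_gaussian(mus, sigma2=0.25, horizon=100, rng=rng)
print(cum_regret[-1])
```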

Part 3: TS-Normal strategy by Agrawal & Goyal, 2-armed bandit, m = 0


Part 3: TS-Normal strategy by Agrawal & Goyal, 2-armed bandit, m = 5


Part 3: TS-Normal strategy by Agrawal & Goyal, 2-armed bandit, m = 10

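For reference, a minimal sketch of the Gaussian-prior rule analyzed by Agrawal & Goyal: in each round, sample θ_i ~ N(μ̂_i, 1/(n_i + 1)), where μ̂_i is arm i's reward sum divided by n_i + 1 and n_i is the number of times arm i has been played, then pull the arm with the largest sample. This is an illustration only; the repo's m parameter (varied in the plots above) is not reproduced here.

```python
# Sketch of Thompson Sampling with Gaussian priors in the style of
# Agrawal & Goyal (arXiv:1209.3353); illustrative, not the repo's code.
import numpy as np

def ts_normal(true_means, sigma2=0.25, horizon=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)
    n_pulls = np.zeros(k)       # n_i: times arm i has been played
    reward_sum = np.zeros(k)    # running sum of arm i's observed rewards
    choices = np.zeros(horizon, dtype=int)

    for t in range(horizon):
        mu_hat = reward_sum / (n_pulls + 1.0)
        # theta_i ~ N(mu_hat_i, 1 / (n_i + 1))
        theta = rng.normal(mu_hat, 1.0 / np.sqrt(n_pulls + 1.0))
        arm = int(np.argmax(theta))
        reward = rng.normal(true_means[arm], np.sqrt(sigma2))
        n_pulls[arm] += 1
        reward_sum[arm] += reward
        choices[t] = arm
    return choices

# Example: 2-armed bandit, horizon 100; count how often each arm was pulled.
print(np.bincount(ts_normal([0.0, 1.0]), minlength=2))
```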
