NumPy implementation of Thompson Sampling for a K-armed stochastic bandit with normally distributed rewards.
The algorithm from "Further Optimal Regret Bounds for Thompson Sampling" by Shipra Agrawal and Navin Goyal is also implemented in part of the code:
https://arxiv.org/abs/1209.3353
Required Python libraries: numpy, matplotlib, random, stats
An .ipynb notebook with the plotted results is included in the repo.
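
For readers who want a quick picture of the method, here is a minimal sketch of Thompson Sampling with Gaussian priors in the style of the Agrawal and Goyal paper: each arm's estimate is sampled from a normal distribution centered on its shrunken empirical mean with variance 1/(pulls + 1), and the arm with the largest sample is played. This is not the repository's code. The function name `thompson_sampling_gaussian`, its parameters, and the example settings are assumptions made for illustration, and the paper's regret analysis is stated for bounded rewards, so applying it to means drawn from [0, 10] is only a demonstration.

```python
import numpy as np


def thompson_sampling_gaussian(true_means, reward_std, horizon, seed=0):
    """Illustrative Thompson Sampling with Gaussian priors (hypothetical helper).

    Returns the number of pulls per arm and the cumulative pseudo-regret
    after each round.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.size

    pulls = np.zeros(n_arms, dtype=int)   # k_i: times each arm was played
    reward_sums = np.zeros(n_arms)        # running sum of observed rewards
    regret = np.zeros(horizon)            # cumulative pseudo-regret per round
    best_mean = true_means.max()
    cumulative = 0.0

    for t in range(horizon):
        # Posterior for arm i: N(sum_i / (k_i + 1), 1 / (k_i + 1)).
        post_mean = reward_sums / (pulls + 1)
        post_std = 1.0 / np.sqrt(pulls + 1)

        # Draw one sample per arm and play the arm with the largest sample.
        samples = rng.normal(post_mean, post_std)
        arm = int(np.argmax(samples))

        # Observe a normally distributed reward from the chosen arm.
        reward = rng.normal(true_means[arm], reward_std)
        pulls[arm] += 1
        reward_sums[arm] += reward

        # Pseudo-regret uses the true means, not the noisy rewards.
        cumulative += best_mean - true_means[arm]
        regret[t] = cumulative

    return pulls, regret


# Example settings (assumed): 2 arms, variance 0.25, means uniform in [0, 10].
example_rng = np.random.default_rng(42)
example_means = example_rng.uniform(0.0, 10.0, size=2)
example_pulls, example_regret = thompson_sampling_gaussian(
    example_means, reward_std=np.sqrt(0.25), horizon=100
)
print("true means:", example_means)
print("pulls per arm:", example_pulls)
print("final cumulative regret:", example_regret[-1])
```

Plotting `example_regret` against the round index with matplotlib gives a regret curve of the kind shown in the figure; averaging over many seeds smooths it.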

![Two-armed bandit with horizon 100 using Thompson Sampling, σ² = 0.25 and μ sampled uniformly from the interval [0, 10]](https://github.com/MohammadAsadolahi/Thompson_Sampling_Multi_Armed_Bandit/raw/main/Normal%20distribution%203%20arm%20bandit%20%CF%832%20%3D%200.25%20and%20%CE%BCk%20uniformly%20sampled%20in%20the%20interval%20%5B0.0%2C%201.0%5D.png)


