1985
Tze L. Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules, 1985
1995
Katehakis, M. N. and Robbins, H., 1995. Sequential choice from several populations. Proc. Natl. Acad. Sci. USA, Vol. 92, pp. 8584-8585, September 1995 (Statistics). Rutgers University, New Brunswick, NJ 08903. Contributed by Herbert Robbins, May 4, 1995. Abstract: We consider the problem of sampling sequentially from two or more populations in such a way as to maximize the expected sum of outcomes in the long run.
Agrawal, R., 1995. Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem. Advances in Applied Probability, Vol. 27, No. 4 (Dec., 1995), pp. 1054-1078.
2010
Jouini, W., Ernst, D., Moy, C. and Palicot, J., 2010, May. Upper confidence bound based decision making strategies and dynamic spectrum access. In 2010 IEEE International Conference on Communications (pp. 1-5). IEEE. We suggest that Upper Confidence Bound (UCB) algorithms could be useful to design decision making strategies for SUs to exploit intelligently the spectrum resources based on their past observations.
This page is auto-translated from /nishio/UCB1 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to share my thoughts with non-Japanese readers.