Tze L. Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. 1985.


Michael N. Katehakis and Herbert Robbins (Rutgers University, New Brunswick, NJ 08903). Sequential choice from several populations. Proc. Natl. Acad. Sci. USA, Vol. 92, pp. 8584-8585, September 1995 (Statistics). Contributed by Herbert Robbins, May 4, 1995. Abstract: We consider the problem of sampling sequentially from two or more populations in such a way as to maximize the expected sum of outcomes in the long run.

Rajeev Agrawal. Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem. Advances in Applied Probability, Vol. 27, No. 4 (Dec., 1995), pp. 1054-1078.


Jouini, W., Ernst, D., Moy, C. and Palicot, J., 2010, May. Upper confidence bound based decision making strategies and dynamic spectrum access. In 2010 IEEE International Conference on Communications (pp. 1-5). IEEE. We suggest that Upper Confidence Bound (UCB) algorithms could be useful to design decision making strategies for SUs to exploit intelligently the spectrum resources based on their past observations.