Talk:Thompson sampling


I would like to suggest an additional reference for the Thompson sampling page, to be added to the existing references at the end of the second sentence of the history section, which reads: "It was subsequently rediscovered numerous times independently in the context of reinforcement learning."

The new reference pre-dates the references already given, which may make it of interest to Wikipedia readers (the new reference is from 1994, while the earliest one currently listed is from 1997). Our work, like the others cited, was unaware of the earlier (1933!) Thompson reference.

Because it is my own publication, I am not editing the page directly, but merely making a suggestion here that it might be of interest and worth doing. (I don't fully understand all of the Wikipedia COI guidelines, but this seems about right...)

There is a link to the publication here: (Link to Rivest-Yin paper on Simulation results for a new two-armed bandit heuristic) This is not the link I would propose to insert; it is only provided so an editor can see the bib file and the context a bit more fully.

I do not have a link to an online copy of this article, other than the one I have posted on my own web site. A possible link to be added to the Wikipedia page might look like this: Simulation Results for a New Two-Armed Bandit Heuristic. Ronald L. Rivest and Yiqun Yin. Proceedings of a Workshop on Computational Learning Theory and Natural Learning Systems (Princeton, New Jersey, 1994), pp. 477–486.

Ronald L. Rivest (talk) 23:16, 5 August 2013 (UTC)


The section "Relationship to other approaches > Probability matching" just briefly describes probability matching, but doesn't in any way describe how Thompson sampling relates to it. It's quite confusing 66.29.243.106 (talk) 14:54, 8 September 2014 (UTC)Reply

It is my understanding that probability matching refers to the broad class of techniques where the probability of optimality (or of the desired objective) is matched by the sampling rate, while Thompson sampling refers to the Bayesian implementation (or perhaps the "randomized" implementation?) of it, in which one updates a prior in each round. This distinction isn't consistent across the literature, though. I don't have a good citation for discussing this relationship, so I haven't made the edit to clarify. I have, however, removed the word "suboptimal" from the probability matching definition, as it implies a claim that is not supported by any of the sources (and indeed some sources demonstrate that probability matching techniques can be optimal, conditional on available information). Giuseppe Burtini (talk) 14:57, 13 December 2014 (UTC)
I was going to add, just as above, that the probability matching section comes without an explanation of how Thompson sampling and probability matching are related. My imperfect understanding, which I was hoping to improve upon by coming here, is that both do not select a predicted best choice, as other bandit algorithms do when they, for example, always go with the highest upper confidence bound. Instead, they select a choice stochastically, based on the probability that it will be the best choice. At least that's what probability matching suggests. Whether or when (if it varies in the literature) Thompson sampling has this stochastic character (and perhaps how) is the information this section seems to be missing. Note: A related discussion and data that imply benefits of Thompson sampling can be found in this paper: Lomas et al. (2016). Interface Design Optimization as a Multi-Armed Bandit Problem. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 4142–4153. Disclosure: I am an author on this paper, but other authors have greater expertise relevant to Thompson Sampling. Thanks! --Koedinger (talk) 13:16, 28 October 2016 (UTC)
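For concreteness, here is a minimal sketch of the stochastic selection being discussed: Thompson sampling for a two-armed Bernoulli bandit with Beta(1, 1) priors. The true success probabilities and the number of rounds below are made-up illustrative values, not taken from any of the cited papers. Each round, one value is sampled from each arm's posterior and the arm with the largest draw is played, so each arm ends up being chosen with roughly the posterior probability that it is the best arm, which is the probability matching behaviour described above.

    import random

    # Minimal sketch of Thompson sampling for a two-armed Bernoulli bandit.
    # The "true" success probabilities are illustrative assumptions only.
    true_probs = [0.4, 0.6]
    alpha = [1, 1]   # Beta posterior parameters per arm: successes + 1
    beta = [1, 1]    # Beta posterior parameters per arm: failures + 1

    for t in range(1000):
        # Sample one plausible success rate from each arm's posterior ...
        draws = [random.betavariate(alpha[i], beta[i]) for i in range(len(true_probs))]
        # ... and play the arm whose draw is largest (a stochastic, probability-matched choice).
        arm = max(range(len(draws)), key=lambda i: draws[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if random.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward

    print("posterior means:", [a / (a + b) for a, b in zip(alpha, beta)])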


The article cites Wyatt's thesis as "a first proof of convergence for the bandit case". If this is referring to the proof in sec. 4.3, pp. 61-63 of that source, then I believe this sentence should be removed, because the proof is incorrect. The gist of the proof is that our estimates of per-arm reward converge to the true values as the number of samples on each arm tends to infinity; hence we need only show that the number of samples on each arm always tends to infinity. This is "proved" by showing that the probability of sampling a given arm is strictly positive at each step. But this does not prove that the arm is sampled infinitely often: see the first Borel–Cantelli lemma, for instance. Gostevehoward (talk) 16:22, 8 April 2017 (UTC)
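To spell out the gap with a small illustrative example (the numbers here are made up, not taken from the thesis): strictly positive per-step selection probabilities can still be summable, and in that case the first Borel–Cantelli lemma gives the opposite conclusion.

    % First Borel–Cantelli lemma, applied to the events A_t = "arm i is sampled at step t":
    \sum_{t=1}^{\infty} \Pr(A_t) < \infty
        \;\Longrightarrow\;
        \Pr\bigl(A_t \text{ occurs for infinitely many } t\bigr) = 0 .
    % For instance, \Pr(A_t) = 2^{-t} is strictly positive for every t, yet the sum equals 1,
    % so arm i is almost surely sampled only finitely often and its estimate need not converge.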

Thompson Sampling is Asymptotically Optimal in General Environments

http://auai.org/uai2016/proceedings/papers/20.pdf

2001:700:1200:5118:6D4F:98C0:7303:A490 (talk) 13:25, 6 September 2018 (UTC)