Talk:Chinese restaurant process

Mistake edit

There is some mistake with the expected number of tables. First, it doesn't depend on alpha. Second, it should go to alpha * log n as n increases, and when theta=1, and that doesn't happen. I think there is somewhere a theta instead of an alpha.

The formula here:

 

is stated to be for the case with  , so it correctly does not depend on it. The digamma function is O(log(x)), so the expected number of tables is  .

When the discount parameter is not zero, the number of expected tables is not O(log(n)) but a power law, which is the defining characteristic of the Pitman-Yor Process that is integrated out to get the two parameter CRP. Cswitch (talk) 06:53, 18 January 2023 (UTC)Reply

It is not "immediate" that the probability assigned to a particular partition is ... edit

"It is then immediate that the probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is"

I am a mathematics graduate and it is not obvious to me why the given formula holds. I think that a derivation of the simple, non-generalized case belongs here. I hate it when people say that something "follows immediately" when, in fact, it requires half a hour's thought and a half-page derivation. — Preceding unsigned comment added by 78.105.228.69 (talk) 00:58, 31 December 2011 (UTC)Reply

article needs discussion of meaning of this distribution edit

doesnt it :) Anlace 21:41, 27 June 2006 (UTC)Reply

What is the application of this formula? edit

How, or in what circumstances, is it useful? --80.41.36.60 13:54, 25 July 2006 (UTC)Reply

That it is relevant to the Ewens sampling formula is stated explicitly in the article. Therefore I was surprised by the question above. But now I've made it more explicit by adding a link to population genetics. Michael Hardy 15:01, 25 July 2006 (UTC)Reply

notation edit

is the notation inconsistent? for example: n_k is the number of elements in the n-th block... but... |b| is also the number of elements in a block. —Preceding unsigned comment added by 143.215.146.218 (talkcontribs)

Does the generalization actually work? edit

Forgive me, I don't know much about the notations used here. But even using the convention  , which I myself thought appropriate of adding, taking the case α = 0, corresponding to Ewens' distribution, we should have:

 

 

or with  , assuming only one occupied table with both people

 


which I get by directly computing the probability   that the second person sit at the same table as the first.
Anyone knows better?

212.126.224.100 (talk) 19:05, 27 July 2010 (UTC)Reply

I see this comment has been here for several weeks. I just saw it. I'll look at it later this afternoon. Michael Hardy (talk) 15:08, 17 August 2010 (UTC)Reply
....I still haven't gotten to this. I suspect no one who's actually an expert on the Ewen's formula and the Chinese restaurant process has worked on this article. Michael Hardy (talk) 16:57, 27 August 2010 (UTC)Reply
guess not. 212.126.224.100 (talk) 14:41, 31 August 2010 (UTC)Reply
The formulate is indeed wrong. Instead of |B| it should read |B|-1 in the above quoted section. Since this is properly the most important thing to get right I hope it is okay that I correct the text, see for instance formula 16 of pitmanns paper: http://www.springerlink.com/content/k175tg8150441520/fulltext.pdf (quoted within main body of the text). — Preceding unsigned comment added by 130.225.93.116 (talk) 12:13, 24 September 2012 (UTC)Reply

The intro to this article is currently too technical for most readers edit

The intro to this article currently fails to comply with the Wikipedia policy [[Wikipedia:Make technical articles understandable"}}. The intro fails to properly explain what the "Chinese restaurant process" is in basic terms that most readers can understand. The first sentence should describe the in a very general sense what the Chinese restaurant process is, then go into the specifics about the formula. The intro should also include a sentence explaining why it's called the Chinese restaurant process. --67.101.213.239 (talk) 14:55, 27 April 2013 (UTC)Reply

The above criticism is still valid. The article should explain why anyone might be interested in studying this process. If there is any heuristic or intuitive way to understand the process, it should be mentioned as a non-rigorous way of gaining some insight into its significance. Reify-tech (talk) 01:28, 3 August 2014 (UTC)Reply
I found that the second explanation, in "definition", was easier to understand than the first explanation in the introduction, so I essentially swapped them around. Hope this helps. Grabigail (talk) 15:27, 26 September 2014 (UTC)Reply
Thank you, the change does somewhat improve the article. However, an expanded explanation of why mathematicians (or anyone else) might be interested in this particular process would still be helpful; the "Applications" section offers a scant few clues. Reify-tech (talk) 16:03, 26 September 2014 (UTC)Reply

External links modified edit

Hello fellow Wikipedians,

I have just added archive links to one external link on Chinese restaurant process. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—cyberbot IITalk to my owner:Online 07:08, 15 January 2016 (UTC)Reply

Formal definition appears to be incorrect!! edit

The article, as currently written, is either just-plain wrong, or is deeply misleading. The core problem is that it uses n to count both time-steps, and the number of "occupied" tables. It states that, at time n, the probability of sitting at table   is   and that otherwise, the probability of sitting at table   is   where   is the number of people currently sitting at table  . Hopefully, it is plain to see that this will result in most tables having exactly zero occupants, and that furthermore, since   there is exactly zero probability that anyone will ever sit at that table. That is, once a table is skipped, it will stay unoccupied, forever. Viz. at time n, if no one sits at  , then at later time, the probability of sitting at that table is  . Surely, that cannot be an intended part of the definition of this distribution: viz, that it consists almost entirely of empty tables, at which it becomes impossible/forbidden to sit? And heaven help us, if this is correct, then can the article be amended to make it clear that there will be forbidden tables? And state what the fraction of forbidden tables are over time? Personally, I suspect that the definition is just-plain-wrong, but its hard to guess what the correct, intended definition should be. 14.0.170.34 (talk) 01:59, 18 August 2018 (UTC)Reply

The definition is correct. The distribution will distribute n people among n tables. Most of those tables will end up being empty. Specifically, table n will be empty with a probability  . However the result of this process does not care about table numbering, so assigning customers in this way is equivalent to assigning them to the next empty table. The formal definition is written this way because "the next empty table" is a much more complicated thing to formally define than "table n." — Preceding unsigned comment added by 2620:0:1008:14:9074:BB74:7F7:7432 (talk) 18:23, 15 October 2019 (UTC)Reply

Proposed merge with Chinese restaurant table distribution edit

The process and related distribution seem to be essentially the same topic and so are best covered together. Andrew D. (talk) 16:24, 7 November 2019 (UTC)Reply

    Y Merger complete. Klbrain (talk) 16:04, 14 July 2020 (UTC)Reply

Problematic Naming edit

I think it should be noted that the names "Chinese restaurant" and "Indian buffet" are rather problematic, and seem to stem from stereotypes in the 1980s. Is there any reason for this to be called a "Chinese restaurant process" and not simply a "restaurant process"? Of course, I understand if wikipedia is not the right forum for such discussions. — Preceding unsigned comment added by 62.141.176.1 (talk) 15:04, 6 September 2022 (UTC)Reply

This is not "a Chinese restaurant process" but "the Chinese Restaurant Process". Wikipedia cannot rename this process because that is how it is known in the literature. I'm not sure which stereotypes from the 80s you are referring to. The Wikipedia article on Table Sharing claims the name of the process comes from the custom as practiced in restaurants in Hong Kong, Taiwan, and parts of China. 71.136.136.21 (talk) 04:45, 7 September 2022 (UTC)Reply

Expected number of tables in general case needs clarification edit

The current formula for the generalized CRP uses the notation of the gamma function extensively, but then notes that the gamma notation is not the usual gamma function without elaboration. The reference is Jim Pitman's "Combinatorial Stochastic Processes", and while it is a great text it is also 250 pages of very, very dense math. Without a page number and/or clarification, the reference provides no value- reading an advanced textbook to guess what is meant when a short definition would suffice is unreasonably onerous. I have a copy of the text, and I'd be happy to rewrite the section/reference if the editor can add more detail. Otherwise, I propose removing the formula entirely as it provides no information beyond the reference- simply saying that the reference contains a derivation for the general case would be clearer. Cswitch (talk) 07:07, 18 January 2023 (UTC)Reply