Wikipedia talk:WikiProject Statistics/Archive 4

Pageview stats

After a recent request, I added WikiProject Statistics to the list of projects to compile monthly pageview stats for. The data is the same used by http://stats.grok.se/en/ but the program is different, and includes the aggregate views from all redirects to each page. The stats are at Wikipedia:WikiProject Statistics/Popular pages.

The page will be updated monthly with new data. The edits aren't marked as bot edits, so they will show up in watchlists. You can view more results, request a new project be added to the list, or request a configuration change for this project using the toolserver tool. If you have any comments or suggestions, please let me know. Thanks! Mr.Z-man 03:01, 1 February 2010 (UTC)

Hmm, I was just looking at the top 10. Sadly only 2 were even assessed. Also, why is Gantt chart in this project? 018 (talk) 18:16, 3 February 2010 (UTC)
A link to Wikipedia:WikiProject_Statistics/Popular_pages should be put somewhere on our main Wikiproject page. Otherwise this topic will be archived in a few months and lost forever :(  // stpasha »  17:32, 8 March 2010 (UTC)
I think there are now links in two places. Melcombe (talk) 10:42, 14 May 2010 (UTC)

Assessment

I see that the set of usable classes for articles in the maths project has been revised ... previously some notionally valid classes caused problems for some automatic processing tools. This means that, if appropriate, the classes used for the stats project banner might be revised so as to agree with those in the maths project. A set of short meanings for classes can be found at Wikipedia:WikiProject Mathematics/Wikipedia 1.0/Assessment (specific to the maths project), and these seem to align with the longer versions at Wikipedia:Version 1.0 Editorial Team/Assessment, on which the stats project banner is based. Melcombe (talk) 12:47, 12 May 2010 (UTC)

Only the maths project additionally has the B+ class, which is intermediate between B and GA. Do we want to include B+ in our assessment scheme?  // stpasha »  18:45, 13 May 2010 (UTC)
Well, there are many instances of articles where the classes under the maths and stats projects really should be identical, because there is no reason for them to be different, so this would be useful. As seen on the Stats project page, there are presently (only) 11 articles in the B+ class in the probability and statistics field of the maths project. More generally, many projects are using or moving towards using the same banner template and hence the same availability of classes (at least as options rather than by default) so it would be good to keep an eye on commonality across other projects as well. Melcombe (talk) 09:56, 14 May 2010 (UTC)

Robert V. Hogg nominated for Wikipedia:Did you know

Today I wrote an article about Robert V. Hogg, the founder of the University of Iowa's Statistics Department. Tonight/this morning, I was informed that the article was going to be nominated for Wikipedia:Did you know status (on the front page). Because of this nomination, I am requesting assistance with the article, because my stamina is insufficient to do anything else. Thanks! Kiefer.Wolfowitz (talk) 04:22, 16 May 2010 (UTC)

Clarification of scope?

The project page doesn't make its scope clear. Does this project encompass such things as official statistics, e.g. Census? --GenericBob (talk) 12:12, 7 May 2010 (UTC)

There's a brief discussion of this in the archive. Cordless Larry (talk) 13:56, 7 May 2010 (UTC)
Thanks. As GB suggests, the scope of the project should be explained clearly on the main project page, not relegated to discussion, much less discussion archives. --P64 (talk) 22:42, 7 May 2010 (UTC)
I agree. I was just pointing out that I tried to get clarification once before, with not much response. Cordless Larry (talk) 07:09, 8 May 2010 (UTC)
It's hard to get a response when we have barely a few people on the project. I stated the new scope of the project on the front page; hopefully it won't get reverted.  // stpasha »  20:56, 10 May 2010 (UTC)

Stpasha, actuaries are a type of statistician, and thus the article is in scope. -- Avi (talk) 04:36, 11 May 2010 (UTC)

I have restored the last acceptable definition of what the scope of the project actually is. Stpasha is trying to be far too restrictive and trying to impose his own views. The early discussion on formation of this project centred on two things:

  • that there was a lot of "statistics" that would not fit neatly into the mathematics project, particularly not under "probability and statistics" within that project.
  • that there were a lot of articles around containing statistical topics whose statistical content was poor.

Thus the actual scope of the project was/is to improve all the articles with any more than an extremely brief mention that statistics might be being used. I note that Talk:Actuary has a different definition from Stpasha, that this time tries to restrict attention to "science". Melcombe (talk) 09:06, 11 May 2010 (UTC)

I concur: the intent of the project was to focus on statistics, not solely on theoretical statistics, and I would agree that Stpasha's definition is too restrictive. However, I will note this at Wikipedia_talk:WikiProject_Mathematics so we can get a wider response. -- Avi (talk) 12:17, 11 May 2010 (UTC)

Melcombe, you always revert my edits without even looking at them. You may not like the new definition, but what was wrong with the change in formatting of the picture in the corner, that you had to revert that as well? Now you have restored the “last acceptable definition”, which was not really acceptable. It just WAS there. Yes, I'm trying to limit the scope to the theoretical aspects of statistics, because otherwise the project could conceivably include every single article on Wikipedia, simply because one can collect some type of statistical data on just about any topic.

My issue with having too broad a scope is that we can already barely handle the number of articles currently within the scope. Including the articles in applied statistics will not be manageable with so few people. I’m planning to restore the definition which limits the scope to scientific aspects. If you feel that some other topics have to be included as well, define them clearly (not “everything with the word ‘statistics’ in it”) and we can add them to the scope just like we did with the history and the biographies. I’m also planning to clean up, reorganize, and bring up to date the portal page within the coming week.

The early discussion on formation of the project centered on two things: first, that statistics as a science is not a proper subdivision of mathematics (in particular, most large universities have a statistics department separate from the math department); and second, that the statistical topics lag far behind the math topics.  // stpasha »  16:03, 11 May 2010 (UTC)

Stpasha, you are wrong again. “Last acceptable definition” does not mean it was perfect, but that it was considerably better than your version. The early discussion did not talk about what is done at "large universities", but rather that there are a lot of statistics topics on Wikipedia that are not "mathematics". If you believe that only "science" uses statistics, then you are certainly wrong. There was no mention "that the statistical topics lag far behind the math topics", but rather that the content of many Wikipedia articles that tried to talk about statistical techniques was poor are even wrong. We have seen before that you have no idea of developing a consensus before making radical changes, and now here you are trying to steer the entire project to your whim. Melcombe (talk) 16:55, 11 May 2010 (UTC)
I don't think that only “science” uses statistics (which doesn’t preclude me from being wrong, of course). However, this is exactly my point. That someone or something makes use of statistics should not by itself be a reason to include it in the project. All sciences use math; does that mean that all topics in all sciences should be included in WikiProject Mathematics? All computer programs use computer science; does that mean that all types of software should be included in WikiProject Computer Science? Yes, I'm trying to steer the project: extend the scope in one direction and limit it in another. But if someone doesn't do it, the project will die from immobility.  // stpasha » 
You seem to want to constrain the project in ways that it doesn't need to be, and shouldn't be, constrained. Let's take a different tack ... if you look at Wikipedia:WikiProject Council/Directory/Science you will see that a number of projects are partly subdivided into "task forces" ... I am unfamiliar with this way of going on, but it may be that this way of organising a project could be of some help. Melcombe (talk) 08:48, 12 May 2010 (UTC)

On an immediately related topic, I suggest that we should remove probability theory from the scope of the project. After all, there is a "probability" project that could easily be reactivated, and at least a few people are editing probability theory articles. Then those who are only concerned with displaying their supposed mathematical sophistication, rather than writing readable text for general readers, can have their own playground. Of course it would be better if they did take care of understandability, but still. Then the broad scope of the project could be defined by, but not exclusively by, things under Category:Statistics, which would be logical and moderately easy to understand. 17:05, 11 May 2010 (UTC)

Okay, what do you want in and out of the scope? I have no interest in biographies (I know nothing, so my editing them would be counterproductive). However, probability is so related to statistics that I don't see how it can be profitably carved out (even a frequentist needs likelihoods). 018 (talk) 17:19, 11 May 2010 (UTC)
The problem with reviving the project "probability theory" is that it does not have a clearly defined scope either. That is, one cannot tell where the borderline between probability theory and statistics lies. Many textbooks and university courses are called “Probability theory and statistics”. My view is that the difference between them is purely historical. There used to be the time when the discipline was called “the theory of games and chance” (this name fell out of use, as game theory is nowadays a completely different field); then it was renamed “probability theory”. Then the people who were working with data saw the inadequacy of the probabilistic approach for their purposes, so they started to build their own models, and that was called “statistics”. Thus statistics is merely the next step in the development of probability theory. They are in fact the same discipline. Of course, later on a new generation came and saw the inadequacy of statistical methods, so they started the study of what is now called “econometrics”. This is the same old statistics, only with a focus on asymptotic theory and nonparametric and semiparametric models. Again, it is difficult to draw a line between statistics and econometrics, and in many cases such a line would go right through the middle of an article. Thus it’s easier to include econometrics and probability theory within the WPStatistics project than to have them as separate entities that would never be able to decide which topics are whose.  // stpasha » 
I think there is a reasonably clear distinction between probability and statistics, although of course there is also some overlap. The same could be said for mathematics and statistics, for mathematics and computer science, and for mathematics and physics. Statistics did not develop entirely as an extension of probability theory; it was motivated by quite different problems, and still is. --Avenue (talk) 23:49, 11 May 2010 (UTC)
Well, that’s what I wrote, isn’t it? Statistics was motivated by quite different problems, which probabilists didn’t want to tackle. Still, statistics can be seen as an extension of probability theory. Say, suppose you have a random variable. That’s a purely probabilistic concept. The random variable is characterized by its distribution, so let’s say it has a normal distribution. The normal distribution is probably also a topic in probability theory. Then you note that normal distributions may have different means or variances, so in fact you have a family of distributions (which already begins to smell like statistics). Now imagine that there are in fact n iid copies of the same random variable, and this already looks very much like statistics. However, if we look at those n copies as an n-dimensional multivariate normal distribution, then we are back in the realm of probability theory. Having those n random variables, we can for example estimate the parameters, which is definitely statistics. We can conduct tests, say, the t-test, which is also statistics. However, the t-statistic follows Student’s t-distribution, so we could be back in probability theory once again. Then if we take that estimator and start asking inconvenient questions, such as what happens to it asymptotically, what happens if the model was misspecified and the distribution was slightly non-normal, or what happens if the data weren’t in fact iid but had some dependencies, then we are already firmly on the ground of econometrics.
Anyway, my point is that in many cases a topic belongs to both probability theory and statistics, or statistics and econometrics, or statistics and information geometry, etc. So there is no point in having all these disciplines as separate wikiprojects.  // stpasha »  01:30, 12 May 2010 (UTC)
Mathematical statistics can be seen as an extension of probability theory, just as theoretical physics can be seen as applied mathematics. This does not mean that physics=mathematics, or that statistics=probability, because there are important parts of both physics and statistics that do not reduce nicely to mathematical formalism. Anyway, in my opinion that is all beside the point. What matters is how we Wikipedia editors want to organise our efforts. I have no objection to including probability topics within the statistics project's scope if the probability project is moribund, but I would be even happier to see a healthy stand-alone probability project. The question is what active editors on probability topics want. (I do not count myself among them.) --Avenue (talk) 02:11, 12 May 2010 (UTC)
I think that the comments from Stpasha seem to be based on the idea that the articles notionally included in different projects need to be mutually exclusive. There is no idea that putting a project's banner on a talk page "claims" that article for a project. Rather, it is an indication that "help" might be obtained on improving the article where the question arises within a project's particular context, and a facility for those trying to help, by providing a way of getting a quick list of changes to the relevant talk pages. Many statisticians would not want to be faced with a list of purely probability theory topics. The attempt to characterise the Statistics project as "probability theory, mathematical statistics and econometrics" is clearly wrong, as those are exactly the topics that fit neatly into the "probability and statistics" field of the mathematics project, and hence it would exclude all those topics that this project was specifically set up to include (given that the majority of the econometrics articles tend to emphasize mathematical content over context and applications). Rather than talking about scope, let's think about purpose ... which really must include, as a first step, trying to improve all those articles marked as stubs (Category:Statistics stubs) and as needing expert attention (Category:Statistics articles needing expert attention). Of course there are also those marked as stubs within the maths project, which can be found via the template on the stats project home page and which might not be included in the above. I guess there are similar categories for probability rather than statistics topics. Melcombe (talk) 08:40, 12 May 2010 (UTC)
For info: probability stubs are at Category:Probability stubs, while the cat for expert attention, which is presently empty, is at Category:Probability articles needing expert attention. Melcombe (talk) 15:18, 12 May 2010 (UTC)
I would suggest that the project restrict itself to statistical methods (which is what statistics courses teach) and people who have developed such methods. This would include methods for conducting a census (e.g. Stratified sampling), but not articles like United States Census. Template:Statistics is a good outline. Overlap with Wikipedia:WikiProject Mathematics is inevitable, but a good thing. For statisticians (like Florence Nightingale), overlap with multiple projects is natural.
Melcombe is very wise in suggesting that the focus should be on the practical issue of improving articles, not on the theoretical issue of trying to define exactly what the boundaries of "Statistics" are. Personally, I think that the project already has things mapped out fairly well, and that large-scale changes would be counter-productive. -- Radagast3 (talk) 11:45, 12 May 2010 (UTC)
I'd hope we could be much more expansive than that. Why wouldn't we want to include the US Census article? There aren't many higher profile statistical collections out there. Among other things, I think we should also cover statistical offices, real-world statistical controversies, and notable applied statisticians, even if they haven't developed new methods. And let's not forget statistics education. But I agree that discussing our purpose here will probably be more useful than trying to delineate our scope. I believe improving certain articles (e.g. the bold links in {{Statistics}}) is more vital than expanding stubs, but there's room for a diversity of approaches. --Avenue (talk) 14:49, 12 May 2010 (UTC)
  • There used to be the time when the discipline was called "the theory of games and chance" — that must be games of chance
  • There was no mention that "that the statistical topics lag far behind the math topics", but rather that the content of many Wikipedia articles that tried to talk about statistical techniques was poor are[or?] even wrong. — It may be worth asking which Math project articles are good, and why? Which Stat articles are poor or wrong? For example, here are three articles under both Math and Stat project banners. (On the other hand, Sampling bias is bannered Stat and Philosophy. Econometrics is bannered Economics only.)
Bayes Law
lift (data mining)
Actuary (also under the Business project banner)
  • Elsewhere I have seen many articles under three project banners such as Ohio, Biography, and Baseball (e.g., Charlie Gould). Those projects lie in different dimensions; their spaces are orthogonal in some sense. The Math, Stat, Probability (if revived), and Economics projects must be more like currents in one river, all in one dimension.
  • Project ownership "provid[es] a way of getting a quick list of changes to the relevant talk pages." — How does that work? --P64 (talk) 17:20, 12 May 2010 (UTC)
INFO: Having the project banner on a talk page causes articles (their talk pages) to be included in a certain (hidden) category. For quick access to the list of changes, use the following web link, and possibly save it as a "favourite" or whatever ... http://en.wikipedia.org/wiki/Special:RecentChangesLinked/Category:WikiProject_Statistics_articles . There used to be a link on the left-hand side of the page when viewing the corresponding category Category:WikiProject Statistics articles that would provide this access, but it seems to have disappeared for now; that may be some new interface thing ... look for "related changes" in the toolbar to the left of the screen. You might find this useful for the list of stats articles also, as it works for all types of pages. It may be that the project home page already contains links equivalent to these things. Melcombe (talk) 09:03, 13 May 2010 (UTC)
Is there a similar link for changes to main articles tagged with the WPStatistics banner, and not just the talk pages?  // stpasha » 
INFO: No, not that I know of, because only the talk pages are put into categories. But it can be used for the categories created by the class= option, though again only for the talk pages. I don't think there is either an automatic way of getting a list of such articles or an automatically maintained list, but it seems a possible facility for a banner template to have. The existence of a (possibly private) article containing a list would allow the same feature to be used to create a route to recent changes in main articles. However, the various automatic procedures available for the maths project template do allow such lists to be created for the probability and statistics field. For example, click on the "probability and statistics" heading in the maths project template that appears on the stats project home page ... this creates several lists of articles on a single page on screen ... then simply click on "recent changes" in the left-hand toolbar. This might all be automated by the single web link http://en.wikipedia.org/wiki/Special:RecentChangesLinked/Wikipedia:WikiProject_Mathematics/Wikipedia_1.0/Probability_and_statistics . One would need to think about how often that project page (listing articles) is updated ... I think that might be only once a day, so it might miss some newly added articles for a day or so. The availability of this facility might mean you want to start adding the maths template to more article talk pages, as well as the stats one. Melcombe (talk) 10:26, 14 May 2010 (UTC)

The placement of the {{WPStatistics}} banner certainly does not claim “ownership” of the article. Nor is it an advertising tool for the project. Rather, putting this template claims the responsibility, that is we announce ourselves as taking care of the article and being responsible for its content. The attempt to characterise the Statistics project as "probability theory, mathematical statistics and econometrics" is not "clearly wrong"; it is different from your point of view, but not necessarily wrong. Yes, it falls neatly into the mathematics|field=probability and statistics category. Well, maybe not always neatly. But at least I present my point of view that an article such as random variable should belong to this project, whereas an article such as gross reproduction rate shouldn’t. Melcombe, it would be nice if you could try to formalize your point of view regarding the scope, and not just say something tautological like “everything in Category:Statistics”, since that leaves one to ponder what should actually belong to that category.

From my point of view, a wikiproject is a collection of articles that are closely interrelated and inter-wikilinked, whereas the connection to the articles outside of scope is quite “loose”. As such, it is essential that we include probability theory topics into the scope of the project, since any “purely statistical” article makes many connections to the probability topics.

As for the purpose, we really should work along the quality-importance diagonal. It is more vital to improve top-priority C- and Start-class articles than it is to expand Low-importance stubs. Therefore we need first to undertake the task of assessing existing articles. For example, the article standard deviation takes first place on the list of most frequently viewed statistical articles, yet it is still of C quality. Regression analysis is in the top 10, while it is of Start quality. The general level of statistics articles is way lower than that of mathematics articles. Shamefully, we have almost no featured or good articles, and those that we have are of low importance to the project and thus not very representative.  // stpasha »  19:06, 12 May 2010 (UTC)

Again, Bayes Law, Lift (data mining), and Actuary carry the Math project banner. Can you characterize what the Math project has generally done well? Is it articles that shy away from application? themes which lack known application? (It isn't Lift (data mining)!)
Does the Math project provide lessons that are generally useful? or useful only for a Stat subproject that is naturally a Math subproject too? --P64 (talk) 19:57, 12 May 2010 (UTC)
I'm unable to characterize anything about the Math project; I'm not that familiar with it. I know, though, that they have quite a handful of good and featured-quality articles, since I stumble upon those from time to time. For example: e, π, derivative, matrix, Mandelbrot set, etc. I agree that probably every single math article can be improved, some significantly, others not as much. But what does this all have to do with our subproject? I thought initially that yours was a rhetorical question, but now I'm confused regarding what your actual position is...  // stpasha »  21:50, 12 May 2010 (UTC)
Stpasha, thanks for your generalisation of my remark about some articles being a higher priority to improve than our many stubs; I agree. Regarding project scope, I'm not sure what your reasons are for saying that articles such as gross reproduction rate should be outside our scope. Is it because demography is to some degree a separate discipline? If so, would you also apply this to other somewhat separate subjects such as actuarial mathematics, psychometrics, biostatistics, and probability? All these areas have some statistical content. Personally I think we should take responsibility for the bits that have significant statistical aspects, and leave aside the rest. For example, I think the Cauchy distribution is within our scope, but not its generalisation as the relativistic Breit–Wigner distribution. --Avenue (talk) 00:19, 13 May 2010 (UTC)
Here comes the danger of confusing statistic and statistics. If you include the gross reproduction rate, the next thing you know you’ll have to include gross domestic product and stock market index (those are always modeled as random variables), then there will be weather, sunspots and astrology, and when you think things can’t get any worse, some football fan will come in and include the 1000+ articles on every player’s and every game’s statistics for all seasons (simply because they have the word “statistics” in them). In my opinion the project should be about random variables in general, but not about any random variable in particular. I don’t have anything against psychometrics, biostatistics, etc., as long as they are no mere applications of the statistical theory but have some theoretical advancements in them too.  // stpasha »  04:04, 13 May 2010 (UTC)

Just where do you get the idea that "putting this template claims the responsibility, that is we announce ourselves as taking care of the article and being responsible for its content"? There is no such implication. (Just as there is no implication that help will actually be provided at the Wikipedia maths help desk.) Why would you dream up such an implication? Why would you think that the project needs to have a tightly defined scope? I pointed to the "stub" and "attention" categories because someone has identified each article as needing help in the area of statistics ... and one of the purposes of the project must be to help improve such articles. But more generally the project needs to pay attention to, but not necessarily place its banner on, any article where any statistician (one would do, not necessarily all) would say "they're doing statistics there". There is certainly no need to restrict attention to only those articles that fit into the scope of the probability and statistics part of the maths project. If you think that this group of articles needs some sort of concerted effort, then the obvious course would be to set up a "task force" within that project, as already mentioned above. Once again, you need to be reminded that this project was specifically set up so that its coverage was far wider than you want, and specifically to include articles that contravene your requirement, stated above, "as long as they are no mere applications of the statistical theory but have some theoretical advancements in them too". There was certainly no idea that the project should concentrate on "theory" or be only about theory ... that is what the maths project concentrates on, and what was found to be too restrictive. Melcombe (talk) 09:39, 13 May 2010 (UTC)

Yes, I feel strongly that our scope is broader than statistical theory alone, and that "mere applications" should be included, at least when statistical aspects of the topic are significant. I'm not saying that we should include every city article that gives statistics on its demographics, or every article on a baseball player that mentions how many home runs they hit, but articles on general topics like gross domestic product, life expectancy and batting average seem relevant to me. Articles on particular series such as the Dow Jones Industrial Average are borderline IMO. Conversely, some aspects of probability theory have no relevance to statistics, so I see no reason to include them. --Avenue (talk) 12:57, 13 May 2010 (UTC)
It is the borderline cases that would be most affected by trying to have too definite a scope. We need not treat all articles in the same way for the various things we try to do. For the example of Dow Jones Industrial Average and similar, I would consider that we could well try to find it a home in some category that would link through several layers of categories to the home statistics category, so that it is findable via such a route, but would not think that the project should try to do more than this. I don't know if it is already accessible in this way, but the general set of sports statistics articles certainly are. Thus one of the things the stats project has already been doing is to "organise" the statistics categories into a sensible structure, coverage and content, without necessarily thinking that we are involved in the content of all the articles contained. Melcombe (talk) 14:16, 13 May 2010 (UTC)
Just on the Dow Jones Industrial Average article: it is part of Category:Statistics, six layers down. The other aspect of the article that might be within our scope is its Criticism section, which I think does a reasonable job of explaining the drawbacks of a price-weighted index versus a capitalisation-weighted one. --Avenue (talk) 18:12, 13 May 2010 (UTC)
All right, I concede. If you want to include all topics that are of some interest to at least some statisticians, then so be it. But then we also have to include the probability and measure-theoretic topics as well, since those are of interest to statistician-theorists. Even if it means the π–λ theorem. And we definitely have to include econometrics, since many statisticians are working in that field now.  // stpasha »  19:22, 13 May 2010 (UTC)
Ah, but would your "statistician-theorist" not actually be acting as a "probability-theorist"? I think the actual point is that what we all want is a way of getting to a collection of articles that might be of interest to us, with different people being interested in different collections. This is why having a usable way of dividing up the articles of interest to statisticians (and probabilists) would be so valuable. Melcombe (talk) 10:38, 14 May 2010 (UTC)

Task forces? or fields?

Regarding the subdivision of the project into task forces, this can be feasibly achieved by implementing the following proposal from one year ago, which got no responses at the time:


I'm not sure if the word “statistician” is appropriate. Some of the people on List of statisticians, such as Abraham de Moivre, died before statistics even emerged. Other people, like Carl Friedrich Gauss, are not on that list, however, so maybe de Moivre got there by mistake? Some of those people certainly seem un-statistician-like, for example Kenneth Arrow… Others, such as Takeshi Amemiya, prefer to be called econometricians rather than statisticians.  // stpasha »  21:57, 13 May 2010 (UTC)

Some time ago I left an item on the talk page of the underlying template used for the banner, making enquiries about this. I can't find this now and there is the possibility that we are now using a different underlying banner shell. At the time the response was that the "field=" thing was being looked into, and that the lack of commonality of the "class=" was a known point, as the responder had also managed the maths template scripts. This latter point has been dealt with, as the maths template now accepts classes that didn't work properly previously. There is a vast array of options available for the banner template, but I don't think there is something equivalent to the "field=" that the maths project has. For those unfamiliar, its use allows the categories holding the project articles to be split up according to both "class" (as now) and "field". I don't think we should expect quick action in response to another request for a "field=" option as it may not be a priority. I note also that the maths template has an additional option whereby articles can be separately marked as having special historical content... would this be useful here? I don't know what effect the option actually has.
The question of tagging articles as "statisticians" arises partly because of the Wikipedia convention of not wanting to have "people" articles mixed in categories with "topic" articles, but also because (as others have said) many will not want to bother looking through biographies. The maths template has "field=mathematician", which is clearly suitable and appropriate for some statisticians, but certainly not all, which is why a separate tagging within this project might be helpful. As to who counts as a "statistician"... the obvious answer in this context (tagging as being covered by the Statistics project) is that anyone who someone working on this project thinks counts as a statistician can be included. Obviously this discussion is pointless unless we either have a "field=" equivalent available, or decide to adopt separate banners for different purposes. The question of who should be included in List of statisticians is a different matter, and people have been added and deleted according to the usual whims. The inclusion of historical characters you mention may be partly explained by their presence in other articles related to the history of statistics.
But the question of "task forces" is in no way equivalent to this "field=" facility. Rather it is a grouping of people who agree to push forward with progress on a group of articles (which is what you seem to be wanting to do), with some structure from wikipedia to formalise this grouping and help with coordination. Since the Stats project has no history of such task forces while the maths project does, and since the maths project fully covers the collection of subtopics you are interested in, it would be best to set up an appropriate task force under the maths project.
Melcombe (talk) 09:41, 14 May 2010 (UTC)
The {{WPBannerMeta}} template used in our project has the functionality of hooks, which can be used to add the field parameter. So if we can come up with a list of fields, the functionality could be implemented. Of course, then we would have to “teach” the bots about the new field field and how to handle it; however we need to come up with the list first. It could be something like:
 • Probability  • Statistics  • Econometrics  • Biography  • History  • Journal  • Software  • Data  • Measure  • Institution
“Measure” means a statistic, such as the GDP article; “Data” is for famous datasets, such as census; “Institution” is meant for statistical organizations or institutions; “History” is for the articles with the historical content in them, such as normal distribution. The field parameter should obviously be multi-valued. Suggest your own list of fields!  // stpasha »  16:52, 14 May 2010 (UTC)
This is beginning to seem very ornate. What problem are we trying to solve here? My feeling is that tagging different subfields is unnecessary unless there is an active group of editors focussed on that subfield, or perhaps when a subfield can be easily identified that many project members are not interested in (e.g. biographies?). --Avenue (talk) 22:51, 14 May 2010 (UTC)
It would inevitably be necessary to limit this to a reasonable number of different fields, and I think the maths project uses about 10. Regarding the above list, I think it would be better to use plurals, in the same way as category names do. There is a question as to whether an article could be placed in more than one category, as possibly implied by your comment about the normal distribution ... the maths one doesn't seem to allow this, which is why they have a separate "historical" tag. I think the names would need to be more obvious than those suggested, and should at least be somewhat aligned with existing category names: e.g. "organisations" rather than "institutions". Melcombe (talk) 09:36, 17 May 2010 (UTC)

Binary classification issues need to be cleaned up...

There are multiple terms for several concepts related to binary classification. However, the articles in question are often very dependent upon a specific domain. It's unclear whether the articles in question describe a domain-specific term for a more general concept or merely a different, domain-agnostic term. For instance, positive predictive value focuses solely on the field of medicine, but I believe that to be a concept applicable to any binary classification scheme. Someone doing generic predictive analytics (in the realm of machine learning) may also use this same concept. Many times it is called precision. However, the article on precision distinguishes between the generic definition in classification and the specific domain of information retrieval. This seems to me to merely be one use of the concept of precision rather than a distinct concept. I think a lot of effort should be devoted to cleaning up these and similar articles to clarify these and related issues. Mickeyg13 (talk) 18:43, 18 May 2010 (UTC)
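As a minimal sketch of the point above, assuming a plain 2×2 confusion matrix, the Python below computes positive predictive value and precision from the same counts. The function names and the toy data are hypothetical, chosen only for illustration; both names reduce to TP / (TP + FP), just as sensitivity and recall both reduce to TP / (TP + FN).

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for parallel lists of booleans (True = positive)."""
    c = Counter(zip(actual, predicted))
    return c[(True, True)], c[(False, True)], c[(True, False)], c[(False, False)]

def positive_predictive_value(actual, predicted):
    # Medical-testing name: P(condition present | test positive).
    tp, fp, _, _ = confusion_counts(actual, predicted)
    return tp / (tp + fp)

def precision(actual, predicted):
    # Information-retrieval / machine-learning name for the same ratio.
    tp, fp, _, _ = confusion_counts(actual, predicted)
    return tp / (tp + fp)

def sensitivity(actual, predicted):
    # Medical name: P(test positive | condition present).
    tp, _, fn, _ = confusion_counts(actual, predicted)
    return tp / (tp + fn)

recall = sensitivity  # the retrieval name for the same ratio

# Hypothetical toy data for ten cases.
actual    = [True, True, True, False, False, False, False, True, False, False]
predicted = [True, True, False, True, False, False, False, True, True, False]

print(positive_predictive_value(actual, predicted), precision(actual, predicted))  # 0.6 0.6
print(sensitivity(actual, predicted), recall(actual, predicted))                   # 0.75 0.75
```

The domain-specific name changes; the computation does not.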

New category on educators

A new category, Category:Statistics educators, has been created. I am not sure of the intended scope of this, but a higher-level category states "Educators are people involved in the practice of education. For people involved in the theory of education see Category:Educationists and its subcategories." You might want to make additions accordingly. Note that Category:Statisticians is some layers down from Category:Scholars by specialty or field of research and Category:Scholars and academics by subject, so there is a distinction made between education and academia. Melcombe (talk) 12:02, 18 May 2010 (UTC)

So what is the criterion to establish notability among educators? And how do you plan to deal with the fact that most scientists who work in academia also teach some courses, which makes them “educators”?  // stpasha » 
Our class of statistics educators should (imho) inherit the criteria from the category of mathematics educators. Apart from Larry Hedges, the members of this category have been involved with (first) issuing major reports on statistics education, like Adrian Smith and Fred Smith or (second) have been nationally or internationally recognized for their work in stats ed. Look at the interview with Bob Hogg to see the praise of David S. Moore and Fred Mosteller, for example. Thanks. Kiefer.Wolfowitz (talk) 20:46, 18 May 2010 (UTC)
I don't think Larry Hedges belongs in Category:Statistics educators, as he's not (AFAIK) involved in statistics education any more than any other statistics professor who lectures and writes textbooks, so I'm going to remove him. (His webpage describes him as "A national leader in the fields of educational statistics and evaluation", but here "educational statistics" means the statistics of education, not statistics education.) Qwfp (talk) 21:37, 18 May 2010 (UTC)
There is no formal statement of criterion on Category:Mathematics educators and it seems a matter of interpretation as to why certain people are there. The only criterion I see is the one stated at the top of this thread, which was from Category:Educators. I am certainly unclear about whether the work of Adrian Smith and Fred Smith that is mentioned above would make them count as being suitable for Category:Educationists. Of course a given person can be in several "people categories". I picked at random Gwynneth Coogan from Category:Mathematics educators and found that the apparent reason for inclusion was that she has taught maths at 2 schools. I believe there is no actual requirement of notability within a category for inclusion in that category, just for the main article itself (which in this case was satisfied by having been at the Olympics). So it seems that the criterion might extend to anyone who teaches statistics, even school teachers ... and why not? If someone wanted a stricter category they could create one. I didn't create this category, so there may have been some clearer view of who should be included in it, but some guidance must be taken from the higher-level categories. Melcombe (talk) 09:33, 19 May 2010 (UTC)
I wrote the following description of the category:
"This category collects statisticians (and other scholars and practitioners) who have made notable contributions to statistics education, i.e. the teaching and learning of statistics (including both theory and practice)."
"Statisticians involved in producing educational statistics or evaluating education may also be included but only on the basis of notable contributions to statistics education."
The phrase "statistics educator" is used (in the USA at least) for scholars active in the research on teaching and learning statistics, including curriculum, didactics, pedagogy, etc. This restrictive meaning is preferred (imho) over the wider meaning of the super-categories. Thanks Kiefer.Wolfowitz (talk) 10:44, 19 May 2010 (UTC)

Statistics navigation template

There is presently a discussion of making radical changes to the way in which the statistics navigation template might appear or be structured. See Template talk:Statistics and add your thoughts/experience. Melcombe (talk) 10:20, 26 May 2010 (UTC)

Proposed deletion of "ProFicient"

Deletion of the article titled ProFicient has been proposed ... this relates to a statistical process control package and is very new. The discussion is here. Don't just say Keep or Delete; give your arguments. Melcombe (talk) 16:15, 4 June 2010 (UTC)

Proposed deletion of "Generalized additive model for location, scale, and shape"

Deletion of the article titled Generalized additive model for location, scale, and shape has been proposed. The discussion is here. Don't just say Keep or Delete; give your arguments. Michael Hardy (talk) 17:18, 23 May 2010 (UTC)

The discussion was closed with a consensus decision to keep the article, following some editing (addressing concerns about possible appearance of original research or self-advertising). Kiefer.Wolfowitz (talk) 14:36, 10 June 2010 (UTC)

Wikipedia talk:Articles for creation/Application of cluster analysis in educational psychology

A new Wikipedian, Jucypsycho (talk · contribs), has just joined this project - and could use some advice, re. Wikipedia talk:Articles for creation/Application of cluster analysis in educational psychology - so please, comment on their talk page! Cheers,  Chzz  ►  04:28, 11 June 2010 (UTC)

Cluster Analysis in Education

I have written an article on application of cluster analysis in education. http://en.wikipedia.org/wiki/Wikipedia_talk:Articles_for_creation/Application_of_cluster_analysis_in_educational_psychology#Careful_use_of_Cluster_Analysis

What do you think? —Preceding unsigned comment added by Jucypsycho (talk · contribs) 08:26, 11 June 2010 (UTC)

Normal variance-mean mixture

In the article about normal variance-mean mixture it is stated

In probability theory a normal variance-mean mixture with mixing probability density $g$ is the continuous probability distribution of a random variable $Y$ of the form

$$Y = \alpha + \beta V + \sigma \sqrt{V}\, X,$$

where $\alpha$ and $\beta$ are real numbers and $\sigma > 0$. The random variables $X$ and $V$ are independent, $X$ is normal distributed with mean zero and variance one, and $V$ is a continuous probability distribution....

I am no statistician, but it feels to me that it should rather read

.... and $V$ has a continuous probability distribution....

I leave this issue to the experts.

160.45.24.185 (talk) 15:54, 14 June 2010 (UTC)
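For readers checking the wording, a minimal simulation sketch may help: it draws Y exactly as in the quoted definition, Y = α + βV + σ√V·X, with V taken (purely as an illustrative assumption) to be gamma distributed, which gives the variance-gamma case. V is a random variable with a distribution, and conditioning on it gives E[Y] = α + βE[V] and Var[Y] = σ²E[V] + β²Var[V], which the simulated values should approximately match. The function and parameter names are hypothetical.

```python
import numpy as np

def sample_nvm_mixture(alpha, beta, sigma, sample_v, size, rng):
    """Draw Y = alpha + beta*V + sigma*sqrt(V)*X, with X ~ N(0, 1) independent of V > 0."""
    v = sample_v(size, rng)
    x = rng.standard_normal(size)
    return alpha + beta * v + sigma * np.sqrt(v) * x

rng = np.random.default_rng(0)
alpha, beta, sigma = 1.0, 0.5, 2.0
shape, scale = 3.0, 1.0  # illustrative gamma mixing law: E[V] = 3, Var[V] = 3

def gamma_v(size, rng):
    return rng.gamma(shape, scale, size)

y = sample_nvm_mixture(alpha, beta, sigma, gamma_v, 200_000, rng)

# Conditioning on V (law of total expectation / variance):
#   E[Y]   = alpha + beta * E[V]                 = 2.5
#   Var[Y] = sigma**2 * E[V] + beta**2 * Var[V]  = 12.75
mean_theory = alpha + beta * shape * scale
var_theory = sigma**2 * shape * scale + beta**2 * shape * scale**2
print(y.mean(), mean_theory)
print(y.var(), var_theory)
```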

Response Surface Methodology

I was reading through the "Response Surface Methodology" page, and I noticed that there was no section for the application of Response Surface Methodology in video games.

I plan on using the following source to add a section regarding the aforementioned topic: An application of Response Surface Methodology to the Atari Miniature Golf Video Game by Charles W. Kish, Jr., and Walter H. Carter, Jr.

Are there any objections to this addition? —Preceding unsigned comment added by Alanz-njitp (talk · contribs) 18:44, 17 June 2010 (UTC)

Well, it was published in The American Statistician (JSTOR 2683428) so it's a WP:reliable source. However, I doubt that video games are a particularly major application of response surface methodology, and that article is pretty short at present, so I'm not sure this particular application deserves more than a mention at most... Qwfp (talk) 20:36, 17 June 2010 (UTC)
On the one hand, I agree with editor Qwfp. Maybe you could add an example of baking a cake or chemical engineering and then add the video-game example. On the other hand, it is an interesting example and might interest the youth! Kiefer.Wolfowitz (talk) 20:59, 17 June 2010 (UTC)
It sounds like this would be worth mentioning, but not so much as to create a whole section on it... Perhaps it would be more appropriate to put it in the Video Games article? Maybe that would be a little more relevant, considering it was a huge step for video games to include such a method. Alanz-njitp (talk) —Preceding undated comment added 18:12, 21 June 2010 (UTC).
Are we talking at cross purposes? That article in The American Statistician isn't about the use of RSM in designing or programming video games, it's about the use of RSM to help play a video game (specifically, to get a hole in one in an early golf game). I don't think the article is intended to be taken entirely seriously; The American Statistician's Aims and Scope include "interesting and fun articles of a general nature about statistics and its applications". If RSM is included in some video games you need a different reference! Qwfp (talk) 18:28, 21 June 2010 (UTC)

Complexity and tone of statistics articles

Hi, sorry to interrupt, but I thought I might offer some constructive feedback from a non-statistician. The articles on Wikipedia about statistics are remarkably detailed and cover a large range of topics, which is really good. The problem is that they're often written in a highly technical language which makes them difficult to understand for people who don't have a background in statistics.

Obviously some of this is to be expected and you've got to assume some background in the subject matter. You can't discuss everything from first principles in every article and you can link to other articles to help build a foundation of understanding. My feeling is that you should write an article a couple of conceptual layers below the concept you're trying to describe. Then you can build your way up to the concept and make sure you've brought your reader with you.

I think definitions such as the one in posterior probability should be rephrased into common English. The formal mathematical definition is great, but it's important to give readers an intuitive understanding of the topic as well as provide the formality for readers with the requisite background. (For example that definition uses a loop symbol that comes with no translation or link for further information.)

Have a look at many of the articles on physics if you want to see how mathematical topics can be covered in a way that lay people can understand. From reading articles on Wikipedia I feel that I have an intuitive understanding of quantum mechanics, but I've read many of the statistics articles over and over and still have no feeling for what -- for example -- a conjugate prior is.

111.69.251.147 (talk) 01:00, 21 June 2010 (UTC)
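A minimal sketch of the plain-English idea behind a posterior probability, for a finite set of hypotheses: multiply each prior probability by the likelihood of the observed data, then rescale so the results add up to 1. The "loop symbol" mentioned above is presumably the proportionality sign ∝, which stands for exactly this "multiply, then rescale" step. The prevalence and test-accuracy numbers below are hypothetical.

```python
def posterior(prior, likelihood):
    """Posterior over a finite set of hypotheses via Bayes' rule.

    prior[h] = P(h); likelihood[h] = P(data | h).
    posterior[h] is proportional to likelihood[h] * prior[h]; dividing by the
    total (the evidence P(data)) makes the probabilities add up to 1.
    """
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: u / evidence for h, u in unnormalized.items()}

# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood_of_positive_test = {"disease": 0.90, "healthy": 0.05}
post = posterior(prior, likelihood_of_positive_test)
print(post)  # P(disease | positive test) is only about 0.15
```

The intuition the formula encodes: a positive test raises the probability of disease from 1% to roughly 15%, not to 90%, because healthy people vastly outnumber sick ones.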

Thanks for the feedback; you're not interrupting anything. I think what you say about the more mathematical of the statistics articles is true about mathematics articles too, and the same criticism is often heard about them. Wikipedia is not a textbook, and often isn't a great way of learning mathematical material. On the specific articles you mention:
  • I agree posterior probability isn't a good article. I'm not sure I've ever looked at it before. Looking at it now, I think it could be merged with, or simply redirected to, Bayesian inference, which is a much better article.
  • Conjugate priors are an inherently mathematical concept. I can't see any way around that. But to be honest, if you can't follow the mathematical definition, it's highly unlikely you need to worry about what a conjugate prior is. (To follow your analogy with physics, the same might be said about the Schwinger–Dyson equation, say). Their practical importance to Bayesian statistics has greatly decreased in the last 20 years with the growth in computing power and the increasing use of Monte Carlo simulation-based approaches, in particular Markov chain Monte Carlo, as an alternative to formula-based approaches. Qwfp (talk) 19:17, 21 June 2010 (UTC)
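To illustrate both points, here is a minimal Python sketch assuming the textbook Beta-binomial pair (a Beta prior on a success probability, with binomial data). With a conjugate prior the posterior follows from a simple closed-form update; the same posterior mean can also be approximated by brute-force numerical integration, which is the kind of computation that has made conjugacy less essential in practice. The data (7 successes in 10 trials) and the function name are hypothetical.

```python
def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures)."""
    return a + successes, b + failures

# Hypothetical data: 7 successes in 10 trials, uniform Beta(1, 1) prior.
a_post, b_post = beta_binomial_update(1, 1, successes=7, failures=3)
closed_form_mean = a_post / (a_post + b_post)  # Beta(8, 4) has mean 8/12

# Brute-force check on a grid: posterior is proportional to prior * likelihood.
n_grid = 10_000
thetas = [(i + 0.5) / n_grid for i in range(n_grid)]
weights = [1.0 * t**7 * (1 - t) ** 3 for t in thetas]  # uniform prior density is 1
grid_mean = sum(t * w for t, w in zip(thetas, weights)) / sum(weights)

print(closed_form_mean, grid_mean)  # both about 0.6667
```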
I agree with the first friendly suggestion. I have several introductory-level (AP or first college textbook) statistics books at home, as well as some popular treatments of the subject, and I am steadily self-educating. But when I want to verify a detailed point as I do research on psychology, I still find it hard to find Wikipedia articles that are written in approachable language. So many Wikipedia articles on so many subjects could be improved if the statistical errors in the articles were weeded out, so the editors involved in this WikiProject Statistics can make a great contribution to the whole encyclopedia by editing statistics articles to be as approachable as possible to college-educated persons who have only a beginning knowledge of statistics. Thanks to all of you for your efforts in this direction. -- WeijiBaikeBianji (talk) 21:29, 30 June 2010 (UTC)

Source List for Which I'd Like Input from Statisticians

I have posted a bibliography of Intelligence Citations for the use of all Wikipedians who have occasion to edit articles on human intelligence and related issues. I happen to have circulating access to a huge academic research library at a university with an active research program in those issues (and to another library that is one of the ten largest public library systems in the United States) and have been researching these issues since 1989. What I think I need the most help on, and what I think most Wikipedia articles on those issues need the most help on, are specifically statistical issues. I think even very sophisticated psychologists sometimes make rather elementary errors in statistics. You are welcome to use these citations for your own research and to suggest new sources to me by comments on that page. -- WeijiBaikeBianji (talk) 21:31, 30 June 2010 (UTC)

A new "prod" notice

The article Sampling equiprobably with dice has had a prod (deletion in 7 days) notice inserted, apparently by the original contributor. This article may be of interest to some here, but seems badly written. Melcombe (talk) 14:09, 8 July 2010 (UTC)

error?

Mathematically, wouldn't Xi need to be set to 0 (not 1) for all i in order to yield an intercept? —Preceding unsigned comment added by 165.151.51.75 (talk) 22:45, 2 August 2010 (UTC)

Sorry, does that relate to some particular article? If so, which one? Qwfp (talk) 06:39, 3 August 2010 (UTC)
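Assuming the question is about the usual linear regression setup, the conventional device is a constant regressor equal to 1 (not 0) for every observation; its coefficient is then the intercept. The minimal NumPy sketch below, with hypothetical noise-free data, contrasts the two choices.

```python
import numpy as np

# Hypothetical noise-free data lying on the line y = 2 + 3x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x

# Design matrix with a constant regressor x_i0 = 1: its coefficient is the intercept.
X_ones = np.column_stack([np.ones_like(x), x])
coef_ones, *_ = np.linalg.lstsq(X_ones, y, rcond=None)
print(coef_ones)  # approximately [2. 3.]

# With x_i0 = 0 the first column contributes nothing, so no intercept can be fitted;
# lstsq returns the minimum-norm solution, i.e. a line forced through the origin.
X_zeros = np.column_stack([np.zeros_like(x), x])
coef_zeros, *_ = np.linalg.lstsq(X_zeros, y, rcond=None)
print(coef_zeros)  # first entry 0, slope 195/55, about 3.545
```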

Expert opinion needed at Snyderman and Rothman (study)

We'd greatly appreciate feedback from somebody knowledgeable in statistics about the construction of the Snyderman and Rothman study: Snyderman_and_Rothman_(study). There have been lingering doubts about the construct validity of the survey; therefore an expert opinion would probably help move the conversation along. Thanks in advance.--Ramdrake (talk) 04:36, 11 August 2010 (UTC)

I agree that knowing what statistics theory and practice says about how the study was conducted would help greatly in making sound editing decisions about the article. -- WeijiBaikeBianji (talk) 05:15, 11 August 2010 (UTC)

hyperbolic confidence band?

If we look at the equation, the confidence band would rather be linear in my opinion. There is no pole in this equation. There is a root taken from a quadratic expression.

Regards, Dieter De Witte —Preceding unsigned comment added by 157.193.214.162 (talk) 14:53, 23 August 2010 (UTC)

Sorry, does that relate to some particular article? If so, which one? Regards, Qwfp (talk) 06:39, 3 August 2010 (UTC)
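Assuming the question concerns the usual pointwise confidence band for the mean response in simple linear regression, a short sketch shows why "hyperbolic" fits. The half-width is w(x) = t·s·sqrt(1/n + (x - x̄)²/S_xx), so w² = A + B(x - x̄)². Measured from the fitted line, the band edges u = ±w satisfy u² - B(x - x̄)² = A, which is a hyperbola (the change from y to u is an affine change of coordinates). It has no pole, and its asymptotes are straight lines, which is why the band looks nearly linear far from x̄. The data are simulated and the t quantile is hard-coded, both as assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
x = np.arange(n, dtype=float)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)  # simulated (hypothetical) data

# Ordinary least-squares fit of a straight line.
x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)
slope = np.sum((x - x_bar) * (y - y.mean())) / s_xx
intercept = y.mean() - slope * x_bar
residuals = y - (intercept + slope * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # residual standard error
t_crit = 2.306  # approx. 97.5% quantile of Student's t with n - 2 = 8 df (hard-coded)

def half_width(x0):
    """Half-width of the pointwise 95% confidence band for the mean response at x0."""
    return t_crit * s * np.sqrt(1.0 / n + (x0 - x_bar) ** 2 / s_xx)

# Squaring the half-width gives w^2 = A + B (x - x_bar)^2.
A = (t_crit * s) ** 2 / n
B = (t_crit * s) ** 2 / s_xx

grid = np.linspace(-50.0, 60.0, 1001)
w = half_width(grid)
hyperbola_lhs = w ** 2 - B * (grid - x_bar) ** 2          # constant, equal to A
slope_ratio = w[-1] / abs(grid[-1] - x_bar) / np.sqrt(B)  # near 1: linear asymptotes
print(np.allclose(hyperbola_lhs, A), slope_ratio)
```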

Biased Coin and Urn model experimental Designs

I was recently searching through wiki for articles on Biased Coin Design Studies and Urn Model Studies (an extension thereof), and realised there weren't any, so I was hoping to start articles on these topics. I am an absolute beginner at making and/or editing wiki articles, so any and all help or advice would be very helpful, i.e. how to go about doing this, where to create the articles, how to link them to other existing articles (experimental design, etc.), etc. But I also wanted to ask whether there were already pages on these topics that I just failed to find. If this is the wrong place to make this post, please let me know. Thanks a lot.

Armadilloa (talk) 06:14, 10 September 2010 (UTC)

Deletion proposal

A new article, Multifactor design of experiments software, has been proposed for deletion. See and contribute to the discussion at Wikipedia:Articles for deletion/Multifactor design of experiments software. Melcombe (talk) 13:35, 13 September 2010 (UTC)

Note that the result was "Keep". Melcombe (talk)

What is an histogram?

There is a dispute between me and User:Nijdam on the definition of histogram. To me, a histogram is a mathematical method to estimate a distribution, which is usually plotted with a bar chart. To Nijdam, it is "a diagram". I personally think (and have been taught) that the identification of "histogram" with "bar chart" is a common misconception, but I am having a bit of a hard time finding a positive reference on Gbooks (I can't go to a library today since I am at home with a cold). Can someone knowledgeable here come and help untangle the dispute? Thanks! --Cyclopia talk 13:40, 6 September 2010 (UTC)

For a dataset D with N elements Di, a histogram plots the frequency F of each Di residing in the intervals Ii of the set I. The limits of I and D are equal. Relative frequency histograms are also used which plot F/N in place of F. For clarity, I carries the same units as D, but F and F/N are unitless. Hope that was bookish enough for you. SEB —Preceding unsigned comment added by 63.245.15.11 (talk) 20:30, 17 September 2010 (UTC)

I believe a histogram and a bar chart are both plots, in the sense that they are ways of visualising a given dataset, or looking at its distribution, although neither gives any analytical information on the distribution of the data, only qualitative information. As to the difference between the two, a histogram is typically a more rigorously defined thing, consisting of equally spaced intervals with the end of each interval being the start of the next, while a bar chart can be a little more open to variations, with the 'bars' being separate and on uneven intervals, and is also often used for categorical data. Armadilloa (talk) 06:21, 10 September 2010 (UTC)

Thanks, however I'd prefer to see some reference from a book of mathematical statistics. --Cyclopia talk 12:29, 10 September 2010 (UTC)
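Without settling which definition is right, a minimal NumPy sketch shows the quantities discussed above for a hypothetical data set: the frequencies F, the relative frequencies F/N, and the density-scaled heights F/(N × width) that `np.histogram(..., density=True)` returns. When bin widths are unequal the last two differ, and only the density version has total area 1, which is the sense in which a histogram acts as an estimator of a probability density.

```python
import numpy as np

# Hypothetical data set D with N = 8 values.
data = np.array([1.2, 1.9, 2.1, 2.4, 3.3, 3.6, 3.9, 4.8])
edges = np.array([1.0, 2.0, 3.0, 5.0])  # intervals [1,2), [2,3), [3,5]; the last is wider

counts, _ = np.histogram(data, bins=edges)                 # frequencies F
rel_freq = counts / data.size                              # F / N, sums to 1
density, _ = np.histogram(data, bins=edges, density=True)  # F / (N * width)
area = np.sum(density * np.diff(edges))                    # total area under the bars

print(counts, rel_freq, density, area)
```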

Statistics articles have been selected for the Wikipedia 0.8 release

Version 0.8 is a collection of Wikipedia articles selected by the Wikipedia 1.0 team for offline release on USB key, DVD and mobile phone. Articles were selected based on their assessed importance and quality, then article versions (revisionIDs) were chosen for trustworthiness (freedom from vandalism) using an adaptation of the WikiTrust algorithm.

We would like to ask you to review the Statistics articles and revisionIDs we have chosen. Selected articles are marked with a diamond symbol (♦) to the right of each article, and this symbol links to the selected version of each article. If you believe we have included or excluded articles inappropriately, please contact us at Wikipedia talk:Version 0.8 with the details. You may wish to look at your WikiProject's articles with cleanup tags and try to improve any that need work; if you do, please give us the new revisionID at Wikipedia talk:Version 0.8. We would like to complete this consultation period by midnight UTC on Monday, October 11th.

We have greatly streamlined the process since the Version 0.7 release, so we aim to have the collection ready for distribution by the end of October, 2010. As a result, we are planning to distribute the collection much more widely, while continuing to work with groups such as One Laptop per Child and Wikipedia for Schools to extend the reach of Wikipedia worldwide. Please help us, with your WikiProject's feedback!

For the Wikipedia 1.0 editorial team, SelectionBot 23:40, 19 September 2010 (UTC)

Laplace distribution

How do I get the final formula for the mode of the Laplace distribution? —Preceding unsigned comment added by 41.254.3.118 (talk) 20:24, 21 September 2010 (UTC)
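Assuming the question is about the mode of the Laplace density f(x) = exp(-|x - μ|/b)/(2b) with location μ and scale b > 0, the mode is μ itself: |x - μ| is smallest (zero) exactly at x = μ, and the exponential is decreasing, so f(x) ≤ f(μ) = 1/(2b). A short numerical check in Python, with illustrative parameter values:

```python
import math

def laplace_pdf(x, mu, b):
    """Laplace density f(x) = exp(-|x - mu| / b) / (2b), with location mu and scale b > 0."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

mu, b = 1.5, 0.7  # illustrative parameter values
# Since |x - mu| >= 0, with equality only at x = mu, f(x) <= f(mu) = 1 / (2b).
grid = [mu - 5.0 + i * 0.001 for i in range(10_001)]
mode_numeric = max(grid, key=lambda x: laplace_pdf(x, mu, b))
print(mode_numeric, laplace_pdf(mu, mu, b))  # close to 1.5, and 1/(2*0.7), about 0.714
```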

Basic error in calculating Variance

On the page http://en.wikipedia.org/wiki/Computational_formula_for_the_variance the formula for variance is incorrect.

It is shown as Var(X) = E(X^2) - (E(X))^2 whereas it should of course be Var(X) = (E(X^2) - (E(X))^2/N) / N for the variance relative to the sample mean and Var(X) = (E(X^2) - (E(X))^2/N) / (N-1) for the variance relative to the population mean.

(The same error occurs in my son's school textbook. They omitted the first /N. Perhaps this error is being propagated through textbooks.)

I'm very new so I don't feel capable of editing pages yet myself! But I'll have a go if no-one else wants to. Can anyone tell me how to start?

For reference (and from memory), the proof is

Var(X) = u / (N-1)

where

u = E(X-m)^2, where m is the mean of X

So

u = E(X-m)^2 = E(X^2 -2mX + m^2) = E(X^2) -2m(EX) + N(m^2) = E(X^2) -2(EX)(EX)/N + N(EX/N)^2 = E(X^2) -2(EX)(EX)/N + N(EX)(EX)/N/N = E(X^2) -2(EX)(EX)/N + (EX)(EX)/N = E(X^2) - (EX)(EX)/N

So

Var(X) = (E(X^2) - (EX)(EX)/N) / (N-1)

And, by a similar proof

Cov(X,Y) = (E(XY) - (EX)(EY)/N) / (N-1)

Peterbalch (talk) 16:47, 20 September 2010 (UTC)

Peterbalch, generally, Wikipedia tries to follow what is in textbooks, even if the textbooks are wrong (read more in Wikipedia's verifiability policy). If you want to update the formula, generally you should be able to find a textbook that agrees with you.
In this case, in your proof you assume E(m^2) = Nm^2. But E(c) = c for any constant. This assumes you are treating the mean as a constant in the calculation. 018 (talk) 16:56, 20 September 2010 (UTC)


Yes, I am treating the mean as a constant in the calculation because it is a constant in the calculation.

If you wish verification, consider the following:

Assume that the samples are 3,3,3,3. The values are all the same so the variance is zero.

If we assume that

Var(X) = E(X^2) - (EX)^2

as the article states then

EX = 12

E(X^2) = 36

Var(X) = 36 - 12^2 = -108

A negative variance is clearly a ludicrous result.

You will find that the formula I gave gives the correct result.

Are you saying that if a formula can be mathematically demonstrated to be false then you will refuse to allow it to be corrected until you are shown a different textbook? Are we to have a battle of "well my textbook says ...". I can refer to the stats textbook I used in 1973; would that be sufficient?

I guess I don't understand the rules.

Peterbalch (talk) 20:12, 20 September 2010 (UTC)

Peterbalch, I think you would be better taking your questions to the math reference desk. 018 (talk) 20:40, 20 September 2010 (UTC)
In your case, E(X) is 3. E(X) is the expected value. Sum(X) is 12, but E(X) is Sum(X)/n. So, E(X^2) = 9, E(X)² = 9, so Var = 0, as it should be. -- Avi (talk) 20:43, 20 September 2010 (UTC)
Be careful there, E(X) is a population parameter, it can only be estimated from a sample. This is why I sent him to the reference desk--there are many different levels of confusion. 018 (talk) 20:55, 20 September 2010 (UTC)
In his case, the population was explicitly defined as {3, 3, 3, 3}, so E(X) = 3, no? -- Avi (talk) 20:58, 20 September 2010 (UTC)
Moreover, the initial function in the article he references is discussing the variance and expectation of the random variable itself, not the estimates from data, which may be part of the confusion as well. As it says "In probability theory and statistics, the computational formula for the variance Var(X) of a random variable X is the formula…where E(X) is the expected value of X." -- Avi (talk) 21:00, 20 September 2010 (UTC)
This is actually an interesting question. The article talks about Var(X) as well as s^2. Which is it about? I think that is worth a discussion on the talk page. 018 (talk) 21:10, 20 September 2010 (UTC)
The next formula is a "closely related identity" which "can be used to calculate the sample variance." There are formulæ for both, but the two should not be confused. -- Avi (talk) 21:57, 20 September 2010 (UTC)
I think he confuses the summation sign Σ and E, which may look similar to someone unfamiliar with the Greek alphabet.  // stpasha »  01:00, 21 September 2010 (UTC)
The confusion starts by having an article entitled "Computational formulae ..." when really it is about algebraic formulae ... algebra to use in finding a formula for a variance, rather than computational steps that are sensible to implement on a computer. What the OP is really concerned with may be better answered in the articles variance and/or Sample mean and sample covariance, and there may be others. Melcombe (talk) 08:42, 21 September 2010 (UTC)
Melcombe, go check out the article and what links there. I also started a discussion about what the page is supposed to be about (is it the sample variance or the variance of a random variable?). I think computation is meant to be "how you compute something", so differential calculus would qualify as a method of computing the rate of change in a function with respect to its argument. 018 (talk) 02:09, 24 September 2010 (UTC)
Go look up the meaning of "compute" and the origins of the word "computer". To compute something is not the same as "find a formula for". Melcombe (talk) 08:37, 24 September 2010 (UTC)
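A small Python sketch of the distinction drawn in this thread. Reading E as an average, the identity Var(X) = E(X²) - (E(X))² gives the population variance, while the sum-based formula posted above (reading E as a sum) gives the n - 1 sample variance. Both give 0 for {3, 3, 3, 3}; the -108 appears only when sums are plugged into the expectation formula. The helper name is hypothetical; the standard-library `statistics.pvariance` and `statistics.variance` serve as a cross-check. As an aside on the "computational" point, the sum-of-squares-minus-squared-sum form is known to lose precision to cancellation on a computer when the mean is large relative to the spread.

```python
from math import isclose
from statistics import pvariance, variance

def both_forms(data):
    """Return (population variance, sample variance) via the two algebraic identities."""
    n = len(data)
    e_x = sum(data) / n                    # E(X) read as an average
    e_x2 = sum(v * v for v in data) / n    # E(X^2) read as an average
    population = e_x2 - e_x ** 2
    s, s2 = sum(data), sum(v * v for v in data)  # plain sums
    sample = (s2 - s ** 2 / n) / (n - 1)
    return population, sample

print(both_forms([3, 3, 3, 3]))     # (0.0, 0.0)
print(both_forms([1, 2, 3, 4, 5]))  # (2.0, 2.5)

data = [3, 3, 3, 3]
mixed_up = sum(v * v for v in data) - sum(data) ** 2  # -108: sums put into the E-formula
print(mixed_up)

pop, samp = both_forms([1, 2, 3, 4, 5])
print(isclose(pop, pvariance([1, 2, 3, 4, 5])), isclose(samp, variance([1, 2, 3, 4, 5])))
```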

Deletion proposal

I have placed an AfD template on Gaussian minus exponential distribution. Please see Wikipedia:Articles for deletion/Gaussian minus exponential distribution and comment, etc. There was discussion previously on this page about this article, now archived, but really only to say that it is poor. Melcombe (talk) 15:49, 23 September 2010 (UTC)

Multivariate kernel density estimation

The article on multivariate kernel density estimation is very new but very good. Could someone who is more familiar than me with the rating of statistics articles take a look at it and give it a rating? I have a feeling it may be a B-class article, but I could be wrong. Yaris678 (talk) 07:44, 24 September 2010 (UTC)

The article seems pretty good; however, why oh why has it been split out from kernel density estimation? I suggest that the two articles be merged, since they use essentially the same method.  // stpasha »  15:58, 24 September 2010 (UTC)
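To illustrate the "same method" point, here is a minimal NumPy sketch of a Gaussian kernel density estimate with a full bandwidth matrix H, f̂(x) = (1/n) Σ K_H(x - X_i). With d = 1 and H = [[h²]] it reduces to the familiar univariate estimator with bandwidth h. The function name is hypothetical; for real use, SciPy's `scipy.stats.gaussian_kde` offers a ready-made multivariate version with automatic bandwidth selection.

```python
import numpy as np

def kde_gaussian(points, data, H):
    """Gaussian kernel density estimate f(x) = (1/n) * sum_i K_H(x - X_i).

    data:   (n, d) sample, or a 1-D array for d = 1
    points: (m, d) evaluation points, or a 1-D array for d = 1
    H:      (d, d) positive-definite bandwidth matrix (the kernel's covariance)
    """
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    if points.ndim == 1:
        points = points[:, None]
    d = data.shape[1]
    H = np.asarray(H, dtype=float)
    H_inv = np.linalg.inv(H)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(H))
    diffs = points[:, None, :] - data[None, :, :]            # shape (m, n, d)
    quad = np.einsum("mni,ij,mnj->mn", diffs, H_inv, diffs)  # (x - X_i)^T H^-1 (x - X_i)
    return norm * np.exp(-0.5 * quad).mean(axis=1)

# d = 1 with H = [[h**2]] matches the usual univariate formula with bandwidth h.
sample_1d = np.array([0.0, 1.0, 3.0])
h, x0 = 0.5, 0.7
univariate = np.mean(np.exp(-0.5 * ((x0 - sample_1d) / h) ** 2) / (h * np.sqrt(2 * np.pi)))
print(kde_gaussian([x0], sample_1d, [[h**2]])[0], univariate)
```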

another deletion proposal

I have placed a PROD template on Logmoment generating function, which seems to duplicate cumulant generating function. This is your chance to save it if anyone thinks it's worthwhile. Melcombe (talk) 09:17, 1 October 2010 (UTC)

Work on redoing intros of articles to make them clearer and less technical

Hello. Many of the statistics articles used to be very confusing, and far too technical. I have a personal interest in fixing this because my statistics knowledge has been hard-won, and too often in the past when I looked up a relevant Wikipedia article I found it impossible to make sense of. So I've tried to rewrite the intros of a number of articles to make them clearer and less technical. Among the statistics-related articles so far whose intros or description sections I've redone or significantly hacked on are:

If anyone sees any further work that needs to be done to the above articles, or notices any other statistics articles that are confusing and need work, please note this below. Thanks. Benwing (talk) 09:33, 5 October 2010 (UTC)

Merge?

I have proposed merging these three articles:

Michael Hardy (talk) 02:11, 11 October 2010 (UTC)


In turn, I will propose merging three other articles:

into the parent article Probability distribution.  // stpasha »  01:30, 12 October 2010 (UTC)

rich get richer

We now have:

How should we organize links between these, and from other pages? Michael Hardy (talk) 15:43, 27 September 2010 (UTC)

I suggest merging The rich get richer (statistics) into Preferential attachment, since these are both about the same phenomenon in statistics. I take the blame for having created The rich get richer (statistics), since I didn't realize the preferential attachment article already existed. As for the Matthew effect, it seems to refer more to a phenomenon in sociology than statistics, so I'd keep it separate, possibly adding a sentence indicating that it's sometimes used to refer to preferential attachment in statistics. BTW, as for the phenomenon of the "Matthew effect" in the history of science, I recently saw another article about someone or other's law that stated substantially the same thing, i.e. discoveries are rarely named after the person who discovered them (but usually after someone else with high visibility and social standing). Benwing (talk) 09:37, 1 October 2010 (UTC)
It seems that The rich get richer (statistics) is rather different from Preferential attachment, since in the latter new observations tend to be close to previous ones because the probability model governing them mandates that, but in the former all the observations are independent under the primary model, and it is something to do with conditional distributions, marginalised over distributions for parameters, that shifts. While something like "Preferential attachment" might be going on here, the repeated marginalisation over unknown parameters that seems needed to present it as a "preferential attachment process" seems rather strongly different from the model described in Preferential attachment. After all from one point of view the observations just cluster in an iid way about an unknown point. Of course there are no references so no-one can check what is supposed to be going on in the limited context described by The rich get richer (statistics), and what is described there is rather different from what a statistician would typically mean by "the rich get richer"... who would presumably think of The rich get richer and the poor get poorer and they might be prepared to start formulating a probabilistic model for it. As it stands, what is in The rich get richer (statistics) might be better off (if it is worthwhile at all) as a sub-sub-section in some article on computational Bayesian statistics. Melcombe (talk) 13:28, 19 October 2010 (UTC)
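Melcombe's point, that observations which are iid given an unknown parameter can behave like a "rich get richer" process once that parameter is marginalised out, can be sketched with a Dirichlet-multinomial (Pólya urn) model. This model is an assumption chosen for illustration; the unsourced article may have had something else in mind, and the function names are hypothetical. Given category probabilities p drawn from a Dirichlet(alpha) prior, draws are iid categorical conditional on p, but after observing counts n the marginal predictive probability of category k is (n_k + alpha_k) / (n + sum(alpha)), so categories already seen often become more likely.

```python
import random


def predictive_probs(counts, alpha):
    # Posterior predictive P(next draw = k) under a Dirichlet(alpha) prior
    # on the category probabilities, after observing `counts`.
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]


def polya_urn(n_draws, alpha, seed=0):
    # Draw sequentially from the marginal model (p integrated out).
    # Each draw raises the predictive weight of its own category.
    rng = random.Random(seed)
    counts = [0] * len(alpha)
    for _ in range(n_draws):
        probs = predictive_probs(counts, alpha)
        k = rng.choices(range(len(alpha)), weights=probs)[0]
        counts[k] += 1
    return counts


print(predictive_probs([3, 1, 0], [1, 1, 1]))  # equals [4/7, 2/7, 1/7]
print(polya_urn(1000, [1, 1, 1]))
```

Conditionally on p nothing "preferential" happens; the reinforcement appears only in the marginal sequence, which is why this looks different from the explicit growth model in Preferential attachment.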

Help for Piecewise regression analysis

If anyone is interested in progressing Piecewise regression analysis, please see that article's Talk at Talk:Piecewise regression analysis. Melcombe (talk) 09:16, 5 October 2010 (UTC)

Given no real progress with this article, I have placed an AfD on it, with a suggestion to "userfy" rather than delete. But please form your own opinions and contribute at Wikipedia:Articles for deletion/Piecewise regression analysis. Melcombe (talk) 12:54, 19 October 2010 (UTC)

N = 1 fallacy

FYI, N = 1 fallacy has been nominated for deletion. 76.66.200.95 (talk) 05:27, 9 October 2010 (UTC)

Good riddance :)  // stpasha »  06:58, 9 October 2010 (UTC)
Discussion is at Wikipedia:Articles for deletion/N = 1 fallacy. Melcombe (talk) 09:07, 11 October 2010 (UTC)
Note the result was that a poor simple redirect to Pseudoreplication was done. Melcombe (talk) 12:57, 19 October 2010 (UTC)

Linear Least Squares

FYI, the usage of Linear least squares is under debate, see Talk:Numerical methods for linear least squares.

76.66.198.128 (talk) 03:53, 21 October 2010 (UTC)

WikiProject cleanup listing

Together with Smallman12q, I have created a toolserver tool that shows a weekly-updated list of cleanup categories for WikiProjects. It can be used as a replacement for WolterBot, and this WikiProject is among those already included (because it is a member of Category:WolterBot cleanup listing subscriptions). See the tool's wiki page, this project's listing in one big table or by categories, and the index of WikiProjects. Svick (talk) 20:54, 7 November 2010 (UTC)

yellow

Many peoples dont known why their look this pagr but at this content i am stil soryry about yoy who is stiil cheaking —Preceding unsigned comment added by 88.119.227.60 (talk) 14:48, 10 November 2010 (UTC)

Help for Levy's convergence theorem

Please see if you can help in the discussion at Talk:Lévy's convergence theorem. There are issues about citations using that name for what is presently in the article, about its difference from Dominated convergence theorem, and about a different meaning at Lévy's continuity theorem, which is sometimes referred to as Lévy's convergence theorem. 17:35, 10 November 2010 (UTC)

Skewness

The Wikipedia article on Skewness cites reference #14 (in Czech) concerning Cyhelsky's Skewness Coefficient. However, the formulation as [(the number of observations below the mean minus the number of observations above the mean)/total number of observations] yields a negative value for a right-sided skew, which is commonly described as a "positive" skewness. I don't read Czech. Perhaps someone who does read Czech could check whether the Wikipedia formulation should be revised to result in a negative coefficient when the data show a left-side skew. Thinners (talk) 22:20, 27 October 2010 (UTC)

That's what the reference says. It specifically says that when the coefficient is negative, there are more values above the average than below it. Svick (talk) 21:13, 13 November 2010 (UTC)
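The sign convention can be checked by applying the formulation exactly as quoted above to a small sample. The Python sketch below is an illustration only; the function name and the example data are assumptions, and observations exactly equal to the mean are counted in neither group, which is also an assumption since the quoted formulation does not say how ties are handled.

```python
def cyhelsky_skewness(xs):
    # (number of observations below the mean
    #  - number of observations above the mean) / total number of observations
    n = len(xs)
    mean = sum(xs) / n
    below = sum(1 for x in xs if x < mean)
    above = sum(1 for x in xs if x > mean)
    return (below - above) / n


right_skewed = [1, 1, 1, 2, 10]     # long right tail, mean 3
left_skewed = [-x for x in right_skewed]

print(cyhelsky_skewness(right_skewed))  # 0.6
print(cyhelsky_skewness(left_skewed))   # -0.6
```

In this example the long right tail pulls the mean above most of the observations, so more values fall below the mean and the coefficient comes out positive; the mirrored, left-skewed sample gives a negative value, consistent with Svick's reading of the reference.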

Suggestion to the highly educated from a dilettante

I realize that this request, if generously granted, would mean more work for everyone, but it would be quite informative to see additional steps between the simple and general cases for various formulas. If possible, could someone provide, for example, a three- and/or four-variate case in the article on joint probability density functions? For the general case of a particular theorem, it is sometimes difficult to really understand the patterns presented in many mathematical articles (particularly for those of us without formal maths training). Whether or not someone is willing to take the time to do that, I still greatly appreciate the efforts of contributors to Wikipedia, since it's often my first stop at the base of a learning curve.

Chris —Preceding unsigned comment added by Chris Carleton (talkcontribs) 16:18, 18 November 2010 (UTC)

Request for addition

I would like to request addition of the content proposed here to the relevant article. As I don't have access to a source which provides information about the sample size issue, I hesitate to add it to the article myself. hujiTALK 00:07, 20 November 2010 (UTC)

ƒ or f

Bkell has recently raised the question of whether we should use the symbol ƒ or f to denote functions. He points out that

It seems to me that since there is no special symbol for a function named g or h, it doesn't make sense to use some special symbol for a function named f. The symbol used in mathematical papers isn't a Latin small letter F with hook—it's just an italic f.—Bkell

It seems to me, however, that the HTML symbol ƒ (&fnof;) is specifically designed to denote the function of something, and thus is more appropriate. Besides, it better matches the default TeX rendering: <math>f</math>. // stpasha » 16:49, 18 November 2010 (UTC)

The symbol used in mathematical papers is just a plain old lower-case letter f, but it's rendered in an italic serif font, which traditionally extends the letter below the baseline. The Latin small letter F with hook looks similar in a sans-serif font, but it is a different character. Slides produced for mathematical talks often use sans-serif fonts, and it is clear in such slides that the letter is f, not ƒ. Yes, the HTML symbol is named &fnof;, and the Unicode character table for Latin Extended-B [1] describes the character ƒ as "LATIN SMALL LETTER F WITH HOOK = script f = Florin currency symbol (Netherlands) = function symbol". However, it just doesn't make sense to use a special character for functions named f, when functions named g, h, etc. just use the regular old character. Really what should be done for consistency is to write <math>f</math>, because then the character is properly rendered in an italic serif font. —Bkell (talk) 17:38, 18 November 2010 (UTC)

Using ƒ for functions is inconsistent and an abuse of notation. What is even worse is that for reasons which escape me, many people write the character unitalicised, ƒ, which is outright ugly, and it defeats the original purpose of having a fancier version of italic f.—Emil J. 18:06, 18 November 2010 (UTC)


I completely agree with Bkell and EmilJ. If we only used f or some variation thereof, there might be a case (if a weak one) for ƒ. But with many other symbols also used for functions, it makes no sense to use a special symbol for one and ordinary letters from the Latin alphabet for the others; it's not only illogical, but the typographical clash is jarring. And perhaps the way we pronounce it (“eff of”) is the strongest hint of all. Moreover, the instinctive search would be for f, and such a search would fail to find instances of ƒ. I also agree with Bkell that professionally published mathematical books are good sources for guidance, and they almost universally use an italic f, just as they use italic g, h, and so on.

I think the fascination with ƒ for some may simply be visual. Most mathematical texts, like most other texts, are set in a serif typeface, for which the oblique font is also cursive, and the italic f is nearly always a descender. For good or for ill, the default typeface for Wikipedia is sans-serif, and the oblique f is not a descender. If we resort to tricks to force a descender, the clash with the running typeface is obvious to anyone paying attention, resulting in hideous copy. We seem to have a similar issue with the symbol used to indicate aperture in photography. The vast majority of publications use either an italic or roman f, e.g. f/4 or f/4; a few publications and a small number of web pages use ƒ/4, but they're vastly in the minority. There is a template, {{f/}}, that forces a descender f, e.g., f/4, but the clash of typefaces is again glaring. And no one has produced a single example outside of Wikipedia that takes this approach. Once again, most books on photography are set in serif type, leading to passages such as “the lens should be set to f/4”. But there's nothing special about the f; it's simply that in a serif typeface, the “italic” font is truly italic, and the f is a descender. If we require that the f be a descender, Wikipedia should change the default typeface to serif (or one of the few sans-serif faces, such as that used in {{f/}}).

stpasha has a point with the better match of ƒ to the default TeX rendering. But we have the same issue with any quantity symbol—it's an unavoidable consequence of using a serif typeface in TeX and a sans-serif face for running text. One approach is to use <math> ... </math> constructions for quantity symbols in the running text, but we then get a rather ugly mismatch between the quantity symbols and the rest of the running text. Current practice in ISO standards seems to use this approach, leading to, at least to my eye, some of the most hideous typography I've ever seen.

Bottom line? I think we should follow logic and follow long-established practice among those who publish books professionally. JeffConrad (talk) 19:09, 18 November 2010 (UTC)

I too dislike ƒ. Just because ƒ is there doesn't mean we have to use it. In addition to not matching the surrounding typeface, it may cause accessibility issues for users using screen readers or on old software, as the software may not know how to interpret the symbol. I think it is much more consistent and elegant to use f, just as we use g, x, y, and so on. Ozob (talk) 23:23, 18 November 2010 (UTC)
I strongly disagree with you guys. ƒ is clearly better than f. Almost all mathematicians these days write in LaTeX, and their math-mode f looks almost identical to ƒ. Take a look:   (compare that to ƒ, g, h). Tradition dictates that the f's be more fancy than the g's and h's, which I think outweighs the need for consistent fonts. Ideally, the math-mode feature on Wikipedia would be more consistent with the rest of the article, so we could write all mathematical symbols on Wikipedia in math mode, but that's not the case. As such, using the ƒ, g, and h convention is the best way to denote functions.--Dark Charles 19:58, 19 November 2010 (UTC)
Dark Charles, let me repeat that the "math-mode f" you refer to is just a plain old lower-case letter f in an italic serif font: for example, the italic serif font used here. Note how the f's extend below the baseline? That is how the letter f looks in most italic serif fonts. It isn't some special character. LaTeX does not use a special character for a "math-mode f".* Above I mentioned slides for mathematical talks. Often such slides are produced by a LaTeX package called Beamer, which by default uses a sans-serif font, and it is clear in these slides that functions called f are referred to with just a simple letter f; see [2] for example. Professionally typeset mathematics does not use a special character for functions called f. —Bkell (talk) 00:23, 20 November 2010 (UTC)
* This is a tiny lie. LaTeX actually does use italic letters from a special "math italic" font for math, which differs from the "text italic" font mainly in the kerning between characters and in the widths of some characters like the lower-case b. —Bkell (talk) 00:34, 20 November 2010 (UTC)
This conversation seems to have branched over to Wikipedia talk:WikiProject Mathematics#ƒ or f?, too. —Bkell (talk) 00:49, 20 November 2010 (UTC)
Okay, I checked my LaTeX, and both the italic f and the math-mode f are the same. However, as you said, generally math mode and italics aren't the same. And so, I think ƒ's should be used for f's as functions in that same spirit. What's more, the only example of a math textbook written in an abnormal font is Rudin's Principles of Mathematical Analysis (which is written in Times New Roman), and Rudin uses ƒ, not f.--Dark Charles 03:14, 20 November 2010 (UTC)
I repeat: That's because it's a serif font. In nearly all serif fonts, the ordinary italic f extends below the baseline. (Here is the ordinary Times New Roman italic ff.) You are not seeing a special form of the letter f used for math—you are seeing the ordinary italic f for the typeface used in the book. Find an italic f in Rudin's book in some ordinary text, and it will be exactly the same character. The difference between f (an italic sans-serif f) and f (an italic serif f) is because they are different typefaces, not different characters. Go find some math written in a sans-serif font, as I've previously suggested, and you'll see that the f does not extend below the baseline. —Bkell (talk) 04:51, 20 November 2010 (UTC)
I second Bkell's comment in spades. As I indicated above, we had essentially the same discussion about the symbol to use when indicating an aperture in photography, e.g., f/4. Examination of many works revealed that the f was simply in the “italic” font of the running typeface; because the vast majority of such works are printed in a serif typeface, the f is usually a hooked descender; in most of the few works I looked at set in sans-serif type, the f indeed matches the running face—it's just set in the “italic” (properly, oblique) font. The {{f/}} template attempts to get around this by forcing a series of sans-serif typefaces, beginning with Trebuchet. There's an obvious clash even with the default running face, e.g., “the lens was set to f/4”, but if the reader has set preferences to use a serif typeface, the clash is glaring, e.g., “the lens was set to f/4”. Forcing a typeface switch is a slightly different issue than using a special character, but in the end, the two are of the same ilk. Failure to separate form and content is usually a road to disaster (speaking from experience ...), and diddling typefaces or characters to achieve a specific appearance is but one example. It is a practice against which Wikipedia should resolutely set its face. JeffConrad (talk) 09:03, 20 November 2010 (UTC)
Okay, I guess I concede this one.--Dark Charles 10:08, 20 November 2010 (UTC)

I think we can conclude this discussion by asserting that there was a consensus that f is preferred to ƒ.  // stpasha »  05:37, 23 November 2010 (UTC)
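Bkell's point that ƒ and f are different characters, and JeffConrad's point that a search for f will not find ƒ, can both be checked with Python's standard unicodedata module. The sketch below is an illustration only; the helper name and the sample sentence are assumptions.

```python
import unicodedata


def describe(ch):
    # Code point and official Unicode character name.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("f"))        # U+0066 LATIN SMALL LETTER F
print(describe("\u0192"))   # U+0192 LATIN SMALL LETTER F WITH HOOK

# Text written with the hooked character is invisible to a plain search for "f(".
text = "Let \u0192(x) = x^2 and g(x) = 2x."
print("f(" in text)         # False
print("g(" in text)         # True
```

The two glyphs can look alike in some fonts, but they are distinct code points, so anything that matches on characters (search, screen readers, copy and paste into other software) treats them as unrelated.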

Update of MoS

In accordance with this discussion and the related discussion at WT:WPM, I have updated the math MoS to explicitly disallow use of ƒ for a function. Ozob (talk) 14:43, 28 November 2010 (UTC)

Good–Turing frequency estimation

"Instead we plot [...]"

Shouldn't there be a plot? Please add some illustrations or adjust the text. Thank you! --Peni (talk) 14:01, 1 December 2010 (UTC)

P.S. Source: William A. Gale, "Good-Turing smoothing without tears", Journal of Quantitative Linguistics, 1995. --Peni (talk) 15:39, 1 December 2010 (UTC)
Be bold. --Qwfp (talk) 17:27, 1 December 2010 (UTC)

I have copied the above to the article's Talk page, as there was no request there. Melcombe (talk) 14:25, 2 December 2010 (UTC)

Deletion proposal

For info, I have placed a PROD template on Enforced continuity. I think this is the last of that collection of articles that were in poor English and based on single-source ideas. Melcombe (talk) 13:13, 10 December 2010 (UTC)

Shlomo Sawilowsky and related articles

Hi, there is a long and ongoing dispute between Kiefer.Wolfowitz (talk · contribs) and Edstat (talk · contribs) mainly centered around the apparent single purpose nature of Edstat's edits which are mainly related to Shlomo Sawilowsky or his work. I'm hoping that someone here may be able to shed some light on whether Edstat's edits are adding undue weight to Sawilowsky's work, or if this is merited. For someone with very little knowledge of statistics, this is difficult to determine. There is ongoing discussion on Kiefer.Wolfowitz's talk page but I think it may be best if this could be discussed here rather than there, where it may be more difficult to stick purely to content matters. SmartSE (talk) 11:48, 20 November 2010 (UTC)

I noticed that Sawilowsky's bio page at Wayne State University [3] is a redirect to Wikipedia. It seems to me an indication that Shlomo Sawilowsky himself is in fact one of the key editors of the Shlomo Sawilowsky wikipage. As for the Sawilowsky's paradox article, its notability has not yet been established, even though half a year has passed since that article was created. Moreover, Edstat (talk · contribs) seems unable or (more likely) unwilling to explain clearly what Sawilowsky's paradox is about (and Abelson's paradox as well). // stpasha » 15:18, 20 November 2010 (UTC)
Hmm, that is interesting and something that I've never seen on any other academic's personal biographies. I've nominated the Sawilowsky's paradox for deletion, due to a lack of secondary sources to demonstrate importance. Abelson's does just appear notable based on hits like this but if people here disagree I'm happy to nominate it for deletion too. SmartSE (talk) 15:51, 20 November 2010 (UTC)
The talk page of the Anova article shows the first interaction between an IP address (12:50, 24 November 2010 (UTC)) and me (Kiefer.Wolfowitz). Kiefer.Wolfowitz (talk) 20:23, 20 November 2010 (UTC)
All this forensics is a charade. Let's see who mentioned the many invectives Kiefer.Wolfowitz has used against me (repeated use of bold, outrageous statements, etc.). Let's see who can find any edit I've made that Smartse has supported and not attacked. Let's see who can find any edit I've made that Stpasha supported. Please, the charade is so transparent - can't you find some new name from the cabal for this latest attack?
Go look at Kiefer.Wolfowitz's defense of outrageous entries he owns - filled with photographs and paragraphs and information not on the subject, and see how I earned his ire with an attempt to keep the material relevant to the entry. Go and see how Kiefer.Wolfowitz stalks many pages that I edit, to delete, revert, and contort. Which of you will uncover his explanation that if the author isn't in the "right university" or "right publication" according to his standards, it is trash? There is more to say, but what would be the point? Goodbye. Edstat (talk) 23:46, 20 November 2010 (UTC)
Look Edstat,
Please discuss statistical content here. Please continue to allege misbehavior against me on the talk page, where you left a "Shame on you" section a few days ago (this has been removed, because Edstat has said goodbye on many pages and seems not to be pursuing further complaints against me; Edstat's complaint can be viewed in the page history 12:50, 24 November 2010 (UTC)). (I am sorry that this blow-up has occurred during Friday-Saturday.) Kiefer.Wolfowitz (talk) 00:01, 21 November 2010 (UTC)

Let me say that I believe there is no percentage in the personal side of this discussion. Such talk in the past has exacerbated disputes and come too close to "attempted outing", for which sanctions are serious. Enough is already on this page for everyone to see why that is troublesome (understatement). What has been written is the correct issue for discussion in this forum. Charles Matthews (talk) 09:18, 24 November 2010 (UTC)

Since Edstat has said his goodbyes in several pages and seems no longer active, I feel that defending myself is not urgent and I shall remove some of the previous comments, which shall remain visible in the history of this page. Kiefer.Wolfowitz (talk) 12:07, 24 November 2010 (UTC)
Editors who say goodbye often come back. If the problem does resume, the WP:Conflict of interest/Noticeboard would be a good place to discuss this. The regulars at that noticeboard have knowledge of the OUTING rules and could advise how to stay on the safe side of them. Even in that venue, a COI complaint might not lead to a clear result. Edstat's personal attacks are easier to get a clear picture of and they do not make a good impression. If they continue, admins might decide to take action. EdJohnston (talk) 15:51, 24 November 2010 (UTC)
It was already mentioned at COIN, first here and then again, which is how I heard about it. The concern is that a problem may already exist, mainly with the Sawilowsky bio but also in other articles, but I wanted to gain some input from statisticians before any wholesale cleanup began. SmartSE (talk) 20:03, 24 November 2010 (UTC)
If the issue has been at COIN twice, then I think the way is clear for regular editors to consider cleanup of the Shlomo Sawilowsky article. It appears to me that the subject is notable as a statistician, and the task is to create a well-balanced article. If the cleanup is opposed by editors who appear to be the subject, then ANI would be the next step. It seems that Sawilowsky's paradox is on its way to being deleted at AfD. EdJohnston (talk) 23:22, 24 November 2010 (UTC)
Thanks all, especially Smartse for posting the links to COIN. I had not realized that there had been previous sock-puppetry COIN investigations, with comments by many (very) experienced editors/administrators. (I had thought that only one administrator had unilaterally imposed a short block on Edstat, for sock-puppetry only, although Smartse and Iulius alluded to something more serious.) It is a relief to know that more than a handful of us were concerned. I should have just ignored his latest attacks. Sincerely, Kiefer.Wolfowitz (talk) 08:48, 25 November 2010 (UTC)

Never can say goodbye

Editor Edstat has returned to editing the article on Shlomo Sawilowsky, exemplifying editor EdJohnston's foreboding comment (above): "Editors who say 'goodbye' often come back." Kiefer.Wolfowitz (talk) 12:53, 2 December 2010 (UTC)

Well, now I see why CM has come to that page to make changes, none of which were supported by WP policies - he used terms like "standarization", "sensible", and "too much". I have since been told that he has to follow Wikipedia rules for adding or deleting just as anyone else, and that being on the office staff of Wikimedia UK does not give him license to avoid the rules. EJ, as you have demanded, I have asked on that page for input from other editors on fixing the mistakes that CM's recent edits introduced. Moreover, K.F., I said goodbye to the pages where you were editing, to avoid contentious debates with you. I did not say I would cease editing anywhere else, and in fact, I have continued to do so, both on that page and on four or five other pages where I regularly edit. Edstat (talk) 21:48, 9 December 2010 (UTC)
Dear Edstat, I'm sorry for the attempted humor of the Michael Jackson song link, which was inappropriate. Let's relax and watch and learn from the editing of Charles Matthews and VernoWhitney, who have already resolved some disagreements amicably without questioning one another's good faith. Everybody should take it easy, and try to be on their best behavior. Sincerely, Kiefer.Wolfowitz (talk) 22:03, 9 December 2010 (UTC)

Apparent sock puppetry

I opened a discussion of apparent sock puppetry by Edstat, following an early investigation which did not list two additional (apparent) sock puppets (of which one remains active). Kiefer.Wolfowitz (talk) 15:02, 12 December 2010 (UTC)

F-test

The ANOVA F-test is now described as extremely sensitive to departures from normality, citing Shlomo Sawilowsky. Is this undue weight, against the recommendations of the textbooks (e.g. Moore and McCabe)? Kiefer.Wolfowitz (talk) 22:05, 15 December 2010 (UTC)

I've commented on this at the F-test talk page. Skbkekas (talk) 04:55, 16 December 2010 (UTC)