Welcome to the statistics portal
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the natural and social sciences to the humanities, government and business.
Statistical methods are used to summarize and describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called inferential statistics.
Statistics arose no later than the 18th century from the need of states to collect data on their people and economies, in order to administer them. The meaning broadened in the early 19th century to include the collection and analysis of data in general.
Datasets with various correlation coefficients
A correlation (often measured as a correlation coefficient) indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation or co-relation refers to the departure of two variables from independence. In this broad sense there are several coefficients measuring the degree of correlation, each adapted to the nature of the data. The best known is the Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. Despite its name, it was first introduced by Francis Galton.
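As a minimal sketch of the definition above, the Pearson coefficient can be computed directly as the covariance divided by the product of the standard deviations. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the portal text; population (divide-by-n) formulas are used for both covariance and standard deviation, so the normalising factors cancel.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance / (std dev of x * std dev of y).

    `pearson_r` is a hypothetical helper name chosen for this sketch.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Population covariance of the paired observations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

    # Population standard deviations of each variable.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant variable")

    return cov / (sd_x * sd_y)


# Illustrative data only: a perfectly linear pair and a symmetric parabola.
linear = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
parabola = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

In this sketch `linear` comes out as 1 up to floating-point rounding, while `parabola` is 0 even though the second variable is completely determined by the first. That illustrates why the coefficient measures linear association specifically rather than dependence in general.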
William Sealy Gosset (1876–1937) is better known by his pen name Student, which he gave to Student's t-test and Student's t-distribution. He joined the Dublin brewery of Arthur Guinness & Son in 1899, where he applied his statistical knowledge both in the brewery and on the farm to the selection of the best-yielding varieties of barley. Gosset's key 1908 papers addressed the brewer's concern with small samples. To prevent further disclosure of confidential information, Guinness prohibited its employees from publishing any papers, regardless of their content, so Gosset published under the pseudonym Student to avoid detection by his employer.
A scatter plot is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data. The data is displayed as a collection of points: for each point, the value of one variable determines the position on the horizontal axis and the value of the other variable determines the position on the vertical axis. A scatter plot is also called a scatter chart, scatter diagram or scatter graph. The featured scatter plot shows the relationship between the time between eruptions and the duration of the eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. The chart suggests there are generally two "types" of eruptions: short-wait-short-duration, and long-wait-long-duration.
- ...that according to Benford's law, the first digit from many real-life sources of data is 1 almost one third of the time?
- ...that the Law of Truly Large Numbers of Diaconis and Mosteller states that with a sample size large enough, any outrageous thing is likely to happen?
- ...that, in studying the number of shuffles needed to randomize a deck, Persi Diaconis concluded that with good shuffling technique the deck does not start to become random until five good riffle shuffles, and is truly random after seven, in the precise sense of variation distance described in Markov chain mixing time?
- ...that for many standard probability distributions, there are infinitely many outcomes in the sample space, so that attempting to define probabilities for all possible subsets of such spaces would cause difficulties for 'badly-behaved' sets such as those which are nonmeasurable?
- ...that Jan Piekałkiewicz, a leading Polish statistician, became the Polish Underground State's Government Delegate, and died at the hands of Nazi Germany?
- ...that Alec Gallup, co-chairman of The Gallup Organization and the son of founder George Gallup, was described as someone who could "smell out a bad question or an unreasonable interpretation of data"?
- ...that the convergence of the iterative proportional fitting method for estimating the cell values of a contingency table was re-proved using differential geometry?
- ...that statistical properties dictated by Benford's law are used in auditing of financial accounts as one means of detecting fraud?
- ...that Henry Mann's 1949 book, Analysis and design of experiments, filled mathematical gaps in the statistical writings of Ronald A. Fisher?
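The "almost one third" figure in the Benford's law item above can be checked with a short sketch. It assumes the standard statement of the law, in which a leading digit d occurs with probability log10(1 + 1/d); the function name `benford_probability` is a hypothetical helper, not something named in the portal.

```python
import math


def benford_probability(d):
    """Probability that the leading digit equals d under Benford's law.

    Assumes the standard form P(d) = log10(1 + 1/d) for d = 1..9.
    """
    if d not in range(1, 10):
        raise ValueError("leading digit must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)


# Expected share of each leading digit, 1 through 9.
benford_table = {d: benford_probability(d) for d in range(1, 10)}

# The nine probabilities should sum to 1 (up to floating-point rounding).
benford_total = sum(benford_table.values())
```

Under this assumption the probability for a leading 1 is log10(2), roughly 0.301, which matches "almost one third of the time". An audit check of the kind mentioned in the list would compare observed leading-digit frequencies against such a table, though the details of any real procedure are beyond this sketch.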