User:Thepigdog/Probability

Probability theory starts with a "random action", such as the toss of a coin, the roll of a die, the selection of a card from a deck of cards, or the spin of a roulette wheel.

Language

In the language of probability the result of a random action is called an outcome. Sets of outcomes are called events.

Note: Do not be confused by the terminology. An event is just a set of outcomes. The mathematical way of saying this is that events are elements of the power set of the set of outcomes.

Randomness

These actions are called random because we have no way of predicting the result, so we don't know which outcome will occur. In the absence of better information we give every outcome an equal probability.

The actions may be truly random in the sense that no one can ever predict the outcome, or random relative to our knowledge. In either case the effect is the same.

Probability

The probability of an event is the number of outcomes in the event divided by the total number of outcomes,

P(A)={\frac {Count(A)}{Count(All)}}

All is the set of all outcomes.
The probability of All is 1.
The probability of the empty set is 0.
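The counting definition above can be sketched in a few lines of Python. This is only an illustration, assuming a finite set of equally likely outcomes; the function name `probability` and the die example are hypothetical choices, not part of the original text.

```python
from fractions import Fraction

def probability(event, all_outcomes):
    """P(A) = Count(A) / Count(All), assuming equally likely outcomes."""
    all_outcomes = set(all_outcomes)
    # Only outcomes that can actually occur are counted.
    event = set(event) & all_outcomes
    return Fraction(len(event), len(all_outcomes))

die = {1, 2, 3, 4, 5, 6}   # All: every outcome of one roll
even = {2, 4, 6}           # an event: the roll is even

print(probability(even, die))   # 1/2
print(probability(die, die))    # 1
print(probability(set(), die))  # 0
```

Using `Fraction` keeps the results exact, which matches the counting definition more closely than floating point does.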

Conditional Probability

Probability is defined in terms of all outcomes. But what if we consider the probability within the set of outcomes of another event? This is called conditional probability, written P(A|B) and read as the probability of A given B.

P(A|B)={\frac {Count(A\cap B)}{Count(B)}}

Normal probability is just a special case of conditional probability.

P(A)=P(A|All)
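As a rough sketch of conditional probability by counting, the following Python example uses a standard 52-card deck. The deck encoding, the choice of jack, queen and king as "royal" cards, and the function name `conditional_probability` are assumptions made for illustration.

```python
from fractions import Fraction

def conditional_probability(a, b):
    """P(A|B) = Count(A ∩ B) / Count(B), assuming equally likely outcomes."""
    a, b = set(a), set(b)
    if not b:
        raise ValueError("the conditioning event B must contain an outcome")
    return Fraction(len(a & b), len(b))

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = {(rank, suit) for rank in ranks for suit in suits}  # All

kings = {card for card in deck if card[0] == "K"}
royal = {card for card in deck if card[0] in {"J", "Q", "K"}}

# Ordinary probability is the special case P(A) = P(A|All).
print(conditional_probability(kings, deck))   # 1/13
# Knowing the card is royal changes the probability of a king.
print(conditional_probability(kings, royal))  # 1/3
```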

Wrong assumptions

Probability depends on the state of knowledge, not on the time sequence of events. Before a card is drawn, each card is equally likely. Once we see the outcome, or just an event to which the outcome belongs, we gain knowledge that changes the probabilities: they become the conditional probabilities given the event we now know.

You don't need multiple random actions to make a probability distribution. A single action is enough.

Combinatorial Probabilities

Sometimes we want to look at the probabilities for a series of random actions. In this case combinatorial notation may be used. Combinatorial notation is a simple extension of arithmetic.

For example if I wish to go to the butcher I might write,

Purchase=6*LambChop+4*steak+12*sausages

If we replace LambChop, steak and sausages with the prices of these items, we get a formula for the cost of the order. But it is not necessary to do so. The equation has a meaning even if we don't think of LambChop as denoting the price of a lamb chop.

Take for example a weighted coin with probability h of Heads and probability t of Tails, where h + t = 1. In combinatorial notation we can write this as,

CoinToss=h*H+t*T\!

What then is the result of n coin tosses? In combinatorial notation we write,

CoinToss^{n}=(h*H+t*T)^{n}\!

The result can be expanded using the binomial theorem,

(h*H+t*T)^{n}=\sum _{k=0}^{n}{n \choose k}(h*H)^{n-k}(t*T)^{k}.\!

where,

{n \choose k}={\frac {n!}{k!(n-k)!}}

then,

(h*H+t*T)^{n}=\sum _{k=0}^{n}{n \choose k}(h)^{n-k}(t)^{k}*(H)^{n-k}(T)^{k}.\!

The probability of each combination of events $H^{n-k}T^{k}$ (that is, n - k heads and k tails in any order) may then be read off as its coefficient,

B(n,k)=P(H^{n-k}T^{k})={n \choose k}h^{n-k}t^{k}.\!

This is called the binomial distribution.
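The formula for B(n, k) can be checked numerically. The sketch below is illustrative only: it computes B(n, k) from the formula and, for comparison, by adding up the probability of every individual toss sequence with exactly k tails. The function names are hypothetical, and h is assumed to be the probability of Heads with t = 1 - h.

```python
from itertools import product
from math import comb

def binomial(n, k, h):
    """B(n, k): probability of n - k heads and k tails, with h = P(Heads)."""
    t = 1 - h
    return comb(n, k) * h ** (n - k) * t ** k

def binomial_by_enumeration(n, k, h):
    """Sum the probabilities of every toss sequence with exactly k tails."""
    t = 1 - h
    total = 0.0
    for sequence in product("HT", repeat=n):
        if sequence.count("T") == k:
            total += h ** sequence.count("H") * t ** k
    return total

n, h = 4, 0.5
distribution = [binomial(n, k, h) for k in range(n + 1)]
print(distribution)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(distribution))  # 1.0
```

The enumeration makes the role of the binomial coefficient visible: it counts how many distinct orderings share the same numbers of heads and tails.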

Normal Distribution

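H and T are events. By associating each toss with a small change in a value x of,

H ... $-{\frac {1}{2{\sqrt {n}}}}$
T ... ${\frac {1}{2{\sqrt {n}}}}$

and taking the limit as n → ∞, the distribution of x for a fair coin approaches the normal distribution (here with μ = 0 and σ² = 1/4),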

f(x)={\tfrac {1}{\sqrt {2\pi \sigma ^{2}}}}\;e^{-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}}.
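One way to see the limit numerically is to compare the binomial distribution for a large n with the normal density f(x). The sketch below makes some assumptions for convenience: it works with the raw count of tails k rather than a rescaled value x, so it uses μ = n t and σ² = n t (1 - t), and it picks n = 1000 with a fair coin. The function names are hypothetical.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(n, k, t):
    """Probability of exactly k tails in n tosses, with t = P(Tails)."""
    return comb(n, k) * t ** k * (1 - t) ** (n - k)

def normal_pdf(x, mu, sigma):
    """The normal density f(x) with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

n, t = 1000, 0.5
mu = n * t                      # mean number of tails
sigma = sqrt(n * t * (1 - t))   # standard deviation of the number of tails

# Print the exact binomial value next to the normal approximation.
for k in (470, 485, 500, 515, 530):
    print(k, binomial_pmf(n, k, t), normal_pdf(k, mu, sigma))
```

For much larger n or very lopsided coins the powers in `binomial_pmf` can underflow, so this direct form is only suitable for moderate sizes.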

Measures

Some sets are so large that it is not possible to count them. In particular the real numbers are uncountable.

If the outcomes were real numbers,

You need an infinite amount of information to specify them.
The probability of any real number would be zero.

This might seem to be a problem, but mathematicians have worked out how to measure infinite sets and how to define probability using measures.

P(A|B)={\frac {Measure(A\cap B)}{Measure(B)}}

For example, suppose I know that an outcome will be between 0 and 2 (written [0, 2]) and I want to know the probability that the outcome is between 0 and 1. The measure of [0, 2] is 2 (using the Euclidean measure, that is, length). The measure of [0, 1] is 1.

P([0,1]|[0,2])={\frac {Measure([0,1]\cap [0,2])}{Measure([0,2])}}={\frac {1}{2}}
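The interval example can be sketched in Python by using length as the measure. This is a minimal illustration that assumes events are single closed intervals, represented as hypothetical `(low, high)` pairs, and that equal lengths within B are equally likely.

```python
def measure(interval):
    """Length of a closed interval (low, high); an empty interval has length 0."""
    low, high = interval
    return max(0.0, high - low)

def intersection(a, b):
    """Intersection of two intervals (may be empty, i.e. low > high)."""
    return (max(a[0], b[0]), min(a[1], b[1]))

def conditional_probability(a, b):
    """P(A|B) = Measure(A ∩ B) / Measure(B)."""
    if measure(b) == 0:
        raise ValueError("the conditioning event B must have non-zero measure")
    return measure(intersection(a, b)) / measure(b)

print(conditional_probability((0, 1), (0, 2)))  # 0.5
# A single point has measure zero, so its probability is zero.
print(conditional_probability((1, 1), (0, 2)))  # 0.0
```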

It is usually not possible to consider the probability of a single real number outcome. Either probabilities of events consisting of measurable sets must be considered, or probability distributions must be considered.

Probability of statements

Probability is defined in terms of events, which are sets of outcomes. What then is the probability of a statement?

 P(John will select a royal card)

Here we can relate this to the royal card event (set of royal cards in the pack of cards).

 P(The sun will rise)

This is a much more abstract statement. To answer such a question you need to consider,

All the possible models that can be written (ordered by complexity).
All the possible worlds generated by the models.

These worlds are outcomes, and "the sun will rise" is an event.

Inductive Inference

Inductive inference is the way of finding the probability of future events based on past history.

Bayes' Theorem is a standard, simple and easily proved theorem used in inductive inference.
Information theory gives us a key insight into measuring the probability of outcomes based on their complexity.
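As a small illustration of Bayes' Theorem, P(M|D) = P(D|M) P(M) / P(D), the sketch below updates the probabilities of two candidate models of a coin after seeing three heads in a row. The two models (a fair coin and a coin with P(Heads) = 4/5), the equal priors and the function name `posterior` are all assumptions chosen for the example.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' Theorem: P(M|D) = P(D|M) P(M) / P(D), for each model M."""
    # P(D) is the total probability of the data over all models.
    evidence = sum(priors[m] * likelihoods[m] for m in priors)
    return {m: priors[m] * likelihoods[m] / evidence for m in priors}

# Hypothetical models: each gives a probability of Heads.
heads_probability = {"fair": Fraction(1, 2), "biased": Fraction(4, 5)}
priors = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}

# Data D: three tosses, all Heads, so P(D|M) = P(Heads)^3.
likelihoods = {m: p ** 3 for m, p in heads_probability.items()}

result = posterior(priors, likelihoods)
print(result["fair"], result["biased"])  # 125/637 512/637
```

After three heads the biased model is roughly four times as probable as the fair one, even though both started out equally likely.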

Problems with Probability

Probability is based on the absence of information. We don't have any reason to believe that a coin will come down as heads rather than tails. But maybe the coin is biased.

A more correct approach to probability is to consider every possible model for a system's behaviour. This is the approach taken in inductive inference.

But even there the theory tells us that we need prior assumptions about probabilities. You can standardise these assumptions, but the probabilities in the absence of a lot of data remain sensitive to assumptions.
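The sensitivity to prior assumptions can be illustrated with a coin whose bias is unknown. The sketch below assumes a Beta prior on the probability of Heads, a common choice that is not taken from the text above; under that assumption the posterior mean is (prior heads + observed heads) / (prior heads + prior tails + tosses). The function name and the two priors are hypothetical.

```python
from fractions import Fraction

def posterior_mean_heads(heads, tosses, prior_heads, prior_tails):
    """Posterior mean of P(Heads) under a Beta(prior_heads, prior_tails) prior."""
    return Fraction(prior_heads + heads, prior_heads + prior_tails + tosses)

# Compare a flat prior, Beta(1, 1), with a prior that strongly
# expects a fair coin, Beta(10, 10), on little and on plentiful data.
for heads, tosses in [(3, 3), (300, 300)]:
    flat = posterior_mean_heads(heads, tosses, 1, 1)
    strong = posterior_mean_heads(heads, tosses, 10, 10)
    print(tosses, float(flat), float(strong))
```

With three tosses the two priors give clearly different estimates (4/5 against 13/23), while with three hundred tosses the estimates are much closer together.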

Probability is relative to the knowledge you have, and to the assumptions you make. We would like to make it an absolute unbiased predictor of the future but in many cases this may not be possible.