Chapter I—An Introduction to Probability
Section 1.1: Random Experiments, Sample Spaces and Events
In some cases, an experiment produces predictable results. That is, if we repeat the experiment under identical conditions, we get the same result. For example, if we mix two chemicals, say sodium hydroxide (NaOH) and hydrochloric acid (HCl), in precisely measured quantities, we always get sodium chloride (NaCl) and water (H₂O) in predictable amounts. We call such experiments deterministic.
In other cases, the outcome of the experiment, even when repeated under identical conditions, is not predictable. For example, if we toss two dice, there is no way to know, a priori, what the outcome will be. (If you do know, you have a promising career in Las Vegas or Atlantic City.) We call such experiments random, probabilistic, or stochastic.
Although the outcome of a random experiment is not predictable, there is a set of potential outcomes, one of which occurs each time the experiment is performed.
Definition: A sample space for a random experiment is a set that contains all possible outcomes for that experiment.
We’ll denote the sample space by S.
A random experiment may have several different sample spaces, depending on how we classify or quantify the outcomes. The only requirement is that the sample space contain all possible outcomes for that particular classification.
Example 1.1.1: Flip a coin. Then S = {H, T} is a sample space since every outcome results in a head or a tail.
Example 1.1.2: Flip two coins. One sample space is \(S_1 = \{HH, HT, TH, TT\}\). Another possibility is \(S_2 = \{0, 1, 2\}\), where we are quantifying the outcomes by the number of heads that appear. Yet another sample space is S = {same, different}, meaning that the two coins either have the same side up, or different sides up.
Example 1.1.3: Toss two dice. One sample space consists of all the possible numerical combinations that can appear; that is, \(S_1 = \{(1, 1), (1, 2), (1, 3), \ldots, (6, 6)\}\). There are 36 such outcomes, assuming we count (2, 3) as different from (3, 2). (Perhaps the dice are different colors.) Or we can use \(S_2 = \{2, 3, 4, \ldots, 12\}\), where we have added the number of spots on the two dice.
Example 1.1.4: Select two cards, without replacement, from a standard deck of 52. Again, we can list all the possible pairs of cards, but this would be quite tedious. (Later we’ll show that there are 1326 such outcomes.) A simpler sample space would be S = {(black, black), (black, red), (red, black), (red, red)}. Whether this is adequate depends on the questions we want to answer about the experiment.
Example 1.1.5: Shoot a basketball until you successfully hit a shot. Then one sample space is S = {H, MH, MMH, MMMH,…}, where H means you hit a shot and M means you missed.
Unlike the previous examples, this sample space has infinitely many elements. However, the set is still countable, meaning that its elements can be put in one-to-one correspondence with the set of positive integers. In simpler terms, there is a definite first element, second element, and so on, with no elements in between. That is not true in the next example.
Example 1.1.6: Stand at the entrance to a bank and measure the time between the arrivals of successive customers. Then a sample space is S = {t : t > 0}, where t is the time, presumably measured to an infinite degree of accuracy.
Here, S contains an uncountable number of elements: they cannot be listed as a first element, second element, and so on. Given any two elements of S, say t = 2.3 and t = 3.7, there are infinitely many elements in between. This was not the case in Example 1.1.5. In later chapters, we shall see that finite and countably infinite sets are treated in a similar fashion; uncountably infinite sets require a different approach.
In any given random experiment, some outcomes may be more likely to occur than others. In Example 1.1.3, if we count the total number of spots on the two dice, then the most likely outcome is 7 since it can occur in the largest number of ways: {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}. The least likely outcomes are 2 and 12, which can occur in only one way each. Or suppose we count the number of runs scored by a baseball team in one game. A sample space is S = {0, 1, 2, 3,…}, but the individual outcomes in the sample space are surely not equally likely to occur. Most teams are more likely to score about 4 to 6 runs per game than a number on either side of that range.
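The count of ways to reach each total in Example 1.1.3 can be checked by brute force. The following is a minimal sketch that enumerates the 36 ordered outcomes; the variable names (`outcomes`, `ways`, `most_likely`) are illustrative choices, not notation from the text.

```python
from collections import Counter
from itertools import product

# All 36 ordered outcomes (first die, second die) from Example 1.1.3.
outcomes = list(product(range(1, 7), repeat=2))

# For each possible total, count how many ordered outcomes produce it.
ways = Counter(a + b for a, b in outcomes)

for total in sorted(ways):
    print(total, ways[total])

# The total that can occur in the largest number of ways.
most_likely = max(ways, key=ways.get)
print("most likely total:", most_likely)
```

The totals 2 and 12 each show a single way, while 7 shows six.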
A numerical quantity used to measure how likely an outcome is to occur is called the probability of that outcome. It is a number between 0 and 1, sometimes expressed as a percentage. The sum of the probabilities of all the outcomes in a sample space must equal 1. Symbolically, let \(S = \{s_1, s_2, s_3, \ldots\}\) be a sample space. S may be finite or countably infinite. Let \(p_i\) be the probability that outcome \(s_i\) occurs. Then:
• \(0 \le p_i \le 1\), for all i = 1, 2, 3,…
• \(\sum_i p_i = 1\)
Note that if the sample space is countably infinite, the summation above is an infinite series. It is certainly possible for that series to converge to 1. For example, if \(p_i = (1/2)^i\), then \(\sum_{i=1}^{\infty} (1/2)^i = 1\). (This is a geometric series.)
However, we can encounter a problem if we only think about the probability of individual outcomes. If the sample space has an uncountably infinite number of elements, then all those elements must have probability 0. Otherwise, it is not possible for all the probabilities to “add up to” 1.
So, in Example 1.1.6, the probability that exactly 3.4 minutes (or any other specific time) elapses between arrivals is 0.
To remedy this situation, we will assign probabilities to sets of outcomes which are subsets of a sample space.
Definition: An event is a subset of a sample space.
We’ll denote events by capital letters, E, F, G, and so on.
An event may consist of a single outcome, or multiple outcomes. The null or empty event has no outcomes, and is denoted \(\emptyset\). The entire sample space is also considered to be an event.
In Example 1.1.3, let E be the event that the first die shows a 3. Then E = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)} is a subset of \(S_1\). Note that we could not use \(S_2\) as the sample space if we are interested in this event. In Example 1.1.5, let E be the event that it takes no more than 3 shots to make a basket. Then E = {H, MH, MMH}. In Example 1.1.6, let E be the event that at least 2 minutes elapses between customer arrivals. Then E = {t ≥ 2} is a subset of S.
Events can be combined using standard set operations.
Definition: The union of two events, E and F, is the set of all outcomes that are either in E or F or both. It is denoted \(E \cup F\).
The intersection of two events, E and F, is the set of all outcomes that are in both E and F. It is denoted \(E \cap F\).
The complement of an event E is the set of all outcomes in S that are not in E. It is denoted \(E^c\).
It is possible that \(E \cap F\) will be the empty set \(\emptyset\). If so, we say that E and F are disjoint or mutually exclusive. For example, suppose that, as above, in Example 1.1.3, E is the event that the first die shows a 3 and F is the event that the sum of the dice is 10. Then F = {(4, 6), (5, 5), (6, 4)}, and \(E \cap F = \emptyset\).
It is convenient to represent set operations by means of Venn diagrams.

It is also possible to extend set operations to more than 2 events. For example, \(E \cup (F \cap G^c)\) is the set of outcomes that are either in E or that are in F but not in G. It’s easier to draw the picture than to say it in words.

DeMorgan’s Laws:
(i) The complement of the union of two events is the intersection of their complements; i.e., \((E \cup F)^c = E^c \cap F^c\).
(ii) The complement of the intersection of two events is the union of their complements; i.e., \((E \cap F)^c = E^c \cup F^c\).
These are easily illustrated by Venn Diagrams. They can be extended to more than two events.
Example 1.1.7: The following information was obtained from a survey of 100 high school seniors:
45 owned an iPod
62 owned a cell phone
44 owned a car
27 owned a cell phone and iPod
22 owned a cell phone and a car
23 owned a car and iPod
15 owned all three
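How many owned only a cell phone? How many owned none of the items?
Let E be the set who owned an iPod, F be the set who owned a cell phone, G be the set who owned a car. Writing n(A) for the number of students in a set A, we are given \(n(E) = 45\), \(n(F) = 62\), \(n(G) = 44\), \(n(E \cap F) = 27\), \(n(F \cap G) = 22\), \(n(E \cap G) = 23\) and \(n(E \cap F \cap G) = 15\). It is easiest to start with the intersection of all three.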

So, the number of students who owned only a cell phone is \(62 - 12 - 7 - 15 = 28\). The number who owned none of the items is \(100 - 94 = 6\). Note that, by DeMorgan’s Law, this event can also be expressed as \(E^c \cap F^c \cap G^c\).
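The Venn diagram bookkeeping in Example 1.1.7 can be reproduced with a few lines of arithmetic. This is a sketch only; the variable names are illustrative, and the final count uses the standard inclusion-exclusion formula for three sets.

```python
# Survey counts from Example 1.1.7 (E = iPod, F = cell phone, G = car).
total = 100
n_E, n_F, n_G = 45, 62, 44
n_EF, n_FG, n_EG = 27, 22, 23
n_EFG = 15

# Work outward from the triple intersection, as the text suggests.
only_EF = n_EF - n_EFG  # iPod and cell phone, but no car
only_FG = n_FG - n_EFG  # cell phone and car, but no iPod
only_EG = n_EG - n_EFG  # iPod and car, but no cell phone
only_F = n_F - only_EF - only_FG - n_EFG  # cell phone and nothing else

# Inclusion-exclusion for the number who owned at least one item.
n_union = n_E + n_F + n_G - n_EF - n_FG - n_EG + n_EFG
n_none = total - n_union

print("only a cell phone:", only_F)
print("none of the items:", n_none)
```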
Exercises 1.1
1. Write a sample space for each of the following random experiments:
(a) Flip a coin, then toss a die.
(b) Select two numbers, without replacement, from the set {1, 2, 3, 4, 5}.
(c) Flip a coin until you get two consecutive heads, or until a total of 4 flips has occurred.
(d) Measure the height of a randomly chosen 10-year-old boy.
(e) Pick two marbles from a jar containing 3 red and 4 green marbles.
2. A computer selects two numbers, x and y, from the interval [0, 1].
(a) Sketch a sample space in the xy-plane.
(b) Sketch the event E =
.
(c) Sketch the event F =
.
3. A sample space consists of 5 elements
. Suppose the probability of outcome
is
, for some constant k.
(a) Determine k.
(b) Determine the probability of the event
.
4. You toss a dart at a circular dartboard whose radius is 10 inches.
(a) Write a sample space for this experiment in which you record the distance from the center of the dartboard to the location of the dart.
(b) Write a sample space in which you record the x- and y-coordinates of the location of the dart. Assume the center of the dartboard is the origin.
(c) Let E be the event that the dart lands in the bullseye, a concentric circle of radius 2 inches. Define E in terms of each of the sample spaces above.
5. You toss two dice. Let S = {(1, 1), (1, 2), (1, 3),…, (6, 6)}. Write the elements of S that are in each of the following events:
(a) E: the product of the two dice is 12
(b) F: the larger of the two dice is 3
6. Three men, Andy, Bob and Carl, toss their hats into a box. Each man, in turn, selects a hat at random from the box (without replacement).
(a) Write a sample space that indicates all possible orders in which the hats can be selected.
(b) For which outcomes does no man get his own hat?
7. Draw a Venn Diagram with events E, F and G. Sketch the events:
(a)
(b)
8. Use a Venn Diagram to verify DeMorgan’s Laws.
9. A sample space contains 60 elements. Event E contains 22 elements, F contains 28 elements. There are 20 elements in neither event.
(a) How many elements are in both E and F?
(b) How many are in E only?
10. A car dealership has 70 cars on its lot. Of these, 24 have antilock brakes (ABS), 39 have air conditioning (AC) and 32 have satellite radio (SR). 15 have ABS and AC, 17 have AC and SR, 13 have ABS and SR. 7 have all 3.
(a) How many have none of the features?
(b) How many have only ABS?
(c) How many have ABS but not SR?
Section 1.2: Axioms of Probability
As we said in the last section, we will talk about the probability of events, rather than individual outcomes. We are now ready to state a set of axioms that these probabilities must satisfy.
Let S be a sample space for a random experiment. To each event E in S, we assign a probability, denoted \(P(E)\), such that:
(i) \(0 \le P(E) \le 1\), for all E.
(ii) \(P(S) = 1\).
(iii) If E and F are disjoint events, then \(P(E \cup F) = P(E) + P(F)\).
These axioms are applicable to any sample space, whether finite, countably infinite or uncountable. If S is finite or countably infinite, then each individual outcome is an event and we can assign probabilities to each outcome, as described earlier.
We can use these axioms to prove other results:
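Theorem 1.1: \(P(E^c) = 1 - P(E)\).
Proof: Since \(E \cap E^c = \emptyset\), then by (iii), \(P(E \cup E^c) = P(E) + P(E^c)\). However, \(E \cup E^c = S\), so by (ii), \(P(E \cup E^c) = 1\). Therefore, \(P(E^c) = 1 - P(E)\).
It follows from Theorem 1.1 and axiom (ii) that \(P(\emptyset) = 1 - P(S) = 0\). The converse of this is false. There are events, other than \(\emptyset\), that have probability 0. For example, if we pick a number at random between 0 and 1, the probability of picking any specific number, say .3, is 0. Yet the event \(\{.3\}\) is not empty.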
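Theorem 1.2: \(P(E \cup F) = P(E) + P(F) - P(E \cap F)\).
Proof: Observe that \(E \cup F = E \cup (E^c \cap F)\), where \(E\) and \(E^c \cap F\) are disjoint events. Therefore, \(P(E \cup F) = P(E) + P(E^c \cap F)\). Likewise, \(F = (E \cap F) \cup (E^c \cap F)\), again the union of disjoint events. Hence, \(P(F) = P(E \cap F) + P(E^c \cap F)\), or \(P(E^c \cap F) = P(F) - P(E \cap F)\). Substitution in the previous equation gives the desired result.
In essence, this says that if we just add \(P(E)\) and \(P(F)\), we will have counted the intersection twice. So we subtract it once to compensate.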
Example 1.2.1: A card is selected from a standard deck. What is the probability that the card is a jack or a diamond?
Let E be the event that the card is a jack and F be the event that the card is a diamond. Assuming all cards are equally likely to be chosen, then \(P(E) = 4/52\) and \(P(F) = 13/52\). \(E \cap F\) has one element, the jack of diamonds, so \(P(E \cap F) = 1/52\). Hence, \(P(E \cup F) = 4/52 + 13/52 - 1/52 = 16/52 = 4/13\).
Example 1.2.2: Let E and F be events such that
and
. Is it possible for E and F to be disjoint? Explain.
If they were disjoint, then \(P(E \cup F) = P(E) + P(F) = 1.1 > 1\), which is impossible. Moreover, \(P(E \cap F)\) must be at least .1.
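The computation in Example 1.2.1 can be checked by listing the deck and counting. The sketch below assumes all 52 cards are equally likely, as the example does; the card representation and the helper `prob` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of a set of cards when every card is equally likely."""
    return Fraction(len(event), len(deck))


E = {card for card in deck if card[0] == "J"}         # the card is a jack
F = {card for card in deck if card[1] == "diamonds"}  # the card is a diamond

direct = prob(E | F)                           # count the union directly
by_theorem = prob(E) + prob(F) - prob(E & F)   # Theorem 1.2

print(direct, by_theorem)
```

Both routes give the same fraction, since the jack of diamonds is counted once in the union but twice in the sum of the two individual counts.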
Exercises 1.2
1. Two dice are tossed. What is the probability that the first die is a 3 or the second die is a 5?
2. Let E and F be disjoint events such that
and
. Determine
.
3. Let E and F be events such that
= .4,
= .3 and
= .2. Determine
.
4. You select 3 marbles from a jar containing 4 red and 6 blue marbles. Let E be the event that all the marbles selected are red. What is \(P(E)\)?
5. Let
be the set of elements that are in E or F but not both. Determine a formula for
in terms of
,
and
.
6. Prove that
.
7. Prove that if
, then
.
8. Extend Theorem 1.2 to the union of three sets. That is, express \(P(E \cup F \cup G)\) in terms of the probabilities of the individual sets and their intersections.
Section 1.3: Conditional Probability and Independence
A box contains 3 red and 2 blue marbles. You select a marble at random, replace it in the box, and then select another. Let E be the event that the first marble is red and F be the event that the second marble is red. Then \(E \cap F\) is the event that both marbles are red. We want to determine \(P(E \cap F)\).
One possible sample space for this experiment is \(S_1 = \{RR, RB, BR, BB\}\). However, since there are unequal numbers of red and blue marbles, these elements are not equally likely to occur. So, given what we know now, we cannot assign probabilities to each outcome in this sample space.
Now let \(S_2 = \{R_1R_1, R_1R_2, \ldots, B_2B_2\}\), where we have distinguished between each of the marbles. Of these 25 elements, which are equally likely to occur, 9 have both marbles red. Hence, \(P(E \cap F) = 9/25\).
Notice that \(P(E) = 3/5\) and \(P(F) = 3/5\) so, in this case, \(P(E \cap F) = P(E)P(F)\). The fact that the probability of the intersection of two events is equal to the product of the probabilities of the events is NOT ALWAYS TRUE. When it is true, we say that the events are independent.
Definition: E and F are independent if and only if \(P(E \cap F) = P(E)P(F)\).
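The marble experiment with replacement is small enough to enumerate. The following sketch labels the marbles individually (the labels `R1` … `B2` and the helper names are assumptions made for illustration) and checks the product rule directly.

```python
from fractions import Fraction
from itertools import product

marbles = ["R1", "R2", "R3", "B1", "B2"]

# Drawing with replacement: 25 equally likely ordered pairs.
draws = list(product(marbles, repeat=2))


def prob(event):
    """Fraction of the equally likely draws for which event(draw) is true."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


def first_red(d):
    return d[0].startswith("R")


def second_red(d):
    return d[1].startswith("R")


def both_red(d):
    return first_red(d) and second_red(d)


p_both = prob(both_red)
print(p_both, prob(first_red) * prob(second_red))
```

Here the two printed values agree, which is exactly the condition in the definition of independence.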
In practical terms, two events are independent if the outcome of the first event does not influence the probability of the second event. It is clear this is true in the marble example since we replaced the first marble before selecting the second. Had we not done that, then the probability that the second marble is red depends on whether the first one is red or blue. If the first is red, then the probability that the second is red is \(2/4\); if the first is blue, then it is \(3/4\).
In many cases, our intuition tells us that two events are independent. Then we can use that fact to compute the probability that both events occur.
Example 1.3.1: Let E be the event that the New England Patriots win the Super Bowl in 2007 and F be the event that the St. Louis Cardinals win the World Series in 2007. It seems plausible that these events should be independent, so the probability that both occur is \(P(E \cap F) = P(E)P(F)\).
Now let G be the event that the Patriots win the Super Bowl in 2008. E and G are probably not independent since, if the Patriots win in 2007, they are probably somewhat more likely to win in 2008 than if they hadn’t won in 2007.
Example 1.3.2: Let E and F be independent events such that
,
. Determine \(P(E \cup F)\).
Since E and F are independent, \(P(E \cap F) = P(E)P(F)\) = .15. Then by Theorem 1.2,
. Note that, had we not been told that E and F are independent, we would not have enough information to do this problem.
Critical observation: Do not confuse independent events with disjoint (or mutually exclusive) events. They are completely different concepts! If E and F are disjoint, then they cannot both occur. That means \(P(E \cap F) = 0\). If, in addition, they were independent, then \(P(E)P(F) = P(E \cap F) = 0\). This would imply that either \(P(E)\) = 0 or \(P(F)\) = 0. Let’s summarize the distinction with two rules:
Addition rule: If E and F are disjoint, then \(P(E \cup F) = P(E) + P(F)\).
Multiplication rule: If E and F are independent, then \(P(E \cap F) = P(E)P(F)\).
To illustrate this further, suppose that, in Example 1.3.1, we let H be the event that the Philadelphia Eagles win the Super Bowl in 2007. Then E and H are disjoint since there is only one winner of the Super Bowl in a given year. Furthermore, since, presumably, there is some chance that either team could win (i.e., \(P(E) > 0\) and \(P(H) > 0\)), then E and H must be dependent.
Example 1.3.3: Suppose the probability that Ben is late for school on any given day is 0.2, independent of what happens on any other day. What is the probability that he is late at least once in a five-day week?
Let \(E_i\), i = 1, 2, 3, 4, 5, be the event that Ben is late on day i. We are told that the \(E_i\)’s are independent. However, they are not disjoint, since it is possible for him to be late more than once during the week.
We seek \(P(E_1 \cup E_2 \cup \cdots \cup E_5)\), where \(E_1 \cup E_2 \cup \cdots \cup E_5\) is the event that he is late on Monday or Tuesday or…or Friday, or any combination thereof. It is possible to extend Theorem 1.2 to more than 2 events, but it is complicated. Instead, let’s look at the complement. By DeMorgan’s Law, \((E_1 \cup E_2 \cup \cdots \cup E_5)^c = E_1^c \cap E_2^c \cap \cdots \cap E_5^c\). Since the \(E_i\)’s are independent, the \(E_i^c\)’s are independent. Thus, \(P(E_1^c \cap E_2^c \cap \cdots \cap E_5^c) = (0.8)^5 = .32768\). So, \(P(E_1 \cup E_2 \cup \cdots \cup E_5) = 1 - .32768 = .67232\).
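The complement argument in Example 1.3.3 can be compared against a brute-force sum over every late/on-time pattern for the week. This is a sketch under the example's assumptions (probability 0.2 each day, days independent); the names are illustrative.

```python
from itertools import product

p_late = 0.2
days = 5

# Complement rule: 1 - P(never late).
p_at_least_once = 1 - (1 - p_late) ** days

# Brute force: add up the probability of every pattern with at least one late day.
brute_force = 0.0
for pattern in product([True, False], repeat=days):
    if any(pattern):
        pattern_prob = 1.0
        for late in pattern:
            # Independence lets us multiply the daily probabilities.
            pattern_prob *= p_late if late else 1 - p_late
        brute_force += pattern_prob

print(round(p_at_least_once, 5), round(brute_force, 5))
```

The brute-force loop visits 31 patterns, which is why the complement is the easier route.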
Example 1.3.4: A waitress at a restaurant estimates that 30% of the customers who order coffee get decaffeinated and 70% order regular coffee. Assuming that the customers behave independently, what is the probability that two out of the next three customers will order regular coffee?
Let E represent the desired event. The possible ways in which E could occur are RRD, RDR, DRR. These are disjoint possibilities. Since the customers are independent, \(P(RRD) = (.7)(.7)(.3) = .147\). The outcomes RDR and DRR also have probability \(.147\). Therefore, \(P(E) = 3(.147) = .441\). We shall see many examples like this later on.
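As a check on Example 1.3.4, the sketch below enumerates all eight possible sequences of three orders and adds the probabilities of those with exactly two regular coffees. The dictionary `p` and the other names are illustrative.

```python
from itertools import product

# Probability of each type of order (R = regular, D = decaffeinated).
p = {"R": 0.7, "D": 0.3}

p_two_regular = 0.0
matching = []
for orders in product("RD", repeat=3):
    if orders.count("R") == 2:
        matching.append("".join(orders))
        # Customers are independent, so multiply; sequences are disjoint, so add.
        p_two_regular += p[orders[0]] * p[orders[1]] * p[orders[2]]

print(matching, round(p_two_regular, 3))
```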
When we create a probability model, one of the most important questions that often arises is whether two events are independent. For example, suppose we treat two cancer patients with a new drug. Is the event that A is cured independent of the event that B is cured? It depends. There might be some external factor that influences the drug’s effect on both of them. In that case, the events are dependent.
The next question is what happens if events are not independent. Is there a result similar to Theorem 1.2 that allows us to calculate \(P(E \cap F)\) in this case? To answer this, we need to talk about conditional probability.
Definition: Let E and F be events such that \(P(F)\) ≠ 0. The conditional probability of E given F is \(P(E \mid F) = \dfrac{P(E \cap F)}{P(F)}\).
If F has occurred, then, in essence, the sample space is reduced to F. Then event E is reduced to the part of E that is in F, namely \(E \cap F\). Hence, \(P(E \mid F)\) is the probability of E once all outcomes outside of F have been eliminated.
For example, suppose we survey 20 people, 12 women and 8 men. Three of the women and 4 of the men have gray hair. Let E be the event that a person chosen at random has gray hair; let F be the event that the person chosen is a woman. Then \(P(E) = 7/20\), \(P(F) = 12/20\), \(P(E \cap F) = 3/20\) since there are 3 women with gray hair. Hence, the probability that the person chosen has gray hair given that it is a woman is \(P(E \mid F) = \dfrac{3/20}{12/20} = \dfrac{1}{4}\). The probability that the person chosen is a woman given that they have gray hair is \(P(F \mid E) = \dfrac{3/20}{7/20} = \dfrac{3}{7}\).
In this example, the knowledge that F occurred decreased \(P(E)\) from \(7/20\) to \(1/4\). It may be that the occurrence of F increases \(P(E)\). Or, it may be that \(P(E \mid F) = P(E)\). If that is the case, E and F are independent. (Why?)
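The survey of 20 people above can be written out as data, which makes the definition of conditional probability easy to test. This is a minimal sketch; the tuple encoding of each person and the helper names `prob` and `cond_prob` are assumptions made for illustration.

```python
from fractions import Fraction

# Each person is (sex, has_gray_hair): 12 women (3 gray), 8 men (4 gray).
people = (
    [("W", True)] * 3 + [("W", False)] * 9
    + [("M", True)] * 4 + [("M", False)] * 4
)


def prob(event):
    """Probability that a randomly chosen person satisfies event."""
    return Fraction(sum(1 for person in people if event(person)), len(people))


def cond_prob(event, given):
    """P(event | given), computed from the definition P(E and F) / P(F)."""
    return prob(lambda person: event(person) and given(person)) / prob(given)


def gray(person):
    return person[1]


def woman(person):
    return person[0] == "W"


print(prob(gray), cond_prob(gray, woman), cond_prob(woman, gray))
```

The two conditional probabilities differ, a reminder that the order of conditioning matters.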
In most instances, we do not use the definition above to compute the conditional probability. Rather, we determine the conditional probability from the statement of the problem, then use it to compute \(P(E \cap F)\).
Example 1.3.5: At the beginning of this section we considered a box containing 3 red and 2 blue marbles. Now let’s select two marbles without replacement. Let F be the event that the first marble is red and E be the event that the second marble is red. Then \(P(F) = 3/5\) and \(P(E \mid F) = 2/4\), from which the probability that both marbles are red is \(P(E \cap F) = P(F)P(E \mid F) = (3/5)(2/4) = 3/10\).
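Example 1.3.5 can also be verified by enumeration. The sketch below lists the ordered pairs of distinct marbles (the labels and variable names are illustrative) and computes the conditional probability by restricting attention to draws where the first marble is red.

```python
from fractions import Fraction
from itertools import permutations

marbles = ["R1", "R2", "R3", "B1", "B2"]

# Drawing without replacement: 20 equally likely ordered pairs of distinct marbles.
draws = list(permutations(marbles, 2))


def is_red(marble):
    return marble.startswith("R")


first_red = [d for d in draws if is_red(d[0])]
both_red = [d for d in first_red if is_red(d[1])]

p_first_red = Fraction(len(first_red), len(draws))               # P(F)
p_second_given_first = Fraction(len(both_red), len(first_red))   # P(E | F)
p_both_red = Fraction(len(both_red), len(draws))                 # P(E and F)

print(p_first_red, p_second_given_first, p_both_red)
```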
Example 1.3.6: Against left-handed pitchers, a certain baseball player gets a hit 35% of the time (i.e. a .350 batting average). Against right-handed pitchers, he gets a hit 25% of the time. If 80% of his at-bats are against right-handed pitchers, what is the probability that he gets a hit?
Let \(F_1\) be the event that he bats against a right-handed pitcher and \(F_2\) be the event that he bats against a left-handed pitcher. Clearly \(F_1\) and \(F_2\) are disjoint. Let E be the event that he gets a hit. We are given \(P(E \mid F_1) = .25\), \(P(E \mid F_2) = .35\), and \(P(F_1) = .8\), which implies \(P(F_2) = .2\). We are interested in \(P(E)\).
From the Venn Diagram, we see that \(E = (E \cap F_1) \cup (E \cap F_2)\) and that \((E \cap F_1) \cap (E \cap F_2) = \emptyset\).

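Therefore, \(P(E) = P(E \cap F_1) + P(E \cap F_2) = P(F_1)P(E \mid F_1) + P(F_2)P(E \mid F_2) = (.8)(.25) + (.2)(.35)\) = .27.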
This example illustrates a more general principle. Let \(F_1, F_2, \ldots, F_n\) be pairwise disjoint events whose union is the entire sample space. That is, \(F_1 \cup F_2 \cup \cdots \cup F_n = S\) and \(F_i \cap F_j = \emptyset\), whenever i ≠ j. (This is a stronger statement than saying the events are mutually disjoint; that would mean that there are no outcomes in common to all the events. This says that every pair of events has no outcomes in common.) We say that \(F_1, F_2, \ldots, F_n\) partition the sample space.

The partition chops E into disjoint pieces of the form \(E \cap F_i\), i = 1, 2,…, n. Therefore, \(P(E) = \sum_{i=1}^{n} P(E \cap F_i)\), from which we have:
Theorem 1.3: \(P(E) = \sum_{i=1}^{n} P(F_i)P(E \mid F_i)\).
This is called the Theorem of Total Probability.
Example 1.3.7: A recording artist estimates that 60% of her fans are under age 20, 30% are between 20 and 40, and 10% are over 40. Of those under 20, 50% have bought her latest album, 40% of those between 20 and 40 have bought the album and 60% of those over 40 have bought the album. What is the probability that a fan chosen at random has bought the album?
Let \(F_1, F_2, F_3\) be events representing the three different age groups, and let E be the event that a fan bought the album. Then:
\(P(E) = P(F_1)P(E \mid F_1) + P(F_2)P(E \mid F_2) + P(F_3)P(E \mid F_3) = (.6)(.5) + (.3)(.4) + (.1)(.6) = .48\)
Often we are also interested in the probability that a particular \(F_j\) occurred given that E occurred. This is easily obtained from Theorem 1.3. We state it as a corollary, which is usually referred to as Bayes’ Rule or Bayes’ Theorem.
Corollary: \(P(F_j \mid E) = \dfrac{P(F_j)P(E \mid F_j)}{\sum_{i=1}^{n} P(F_i)P(E \mid F_i)}\).
In other words, \(P(F_j \mid E)\) is the fraction of the total probability of E that comes from the branch associated with \(F_j\).
In Example 1.3.7, the probability that a fan who bought the album is under 20 is \(P(F_1 \mid E) = \dfrac{(.6)(.5)}{.48} = .625\). So, the knowledge that the fan bought the album slightly increases the probability that the chosen fan is under 20.
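The Theorem of Total Probability and Bayes’ Rule translate directly into two short functions. The sketch below applies them to the numbers of Example 1.3.7; the function names `total_probability` and `bayes`, and the list-based representation of the partition, are illustrative assumptions.

```python
def total_probability(priors, likelihoods):
    """P(E) = sum of P(F_i) * P(E | F_i) over a partition F_1, ..., F_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes(priors, likelihoods, j):
    """P(F_j | E): the share of P(E) contributed by branch j (0-based index)."""
    return priors[j] * likelihoods[j] / total_probability(priors, likelihoods)


# Example 1.3.7: age groups under 20, 20 to 40, over 40.
priors = [0.6, 0.3, 0.1]        # P(F_i)
likelihoods = [0.5, 0.4, 0.6]   # P(bought album | F_i)

p_bought = total_probability(priors, likelihoods)
p_under_20_given_bought = bayes(priors, likelihoods, 0)

print(round(p_bought, 3), round(p_under_20_given_bought, 3))
```

The same two functions cover the batting example as well, with `priors = [0.8, 0.2]` and `likelihoods = [0.25, 0.35]`.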
Example 1.3.8: A certain disease is known to affect 10% of the population. A test for the disease turns up positive in 99% of those who have the disease and in 5% of those who don’t have the disease. (These are called false positives.) Given that a person tests positive for the disease, what is the probability that she actually has the disease?
Let \(F_1\) be the event that she has the disease and \(F_2\) be the event that she doesn’t; let E be the event that she tests positive. We are given \(P(F_1) = .1\), \(P(F_2) = .9\), \(P(E \mid F_1) = .99\), and \(P(E \mid F_2) = .05\). Using Theorem 1.3, we get \(P(E) = (.1)(.99) + (.9)(.05) = .144\). Then from the corollary, we get \(P(F_1 \mid E) = \dfrac{(.1)(.99)}{.144} = .6875\). So there is only a 69% chance that she has the disease, even though she tested positive for it.
Depending on the prevalence of the disease and the false positive rate, the probability of having the disease given a positive test can be much smaller than 69%. For instance, if the disease affected only 2% of the population (all other things unchanged), we would get \(P(F_1 \mid E) = \dfrac{(.02)(.99)}{(.02)(.99) + (.98)(.05)} \approx .288\).
This emphasizes the importance of retesting before confirming any diagnosis. (This has particular ramifications in drug testing, where the consequences of a positive test can be devastating.)
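The effect of prevalence in Example 1.3.8 is easy to explore numerically. The following sketch wraps Bayes’ Rule for a two-event partition in a function; the function and parameter names (`p_disease_given_positive`, `sensitivity`, `false_positive_rate`) are illustrative labels, not terms used in the text.

```python
def p_disease_given_positive(prevalence, sensitivity=0.99, false_positive_rate=0.05):
    """P(disease | positive test) by Bayes' Rule with a two-event partition."""
    # Total probability of a positive test: true positives plus false positives.
    p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate
    return prevalence * sensitivity / p_positive


# The two prevalence values discussed in the example.
for prevalence in (0.10, 0.02):
    print(prevalence, round(p_disease_given_positive(prevalence), 4))
```

Lowering the prevalence shrinks the numerator much faster than the denominator, because the false positives come from the large healthy group.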
Application to reliability
Many systems consist of two or more interconnected components. Components may be connected in series, meaning that all of them have to work in order for the system to work, or in parallel, meaning that only one needs to work for the system to work.
Suppose a system consists of two components in series. The components behave independently and the probability that each one works is p. Let \(E_1\) be the event that the first component works and \(E_2\) be the event that the second component works. Then the probability that the system works is \(P(E_1 \cap E_2) = P(E_1)P(E_2) = p^2\).
If the components are connected in parallel, then the probability that the system works is \(P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = 2p - p^2\). Note that we could have gotten this answer by looking at the complement: \(1 - P(E_1^c \cap E_2^c) = 1 - (1 - p)^2 = 2p - p^2\).
This generalizes to n components. If they are connected in series, the probability that the system works is \(p^n\). This is a decreasing function of n. If they are in parallel, the probability that the system works is \(1 - (1 - p)^n\). This is an increasing function of n. We often say that a system with several components in parallel has built-in redundancy.
Example 1.3.9: A system has three components as shown. Components 1 and 2 are in series; component 3 is in parallel with the combination of 1 and 2. If each works with probability p, what is the probability that the system works?

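Let \(E_i\) be the event that component i works. Then the probability that the system works is \(P((E_1 \cap E_2) \cup E_3) = P(E_1 \cap E_2) + P(E_3) - P(E_1 \cap E_2 \cap E_3) = p^2 + p - p^3\).
If we remove component 3, then the probability that the system works is just \(p^2\). It is easy to see that \(p - p^3 > 0\) for 0 < p < 1. Hence, \(p^2 + p - p^3\) > \(p^2\). So the addition of component 3 in parallel with the others improves the reliability of the system.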
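Series and parallel reliabilities compose naturally as functions, which makes layouts like Example 1.3.9 easy to evaluate. This is a sketch assuming independent components; the helper names `series` and `parallel` and the value p = 0.9 are illustrative choices.

```python
from math import prod


def series(*probs):
    """All blocks must work: multiply their probabilities."""
    return prod(probs)


def parallel(*probs):
    """At least one block must work: 1 minus the probability that all fail."""
    return 1 - prod(1 - p for p in probs)


p = 0.9  # illustrative component reliability

# Example 1.3.9: components 1 and 2 in series, in parallel with component 3.
with_backup = parallel(series(p, p), p)
without_backup = series(p, p)

print(round(with_backup, 4), round(without_backup, 4))
```

Because each helper returns a probability, blocks can be nested to describe larger systems, provided the sub-blocks share no components.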
Exercises 1.3
1. Let E and F be events such that
,
and
. Determine:
(a)
(b)
(c)
.
2. A family has four children. Assuming boys and girls are equally likely, what is the probability that the first child is a boy given that there are two girls and two boys?
3. The spinner below is divided into 3 equal sectors.

You spin it twice. Assuming the spins are independent, what is the probability that the sum of the two spins is 5?
4. A box contains 2 red and 3 green marbles. You select 2 of the marbles without replacement. What is the probability that both are red given that they are both the same color?
5. Prove that if E and F are independent, then
and
are independent.
6. (a) TRUE/FALSE:
.
(b) TRUE/FALSE:
.
7. If
and
, what is the smallest possible value of
?
8. You pick two numbers at random, with replacement, from the set {1, 2, 3, 4, 5}. Let E be the event that the first number chosen is a 3; let F be the event that the first number chosen is 4; let G be the event that the second number chosen is 2.
(a) Which of the events are independent?
(b) Which are disjoint?
(c) What is the probability that the sum of the two numbers selected is 6?
9. You have 100 cards, numbered 00, 01, …, 99. You select one of the cards at random. What is the probability that the sum of the digits on the selected card is 10 given that the product of the digits is 0?
10. An insurance company rents 35% of its cars from Agency I and 65% from Agency II.
(a) If 8% of the cars from Agency I and 5% of the cars from Agency II break down during the rental period, what is the probability that a car rented by this company breaks down?
(b) Given that a car breaks down, what is the probability that it was rented from Agency I?
11. A bag contains n coins. One of the coins has two heads, the others are normal and fair. You select a coin at random and flip it three times. You observe three heads. What is the probability that the coin was the two-headed coin? How does the knowledge that three heads occurred affect the probability that the chosen coin had two heads?
12. Four equally reliable components can be arranged in one of two configurations, as shown below. Which one has a greater probability of working?

Configuration A    Configuration B
Section 1.4: Probability Models
So far we have talked about assigning probabilities to events (or, in some cases, individual outcomes) in a sample space. We listed some axioms that those probabilities must satisfy. Now we’ll address the issue of how we go about assigning the probabilities.
There are two approaches we can take: empirical and theoretical. An empirical approach to assigning probabilities entails repeating the experiment many times and using the observed outcomes to compute (or estimate) the probability of a given event. Specifically, suppose we perform the experiment N times. Let \(N(E)\) be the number of times event E occurs. Then \(N(E)/N\) is the fraction of times E occurs, also called the relative frequency of E. It serves as an approximation to the probability of E. In fact, we can define \(P(E) = \lim_{N \to \infty} \dfrac{N(E)}{N}\).
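The relative-frequency idea can be tried out with a simulated experiment. The sketch below uses Python's pseudo-random generator as a stand-in for physically tossing two dice, and estimates the probability that the sum is 7 for several values of N; the seed, the trial counts, and the function name are arbitrary illustrative choices.

```python
import random


def relative_frequency(n_trials, rng):
    """Fraction of n_trials simulated tosses of two dice whose sum is 7."""
    hits = sum(
        1 for _ in range(n_trials)
        if rng.randint(1, 6) + rng.randint(1, 6) == 7
    )
    return hits / n_trials


rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = {n: relative_frequency(n, rng) for n in (100, 10_000, 100_000)}

for n, estimate in estimates.items():
    print(n, round(estimate, 4))
```

Under the fair-dice model the exact value is 6/36, so the estimates for large N should settle near 0.1667, though no finite run is guaranteed to be close.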
For example, suppose we want the probability that a person selected at random will vote for Candidate A in the upcoming election. If we ask 1000 people and 614 say they will vote for him, then we can say that the candidate will get approximately 61.4% of the votes.
Obviously, this answer is only as good as the data collected. If we ask 1000 senior citizens, or 1000 women, or 1000 African-Americans, the results might not reflect the opinion of the entire population. That’s why the political pollsters make sure they take surveys that accurately reflect the population as a whole. (This is sometimes called stratified sampling. The pollsters know what fraction of the population is women, African-American, over 65, etc. So they collect data from samples that reflect those percentages.)
The empirical approach has other pitfalls. In order for the results to be useful, the experiment must be repeated under identical conditions. This may be impossible to do. For example, suppose we want to know if smoking two packs of cigarettes a day increases the risk of heart disease. We can observe such smokers over the course of 20 or 30 years and see how many get heart disease. However, it is not clear that these are identical trials. Some people may be more susceptible to heart disease for other reasons and some people may die of other causes before they would have gotten heart disease.
Also, performing an experiment many times may be impractical or expensive. Determining the probability that a new car will withstand a 20 mph crash into a brick wall entails smashing up a lot of cars. That’s not something the manufacturer would want to do 1000 times.
Finally, even under the best circumstances, it is not clear how many trials are necessary to get relative frequencies that are close to the true probabilities.
In a theoretical approach, we create a model or set of assumptions about how the system behaves. We can then use those assumptions to compute probabilities of any event. There is no way of knowing, a priori, if the assumptions are correct. The answers we get are only as good as the assumptions.
For example, suppose we flip three coins. We can assume the coins are fair and that the flips are independent. Let S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} be the sample space. Under the assumptions, all outcomes in S are equally likely to occur. So, for instance, the probability of getting exactly 2 heads is \(3/8\). If the coins are not fair, that answer is wrong.
As another example, suppose we measure the time between arrivals at a bank, as in Example 1.1.6. Assume that the probability that more than t minutes elapses is given by the function \(P(T > t) = e^{-t/2}\), where T denotes the time between arrivals. It is easy to show that this satisfies all the axioms of probability. So, for instance, the probability that more than 3 minutes elapses is \(P(T > 3) = e^{-3/2}\) = .223. Is this correct? We don’t know.
Once we create a model, we can collect data to verify whether the model is correct. Or more accurately, the data will tell us whether there is sufficient evidence to reject the model. This is where statistics comes into the picture. In essence, we are trying to draw conclusions about a population, which is the entire set of individuals or objects in which we are interested. For instance, in the voting example described above, the population is the entire set of voters. In Example 1.1.6, the population consists of the times between all arrivals at the bank.
Definition: Statistics is the art of collecting data and drawing conclusions about a population based on information in the data.
There are many statistical procedures that are used to draw conclusions. One of them is called hypothesis testing, in which we compare two statements (or hypotheses) and determine which one the evidence (data) more strongly supports. We’ll investigate this in more detail throughout the course.
In the case of model verification, we can use a procedure called goodness-of-fit, which is a type of hypothesis test. One of the statements is: “The model is correct”; this is the null hypothesis. The other is: “The model is not correct”; this is the alternative hypothesis.
Suppose we want to determine if a coin is fair. The null hypothesis is that the coin is fair; the alternative is that it is not fair. Now we collect data by flipping the coin, say, 100 times and counting the number of heads. If we got exactly 50 heads, an extremely unlikely event, then we would conclude the coin is fair. If we got 73 heads, then we probably would say the coin is not fair. But what if we got 54 or 48 heads? More generally, for what values of x will observing x heads in 100 flips lead us to conclude that the coin is not fair?
To illustrate the procedure in more detail, suppose we want to determine if a die is fair.Let’s toss it 150 times.If the die is fair, we would expect to see 25 of each number.The observed values (data) are:
| Value | 1 | 2 | 3 | 4 | 5 | 6 |
| Observed number | 21 | 17 | 30 | 26 | 23 | 33 |
| Expected number | 25 | 25 | 25 | 25 | 25 | 25 |
We need a way to measure how “far” the observed values are from the expected values. We shall show later that the best way to do this is to square the differences, divide by the expected number and add up the results for all outcomes; that is, let
\[ W = \sum_{i} \frac{(O_i - E_i)^2}{E_i}, \]
where \( O_i \) and \( E_i \) are the observed and expected numbers for outcome i. The bigger the value of W, the farther the observed values are from what the model predicts, meaning that it is more likely the model is incorrect.
For this data,
\[ W = \frac{(21-25)^2}{25} + \frac{(17-25)^2}{25} + \frac{(30-25)^2}{25} + \frac{(26-25)^2}{25} + \frac{(23-25)^2}{25} + \frac{(33-25)^2}{25} = \frac{174}{25} = 6.96. \]
Later we’ll learn how to compute the probability that we could see a value of W at least this large if the model were correct. It turns out that that probability is approximately .22. So there is a 22% chance we could see observations at least as extreme (i.e. far from the expected values) as those we saw with a fair die. It’s a matter of opinion as to whether this is small enough to conclude the die is not fair.
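As a quick check of the arithmetic, here is a minimal Python sketch that computes the statistic W (squared differences divided by expected numbers, summed over all outcomes) for the die data in the table. The function name `goodness_of_fit_statistic` is just an illustrative choice.

```python
# Observed and expected counts for the 150 tosses of the die (from the table).
observed = [21, 17, 30, 26, 23, 33]
expected = [25] * 6


def goodness_of_fit_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


W = goodness_of_fit_statistic(observed, expected)
print(round(W, 2))  # 6.96
```

Computing the probability of seeing a value of W at least this large requires tools introduced later, so it is not attempted here.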
Section 1.5: A Simple Model
Suppose a random experiment has a sample space with a finite number of outcomes \( S = \{s_1, s_2, \ldots, s_n\} \). The simplest possible model assumes that all the outcomes in S are equally likely to occur. We call this the equiprobable model. If so, then the probability of each outcome is \( P(s_i) = \frac{1}{n} \) for all i. In addition, if E is an event with k of the n outcomes in it, then \( P(E) = \frac{k}{n} \). So, computing probabilities is reduced to the problem of counting outcomes.
Example 1.5.1: Toss two dice. What is the probability that the sum of the dice is 4?
Let S = {(1, 1), (1, 2), (1, 3),…, (6, 6)}. If the dice are fair, then S consists of 36 equally likely outcomes. E = {(1, 3), (2, 2), (3, 1)}, so \( P(E) = \frac{3}{36} = \frac{1}{12} \). Note that if we don’t include each outcome and its reversal, such as (1, 3) and (3, 1), then the sample space will not be equiprobable.
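For a sample space this small, the counting in Example 1.5.1 can be checked by brute force. The sketch below lists all ordered pairs for two fair dice and counts those whose sum is 4; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# The event "the sum of the dice is 4".
event = [roll for roll in sample_space if sum(roll) == 4]

prob_sum_4 = Fraction(len(event), len(sample_space))
print(event)       # [(1, 3), (2, 2), (3, 1)]
print(prob_sum_4)  # 1/12
```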
Example 1.5.2: Allison, Ben and Charlotte line up in random order. What is the probability that Allison and Charlotte are next to each other?
Let S = {ABC, ACB, BAC, BCA, CAB, CBA} be the sample space consisting of all possible orders. In four of these (ACB, BAC, BCA and CAB), Allison and Charlotte are next to each other. Hence, \( P(E) = \frac{4}{6} = \frac{2}{3} \).
In principle, this is a very simple concept. However, difficulty arises when the sample space (and maybe the event of interest) has a large number of elements. In Example 1.5.2, let’s put two more people, Daniel and Ellen, in the mix. We can show that S now contains 120 elements, 48 of which have Allison and Charlotte next to each other. So, \( P(E) = \frac{48}{120} = \frac{2}{5} \).
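The counts 4 out of 6 and 48 out of 120 can be confirmed by listing every possible order. This is a small sketch, with people represented by their initials and a hypothetical helper `prob_adjacent`.

```python
from fractions import Fraction
from itertools import permutations


def prob_adjacent(people, a, b):
    """Return (favorable, total, probability) that a and b stand next to each other."""
    orders = list(permutations(people))
    favorable = [o for o in orders if abs(o.index(a) - o.index(b)) == 1]
    return len(favorable), len(orders), Fraction(len(favorable), len(orders))


print(prob_adjacent("ABC", "A", "C"))    # 4 of 6 orders, probability 2/3
print(prob_adjacent("ABCDE", "A", "C"))  # 48 of 120 orders, probability 2/5
```

Listing every order stops being practical quickly (10 people already give 3,628,800 orders), which is why counting methods are needed.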
Or, suppose we select 5 cards from a deck of 52. It turns out that there are 2,598,960 ways in which this can be done. Let E be the event that the 5 cards are the ten, jack, queen, king and ace in the same suit (what poker players call a “royal flush”). There are 4 such outcomes, one for each suit. Hence, \( P(E) = \frac{4}{2{,}598{,}960} = \frac{1}{649{,}740} \approx .0000015 \).
This suggests that we need efficient methods of counting large numbers. This comes under the heading of combinatorics, a mathematical subject that we could spend months on. We shall just touch the surface. We begin with a basic principle.
Multiplication principle: If a certain task can be performed in m ways, and another task can be performed in n ways, then the number of ways in which both tasks can be performed in succession is mn.
Example 1.5.3: Kevin owns 10 rap CD’s and 12 heavy metal CD’s. In how many ways can he select one of each type?
There are 10 ways in which he can select the rap CD and 12 ways in which he can select the heavy metal CD. Hence, there are 10 · 12 = 120 ways in which he can select one of each.
Example 1.5.4: Nine horses are entered in a race. In how many ways can a bettor select the first two finishers in the correct order?
There are 9 choices for the first place horse and 8 choices for the second place horse. Thus, there are 9 · 8 = 72 ways of selecting the first two.
The principle can be extended to more than two tasks. So if we wanted to pick the first three horses in order, there are 7 choices for the third horse, making a total of 9 · 8 · 7 = 504 ways of picking the first three. If we wanted to pick all 9 horses in order, there would be \( 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 362{,}880 \) ways of doing so.
In general, we can say:
Fact: The number of different ways of arranging n distinct objects in order is \( n! = n(n-1)(n-2)\cdots(2)(1) \).
Note that it is important that we pick the objects in order. This means that picking A to win and B to finish second is different from picking B to win and A to finish second. If we didn’t care about the order (that is, AB is the same as BA), then there would be half as many ways, 36, of picking the first two finishers. For three horses, there are six equivalent selections (ABC, ACB, BAC, BCA, CAB and CBA), so the total number of ways of selecting the first three finishers, without regard to order, is \( \frac{504}{6} = 84 \).
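The horse-race counts can be reproduced with Python’s standard `math` module (`math.perm` requires Python 3.8 or later). This is only a sketch of the arithmetic; the variable names are illustrative.

```python
import math

# Ordered selections of the top finishers among 9 horses.
ordered_two = math.perm(9, 2)    # 9 * 8 = 72
ordered_three = math.perm(9, 3)  # 9 * 8 * 7 = 504
all_nine = math.factorial(9)     # 9! = 362880

# Ignoring order: divide by the number of arrangements of the chosen horses.
unordered_two = ordered_two // math.factorial(2)      # 36
unordered_three = ordered_three // math.factorial(3)  # 84

print(ordered_two, ordered_three, all_nine, unordered_two, unordered_three)
```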
Let’s generalize. Suppose we have n objects and we want to select k of them, without regard to order. An unordered selection is called a combination, so we are counting the number of combinations of n objects, taken k at a time. Equivalently, we are counting the number of subsets of size k that can be formed from a set of n elements.
Theorem 1.4: The number of combinations of n objects, taken k at a time, is given by
\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \]
The symbol \( \binom{n}{k} \) is read “n choose k.” Other notations are \( C(n, k) \) or \( {}_nC_k \). (This is the one used on many calculators. Look at the MATH → PRB menu on the TI-83.) These numbers are also called binomial coefficients since they appear in the binomial theorem.
Example 1.5.5: Referring to Example 1.5.3, suppose Kevin wants to select 3 of his rap CD’s. The number of ways in which he can do this is \( \binom{10}{3} = \frac{10!}{3!\,7!} = 120 \).
Example 1.5.6: In how many ways can 5 cards be dealt from a deck of 52? That is, how many different poker hands are there?
Since the order of the cards doesn’t matter, there are \( \binom{52}{5} = \frac{52!}{5!\,47!} = 2{,}598{,}960 \) such hands, as we said before.
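The formula of Theorem 1.4 translates directly into code. The sketch below defines a hypothetical helper `n_choose_k` from the factorial formula and compares it with the standard-library function `math.comb` (Python 3.8 or later).

```python
import math
from fractions import Fraction


def n_choose_k(n, k):
    """Number of combinations of n objects taken k at a time: n! / (k! (n-k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


print(n_choose_k(10, 3))  # 120 ways to select 3 of 10 CDs
print(n_choose_k(52, 5))  # 2598960 poker hands
print(math.comb(52, 5))   # same value from the standard library

# Probability of a royal flush: 4 favorable hands out of all 5-card hands.
royal_flush = Fraction(4, n_choose_k(52, 5))
print(royal_flush)        # 1/649740
```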
To prove Theorem 1.4, we’ll just generalize previous calculations. There are n choices for the first object, n – 1 choices for the second,…, n – k + 1 choices for the kth object. Thus, there are \( n(n-1)\cdots(n-k+1) \) ways of picking the k objects in order. The ordered selections (or permutations) can be divided into groups according to which objects they contain. Each group contains all the arrangements of the set of k objects. Since there are k! such arrangements, the total number of groups (or combinations) is
\[ \frac{n(n-1)\cdots(n-k+1)}{k!}. \]
Simply multiplying the numerator and denominator by \( (n-k)! \) gives
\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \]
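Observations:
• \( \binom{n}{0} = 1 \) for all n. (Recall that 0! = 1.) \( \binom{n}{0} \) can be thought of as the number of ways of picking the null set.
• \( \binom{n}{k} = \binom{n}{n-k} \). In other words, each subset of k objects corresponds to a subset of n – k objects not chosen.
• For given n, the largest value of \( \binom{n}{k} \) occurs when \( k = \frac{n}{2} \), if n is even. If n is odd, the largest value occurs at both \( k = \frac{n-1}{2} \) and \( k = \frac{n+1}{2} \).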
Theorem 1.5: \( \binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1} \).
For example, \( \binom{5}{2} = \binom{4}{2} + \binom{4}{1} = 6 + 4 = 10 \).
Theorem 1.5 can be proved easily from Theorem 1.4. (Try it!) However, here is another proof that is typical of the kind of proofs used in combinatorics. Let S be a set with n elements, one of which we label x. S has a total of \( \binom{n}{k} \) subsets of size k. Of these, \( \binom{n-1}{k} \) do not contain x (select k of the remaining n – 1 elements), and \( \binom{n-1}{k-1} \) do contain x (pick x and then select k – 1 of the remaining n – 1 elements). Since each subset either contains x or doesn’t contain x, we have the result.
This theorem allows us to store the binomial coefficients in a convenient array called Pascal’s triangle.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
etc.
The entry in row n, column k (where we start counting from 0) is \( \binom{n}{k} \). So, for example, the first 10 in row 5 is \( \binom{5}{2} = 10 \). Each entry is the sum of the entry directly above it and the entry above and to the left (e.g. 10 = 6 + 4).
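The rule “entry above plus entry above and to the left” is enough to generate the whole triangle. Here is a minimal sketch; `pascal_rows` is a hypothetical helper name, and rows and columns are counted from 0 as in the text.

```python
import math


def pascal_rows(num_rows):
    """Build the first num_rows rows of Pascal's triangle using Theorem 1.5."""
    rows = []
    for n in range(num_rows):
        if n == 0:
            row = [1]
        else:
            prev = rows[-1]
            # Interior entries: entry directly above plus entry above and to the left.
            row = [1] + [prev[k] + prev[k - 1] for k in range(1, n)] + [1]
        rows.append(row)
    return rows


triangle = pascal_rows(6)
for row in triangle:
    print(" ".join(str(entry) for entry in row))

# Each entry agrees with the binomial coefficient "n choose k".
agrees = all(triangle[n][k] == math.comb(n, k)
             for n in range(6) for k in range(n + 1))
print(agrees)  # True
```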
Example 1.5.7: Flip 8 coins. In how many different ways can you get 3 heads?
Think of this as having 8 slots, each of which is filled with H or T.
__ __ __ __ __ __ __ __
We select 3 of the slots for H; the remaining 5 get T. The number of ways in which this can be done is \( \binom{8}{3} = 56 \).
If the coins are fair, then the sample space consists of \( 2^8 = 256 \) equally likely outcomes. Thus, the probability of getting 3 heads on 8 coin flips is \( \frac{56}{256} = \frac{7}{32} \approx .219 \).
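With only 256 outcomes, Example 1.5.7 can be checked by listing every sequence of heads and tails. The following sketch does that and compares the count with the binomial coefficient; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Every way to fill 8 slots with H or T.
outcomes = list(product("HT", repeat=8))

# Outcomes with exactly 3 heads.
three_heads = [o for o in outcomes if o.count("H") == 3]

prob_three_heads = Fraction(len(three_heads), len(outcomes))
print(len(outcomes), len(three_heads), comb(8, 3))  # 256 56 56
print(prob_three_heads)                             # 7/32
```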
Example 1.5.8: A class has 12 boys and 9 girls. The teacher selects 4 students at random. What is the probability that she picks 2 boys and 2 girls?
There are \( \binom{21}{4} = 5985 \) equally likely ways of selecting the 4 students. Of these, there are \( \binom{12}{2}\binom{9}{2} = 66 \cdot 36 = 2376 \) ways of selecting 2 boys and 2 girls. Hence, the desired probability is \( \frac{2376}{5985} \approx .397 \).
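The calculation in Example 1.5.8 combines the multiplication principle with binomial coefficients. A short sketch of the arithmetic, using `math.comb` from the standard library:

```python
from fractions import Fraction
from math import comb

total_ways = comb(21, 4)                 # any 4 of the 21 students
favorable_ways = comb(12, 2) * comb(9, 2)  # 2 of 12 boys, then 2 of 9 girls

prob_two_and_two = Fraction(favorable_ways, total_ways)
print(total_ways, favorable_ways)          # 5985 2376
print(round(float(prob_two_and_two), 3))   # 0.397
```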
Theorem 1.6: \( \sum_{k=0}^{n} \binom{n}{k} = 2^n \).
The sum \( \sum_{k=0}^{n} \binom{n}{k} \) represents the total number of subsets of any size of a set with n elements, including the null set and the set itself. There are two choices for each element: it is either in the subset or not. Then the multiplication principle tells us there are a total of \( 2^n \) subsets. In Pascal’s triangle, \( 2^n \) is the sum of the elements in the nth row. So, for example, in the fourth row, 1 + 4 + 6 + 4 + 1 = 16 = \( 2^4 \).
Earlier, we mentioned that the binomial coefficients appear in the binomial theorem. Let’s state that theorem now.
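Theorem 1.7 (The binomial theorem): \( (x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k \), where n is a positive integer.
So, for example, \( (x+y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 \).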
Note that, by substituting x = y = 1 in Theorem 1.7, we get Theorem 1.6.
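A numerical spot check can make Theorems 1.6 and 1.7 more concrete. The sketch below evaluates the binomial sum for given x, y and n and compares it with direct exponentiation. The helper name `binomial_sum` is illustrative, and the sum is assumed to be written with powers \( x^{n-k} y^k \).

```python
from math import comb


def binomial_sum(x, y, n):
    """Right-hand side of the binomial theorem: sum of C(n, k) x^(n-k) y^k."""
    return sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))


print(binomial_sum(2, 3, 4), (2 + 3) ** 4)  # 625 625

# With x = y = 1 the sum reduces to the sum of row n of Pascal's triangle.
print([binomial_sum(1, 1, n) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```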
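A special case of Theorem 1.7 occurs with x = 1:
\[ (1+y)^n = \sum_{k=0}^{n} \binom{n}{k} y^k. \]
This can be extended to the case in which n is not a positive integer provided we define the binomial coefficients as
\[ \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}. \]
Also, the limits of the summation are from k = 0 to ∞, not n.
For example, let n = –1. Using the formula above, we have:
\[ \binom{-1}{k} = \frac{(-1)(-2)\cdots(-k)}{k!} = (-1)^k. \]
Hence:
\[ (1+y)^{-1} = \sum_{k=0}^{\infty} (-1)^k y^k = 1 - y + y^2 - y^3 + \cdots \]
This is recognizable as the sum of a geometric series with first term a = 1 and common ratio r = –y.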
With n = –2, we have
\[ \binom{-2}{k} = \frac{(-2)(-3)\cdots(-k-1)}{k!} = (-1)^k (k+1), \]
so:
\[ (1+y)^{-2} = \sum_{k=0}^{\infty} (-1)^k (k+1) y^k = 1 - 2y + 3y^2 - 4y^3 + \cdots \]
This can also be obtained by taking the derivative of the series for \( (1+y)^{-1} \) since \( \frac{d}{dy}(1+y)^{-1} = -(1+y)^{-2} \).
The same approach can be used for fractional values of n, although it isn’t as easy to express the binomial coefficients in closed form. Note also that the series obtained by the binomial theorem are the same as the Taylor series that you learned in calculus. It can be shown that these series converge on the interval –1 < y < 1, and possibly one or both endpoints.
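The extended binomial coefficients are easy to compute exactly, which gives a way to check the series for n = –1 and n = –2 and to experiment with fractional n. This is a sketch only; `gen_binom` and `series_partial_sum` are hypothetical helper names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def gen_binom(n, k):
    """Extended binomial coefficient n(n-1)...(n-k+1) / k! for any rational n."""
    result = Fraction(1)
    for j in range(k):
        result *= Fraction(n) - j
        result /= j + 1
    return result


def series_partial_sum(n, y, terms):
    """Sum of the first `terms` terms of the binomial series for (1 + y)^n."""
    return sum(gen_binom(n, k) * y ** k for k in range(terms))


print([int(gen_binom(-1, k)) for k in range(5)])  # [1, -1, 1, -1, 1]
print([int(gen_binom(-2, k)) for k in range(5)])  # [1, -2, 3, -4, 5]
print(gen_binom(Fraction(1, 2), 2))               # -1/8

# Partial sums approach (1 + y)^n when -1 < y < 1.
approx = series_partial_sum(-1, Fraction(1, 10), 20)
print(float(approx), 1 / 1.1)
```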
Applications
1. The Birthday Problem: Suppose a room contains n people. What is the probability that at least 2 of them have the same birthday? How large does n have to be in order for the probability that at least 2 have the same birthday to be greater than .5?
It is easier to compute the probability that all of them have different birthdays. Assume that birthdays are equally likely to fall on any of the 365 days in a year. (We’ll ignore leap years.) Then the probability that 2 people have different birthdays is \( \frac{364}{365} \). The probability that a third person has a birthday different from the first two is \( \frac{363}{365} \); hence, the probability that all three have different birthdays is \( \frac{364}{365} \cdot \frac{363}{365} \approx .9918 \). Continuing in this fashion, the probability that 4 people all have different birthdays is \( \frac{364}{365} \cdot \frac{363}{365} \cdot \frac{362}{365} \approx .9836 \). The probability that n people have different birthdays is
\[ p_n = \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n}. \]
This is a decreasing function of n. When n = 23, \( p_{23} \approx .4927 \). So, in a room of 23 people, there is a better than 50% chance that at least two have the same birthday. It is somewhat counterintuitive that so few people are required to make the probability more than 50%.
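The product in the birthday problem is tedious by hand but simple to compute. The sketch below uses the same assumptions as the text (365 equally likely birthdays, no leap years); the function names are illustrative.

```python
def prob_all_different(n, days=365):
    """Probability that n people all have different birthdays."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p


def smallest_group(threshold=0.5):
    """Smallest n for which P(at least two share a birthday) exceeds threshold."""
    n = 1
    while 1 - prob_all_different(n) <= threshold:
        n += 1
    return n


print(round(prob_all_different(3), 4))   # 0.9918
print(round(prob_all_different(4), 4))   # 0.9836
print(round(prob_all_different(23), 4))  # 0.4927
print(smallest_group())                  # 23
```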
2. The Pennsylvania Lottery Games: The state of Pennsylvania (and many other states) runs one or more lottery games to raise money for various causes. In PA, some of the money goes to help senior citizens pay for prescription medicines.
In the Daily Number, you select a three-digit number, where each digit can be 0, 1, 2,…, 9. The state selects a three-digit number each day at a certain time. If your number matches the state’s number, you win some amount of money. There are a total of 1000 possible numbers (from 000 to 999), so the probability of winning is \( \frac{1}{1000} = .001 \). There are variations of the theme, such as not having to get the numbers in the right order, that have higher probabilities of winning and lower payoffs.
The Big Four game is similar to the Daily Number, except that you pick a four-digit number. Thus, the probability of winning is \( \frac{1}{10{,}000} = .0001 \). Again, there are variations.
Example 1.5.9: In the Big Four game, what is the probability that the state picks a number with two pairs; e.g. 3737 or 4664 or 5599?
There are \( \binom{10}{2} = 45 \) ways of selecting the two digits that will appear. Then there are \( \binom{4}{2} = 6 \) ways of arranging the two pairs. For example, if we are going to have two 3’s and two 7’s, we can arrange them as 3377, 3737, 3773, 7337, 7373, 7733. So, there are 45 · 6 = 270 four-digit numbers with two pairs, implying that the probability of getting such an outcome is \( \frac{270}{10{,}000} \) = .027.
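Since there are only 10,000 four-digit numbers, the count in Example 1.5.9 can be verified by checking each one. In this sketch a number has “two pairs” when it uses exactly two distinct digits, each appearing twice; the variable names are illustrative.

```python
from collections import Counter
from math import comb

# All four-digit strings from 0000 to 9999.
numbers = [f"{i:04d}" for i in range(10000)]

# Two pairs: exactly two distinct digits, each appearing twice.
two_pairs = [s for s in numbers if sorted(Counter(s).values()) == [2, 2]]

print(len(two_pairs), comb(10, 2) * comb(4, 2))  # 270 270
print(len(two_pairs) / len(numbers))             # 0.027
```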
In the Cash 5 game, you pick 5 numbers from 1 to 39. You can’t pick the same number more than once, and the order in which they are selected doesn’t matter. The state then picks 5 numbers. If you match all 5, you win the biggest prize (which can be millions of dollars, depending on how many tickets were sold, and on how many winners there are). There are smaller payoffs for matching fewer than 5 numbers. In a similar game, Match 6 Lotto, you pick 6 numbers from 1 to 49.
There are \( \binom{39}{5} = 575{,}757 \) ways of picking 5 numbers out of 39. Since there is only one way to match all 5 numbers chosen by the state, the probability of matching all 5 is \( \frac{1}{575{,}757} \approx .0000017 \). To match 4 of the 5 numbers, you have to pick 4 of the 5 correct numbers and 1 of the 34 incorrect numbers. There are \( \binom{5}{4}\binom{34}{1} = 5 \cdot 34 = 170 \) ways of doing this, so the probability of matching 4 numbers is \( \frac{170}{575{,}757} \approx .000295 \). Similarly, the probability of matching 3 numbers is \( \frac{\binom{5}{3}\binom{34}{2}}{\binom{39}{5}} = \frac{5610}{575{,}757} \approx .00974 \).
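The Cash 5 probabilities all follow one pattern: choose m of the 5 winning numbers and the rest from the 34 losing numbers. A sketch with a hypothetical helper `prob_match`:

```python
from fractions import Fraction
from math import comb


def prob_match(m, picks=5, pool=39):
    """Probability of matching exactly m of the numbers drawn by the state."""
    ways = comb(picks, m) * comb(pool - picks, picks - m)
    return Fraction(ways, comb(pool, picks))


print(comb(39, 5))           # 575757
print(float(prob_match(5)))  # about 0.0000017
print(float(prob_match(4)))  # about 0.000295
print(float(prob_match(3)))  # about 0.00974
```

The probabilities for m = 0, 1, …, 5 add up to 1, which is a useful sanity check on the counting.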
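Example 1.5.10: In the Match 6 Lotto game, what is the probability that 3 of the 6 numbers selected will be even and 3 will be odd?
Among the numbers from 1 to 49, there are 25 odd numbers and 24 even numbers. So the number of ways of selecting 3 odd and 3 even is \( \binom{25}{3}\binom{24}{3} = 2300 \cdot 2024 = 4{,}655{,}200 \). There are \( \binom{49}{6} = 13{,}983{,}816 \) ways of selecting the 6 numbers out of 49. So the probability of getting 3 odd and 3 even is \( \frac{4{,}655{,}200}{13{,}983{,}816} \approx .333 \).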
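The same counting pattern handles Example 1.5.10. The sketch below counts the odd and even numbers from 1 to 49 directly and then applies the multiplication principle; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

odd_count = len([x for x in range(1, 50) if x % 2 == 1])  # 25
even_count = 49 - odd_count                               # 24

favorable = comb(odd_count, 3) * comb(even_count, 3)
total = comb(49, 6)

prob_three_odd_three_even = Fraction(favorable, total)
print(favorable, total)                            # 4655200 13983816
print(round(float(prob_three_odd_three_even), 3))  # 0.333
```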
Exercises 1.5
1. A signal is created by choosing 3 out of 7 different colored flags and placing them, in order, on a flag pole. How many different signals are possible?
2. An ice cream store offers 15 flavors. Fred orders a banana split with two different flavors. How many possibilities are there?
3. (a) In how many ways can 3 men and 4 women line up?
(b) What is the probability that a random arrangement of 3 men and 4 women has all the women together and all the men together?
4. A cookie jar has 5 oatmeal and 8 chocolate chip cookies. Olivia selects 3 cookies without replacement (since she eats them as she goes). What is the probability that she picked exactly 2 chocolate chip cookies?
5. (a) How many distinct arrangements of the letters in the word QUESTION are there?
(b) How many distinct arrangements of the letters in the word ANSWERS are there? (Be careful: there are 2 S’s.)
(c) How many distinct arrangements of the letters in the word RESPONSES are there?
6. A multiple-choice exam contains 10 questions with 4 choices each.
(a) In how many different ways can the questions be answered?
(b) In how many ways can they be answered if it is known that exactly four of the answers are B?
7. In how many ways can a class of 18 students be divided into 3 groups, one with 5 people, one with 6 and one with 7?
8. In the Daily Number (3-digit) game, what is the probability that all three digits are different? What is the probability that two of the three digits are the same?
9. (a) In the Big 4 game, what is the probability of obtaining “one pair” (that is, two digits the same and the other two different; e.g. 3836 or 9117)?
(b) What is the probability of getting “three of a kind”; e.g. 4944 or 3331?
10. In the Cash 5 lottery game, what is the probability that the 5 numbers selected will be:
(a) consecutive (e.g. 11, 12, 13, 14, 15)?
(b) all less than or equal to 20?
11. What is the probability that a five-card poker hand will contain four of a kind (e.g. all four 8’s and something else)?
12. Determine a value of n such that
.
13. Use the binomial theorem to prove
.
14. Use the binomial theorem \( (1+y)^n = \sum_{k=0}^{n} \binom{n}{k} y^k \) to derive an expression for \( \sum_{k=0}^{n} k\binom{n}{k} \). [Hint: Take the derivative.]
15. The expression \( \sum_{k=0}^{n} k\binom{n}{k} \) represents the number of committees of any size that can be selected from a group of n people in which one member of the committee is then appointed chair. For example, with n = 3, we would have 12 such committees: {A, B, C, Ab, aB, Ac, aC, Bc, bC, Abc, aBc, abC}, where capital letters indicate the chair. One way to count this is to select the chair first, then fill up the committee with any subset of the n – 1 remaining people (including the possibility of no additional members). Determine the total number of ways in which this can happen. The result should be the same as you got in Exercise 14.
16. Write the first 4 terms in the binomial series expansion of
and use your result to approximate
.
