In addition to regular probability, we often want to figure out how probability is affected by observing some event. For example, the NFL season is rife with possibilities. From the beginning of each season, fans start trying to figure out how likely it is that their favorite team will make the playoffs. After every game the team plays, these probabilities change based on whether they won or lost. This post won't speak to how these probabilities are updated. That's the subject for a future post on Bayesian statistics. What we *will* explore is the concept of **conditional probability**, which is the probability of seeing some event knowing that some other event has actually occurred.

Some more examples of where we might encounter such conditional probabilities:

Inveterate bridge players like my dad keep track of cards as they are exposed during play, since the exposed cards (and the bids) provide information about what hand each player is likely to hold. Such card counting and the conditional probabilities it yields (what's the likelihood of each hand, given what I have seen?) form one of the (frowned-upon) strategies for trying to beat the casinos at blackjack and poker (see the movie 21 for a Hollywood version of real-life card counting in casinos).

When we go to the doctor to test for a disease (say tuberculosis or HIV or even, more commonly, strep throat and flu), we get a yes or no answer. However, no test is perfect. A positive test still means we might not have the disease, and a negative test might mean we have it, though hopefully with very little likelihood. For us, the important thing to know is: if we tested positive (an observed event), what is the chance that we truly have the disease (an unobserved event)?

Weather forecasting is based on conditional probabilities. When the forecast says that there is a 30% chance of rain, that probability is based on all the information that the meteorologists know up until that point. It's not just a roll of the dice (though sometimes, it feels that way).

### Conditional Probabilities

The flu season is rapidly approaching. Each of us has some probability of getting the flu, which can be naively computed as the number of flu cases last year divided by the number of people potentially exposed to the flu virus in that same year. Let's call this probability `P(flu)`.

If a person gets a flu vaccination, their chance of getting the flu should change. This would be denoted as `P(flu|vaccine)`, and is read as "the probability of getting the flu given that you have been vaccinated." Because of the "been vaccinated" condition, this is a conditional probability.

So how do you compute a conditional probability? There is a basic equation that defines this:

```
P(A and B) = P(A | B) P(B)
```

which also tells us that

```
P(A | B) = P(A and B) / P(B)
```

`P(A and B)` is often called the *joint* probability of A and B, and `P(A)` and `P(B)` are often called the *marginal* probabilities of A and B, respectively.
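As a quick sanity check of the definition, we can plug in toy numbers (the probabilities below are made up purely for illustration):

```r
p_a_and_b <- 0.2  # joint probability P(A and B), assumed for the sketch
p_b <- 0.5        # marginal probability P(B), assumed for the sketch
p_a_given_b <- p_a_and_b / p_b
p_a_given_b  # 0.4
```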

Adapting the equations above to our flu example,

```
P(Flu | Vaccine) = P(Flu and Vaccine) / P(Vaccine)
```

The numerator is the probability that a person gets the vaccine and the flu; the denominator is the probability that a person gets the vaccine.

Let's look at a table of hypothetical frequencies for a population:

Condition | Vaccine | No Vaccine |
---|---|---|
Flu | A | C |
No Flu | B | D |

So we have:

```
P(Flu|Vaccine) = P(Flu and Vaccine)/P(Vaccine)
```

Plugging in the conditions (A, B, C, & D) from our table above:

```
P(Flu|Vaccine) = (A/(A+B+C+D)) / ((A+B)/(A+B+C+D)) = A/(A+B)
```

Next, we will swap out the different conditions (A, B, C, D) with numbers so that we can calculate an answer!

Condition | Vaccine | No Vaccine |
---|---|---|
Flu | 10 | 1000 |
No Flu | 3000 | 10000 |

Plugging in the numbers in our new table:

```
P(Flu|Vaccine) = P(Flu and Vaccine)/P(Vaccine)
               = 10/3010
               = 0.0033
```

So this probability is the chance of getting the flu only among those who were vaccinated. We have normalized the probability of an event (getting the flu) to the conditioning event (getting vaccinated) rather than to the entire sample space.
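The same arithmetic can be sketched directly in R; the matrix below simply restates the hypothetical counts from the table:

```r
# hypothetical counts from the table above
counts <- matrix(c(10, 1000,
                   3000, 10000),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c('Flu', 'NoFlu'), c('Vaccine', 'NoVaccine')))
total <- sum(counts)
p_flu_and_vaccine <- counts['Flu', 'Vaccine'] / total  # joint probability
p_vaccine <- sum(counts[, 'Vaccine']) / total          # marginal probability
p_flu_and_vaccine / p_vaccine  # equals 10/3010, about 0.0033
```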

**Challenge Question:** According to the table above, what is the probability of getting the flu if you weren't vaccinated, `P(Flu | No Vaccine)`? What is the probability of getting the flu, `P(Flu)`, in general?

There is another way of looking at conditional probability, using a Venn diagram: picture event A as a circle inside a rectangle that represents the entire sample space. If we don't know anything about event B, P(A) is the area of A's circle relative to the whole rectangle.

If we know that the conditioning event B has happened, the probability of event A becomes the ratio of the overlap between A and B to the area of B. In effect, our sample space of interest shrinks to the space where event B occurs.

### Statistically Independent Events

We can compare the probability of an event (A) and how it changes if we know that another event (B) has happened. How does the chance of catching flu (A) change if you're vaccinated (B)? How does a football team's chance of going to the playoffs (A) change if the quarterback is injured (B)? In both these cases, we think those chances will change.

But will the chance of the Pittsburgh Steelers beating New England Patriots (sacrilegious to some, I know) in the 4 pm game depend on the Seattle Seahawks beating the San Francisco 49ers (*caveat:* I'm from Seattle) during the same time? We think (and hope) not.

When knowledge of one event does not change the probability of another event happening, the two events are called **statistically independent.** We see a lot of things that are independent in this sense. Successive tosses of a coin are independent, or so we believe. So are successive dice rolls and slot machine plays.

Statistical independence has some mathematical consequences. It implies that

```
P(A|B) = P(A)
```

which directly implies, from the definition, that

```
P(A and B) = P(A)P(B)
```

This means that we can compute the probability of two independent events happening together by merely multiplying the individual probabilities.
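For instance, with two fair dice, each face has probability 1/6, so under independence the joint probability is just the product:

```r
p_x1 <- 1 / 6  # P(first die shows 1)
p_y1 <- 1 / 6  # P(second die shows 1)
# under independence, the joint probability is the product
p_joint <- p_x1 * p_y1
p_joint  # 1/36, about 0.0278
```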

**Caution:** You'll often find probabilities of joint events like this computed as the product of the individual events. However, this is only true if the assumption of statistical independence is valid. Often times, it is not, and so you must be careful interpreting such computations.

Let's do a little experiment in R. We'll toss two fair dice, just as we did in an earlier post, and see if the results of the two dice are independent. We first roll the dice 100,000 times, and then compute the joint distribution of the results of the rolls from the two dice.

```
dice <- function(no_of_rolls = 1) {
  # roll two dice `no_of_rolls` times each
  x <- sample(1:6, size = no_of_rolls, replace = TRUE)
  y <- sample(1:6, size = no_of_rolls, replace = TRUE)
  return(cbind(x, y))
}
set.seed(20485)
rolls <- as.data.frame(dice(100000))
library(plyr)
# count how often each y value occurs within each x value
freq_table <- ddply(rolls, ~x, summarize,
                    y1 = sum(y == 1), y2 = sum(y == 2), y3 = sum(y == 3),
                    y4 = sum(y == 4), y5 = sum(y == 5), y6 = sum(y == 6))
row.names(freq_table) <- paste0('x', 1:6)
prob_table <- freq_table[, -1] / 100000
prob_table
```

```
## y1 y2 y3 y4 y5 y6
## x1 0.02754 0.02886 0.02723 0.02804 0.02762 0.02820
## x2 0.02656 0.02831 0.02670 0.02803 0.02807 0.02799
## x3 0.02753 0.02876 0.02745 0.02725 0.02755 0.02751
## x4 0.02783 0.02800 0.02831 0.02715 0.02806 0.02771
## x5 0.02830 0.02836 0.02668 0.02766 0.02729 0.02849
## x6 0.02826 0.02770 0.02771 0.02747 0.02757 0.02825
```

Let's evaluate the probability that `y=1`, both with and without knowledge of `x`. If we don't observe `x`, that probability is:

```
> mean(rolls$y==1)
0.16602
```

If we know that `x=3`, then the *conditional* probability that `y=1` given `x=3` is:

```
> mean(rolls$y[rolls$x==3]==1)
0.165793
```

These results are very close.

**Note:** R makes it very easy to do conditional probability evaluations. You can restrict yourself to those observations of `y` where `x=3` by specifying a Boolean condition as the index of the vector, as in `y[x==3]`.
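A minimal illustration of that indexing pattern (the vectors here are toy values, made up for the sketch):

```r
x <- c(1, 3, 3, 5, 3)
y <- c(2, 1, 4, 6, 1)
y[x == 3]             # the y values observed when x is 3
mean(y[x == 3] == 1)  # conditional relative frequency of y == 1 given x == 3
```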

If we assumed that the results from the two dice are statistically independent, we would
have, for every pair of values *i*,*j* in 1,2,3,4,5,6:

```
P(x=i and y=j) = P(x=i)P(y=j)
```

We computed the first part earlier from `prob_table`. Now for the second part.

```
prob_x <- table(rolls$x)/100000
prob_y <- table(rolls$y)/100000
prob_table_indep <- outer(prob_x,prob_y,'*')
row.names(prob_table_indep) <- paste0('x',1:6)
colnames(prob_table_indep) <- paste0('y',1:6)
prob_table_indep
```

```
##
## y1 y2 y3 y4 y5 y6
## x1 0.02781 0.02847 0.02748 0.02774 0.02783 0.02816
## x2 0.02750 0.02816 0.02718 0.02743 0.02753 0.02786
## x3 0.02757 0.02823 0.02725 0.02750 0.02759 0.02792
## x4 0.02774 0.02840 0.02741 0.02767 0.02776 0.02809
## x5 0.02769 0.02835 0.02737 0.02762 0.02771 0.02804
## x6 0.02772 0.02838 0.02739 0.02765 0.02774 0.02807
```

We see that `prob_table` and `prob_table_indep` are quite close, indicating that the rolls of the two dice are probably independent.
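One quick numerical check is to look at the largest absolute difference between the empirical joint table and the product-of-marginals table. The snippet below is self-contained, re-simulating the rolls rather than reusing the objects above:

```r
set.seed(20485)
x <- sample(1:6, size = 100000, replace = TRUE)
y <- sample(1:6, size = 100000, replace = TRUE)
joint <- table(x, y) / 100000                         # empirical joint distribution
indep <- outer(table(x) / 100000, table(y) / 100000)  # product of the marginals
max(abs(joint - indep))  # a tiny discrepancy, consistent with independence
```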

### Statistically Testing Independence

One statistical test for testing independence of two frequency distributions (which means that for any two values of x and y, their joint probability is the product of the marginal probabilities) is the **Chi-squared** test. In R, this is implemented by the function `chisq.test`.

```
chisq.test(rolls$x, rolls$y)
```

```
##
## Pearson's Chi-squared test
##
## data: rolls$x and rolls$y
## X-squared = 23.27, df = 25, p-value = 0.5618
```

We see that the p-value of this test is quite large, indicating that there is insufficient evidence to suggest that x and y are *not* independent.

**Challenge question**: If two events cannot occur together (they are mutually exclusive) can they be independent?

### Bayes' Rule

Understanding how conditional probabilities change as information is acquired is part of the central dogma of the Bayesian paradigm. That paradigm is based on Bayes' theorem, which is nothing but a theorem of conditional probabilities.

```
P(A|B) = P(B|A)P(A)/P(B)
```

This provides the mathematical framework for understanding how B affects A if we know something about how A affects B.

Rearranging this formula provides a bit more insight:

```
P(A|B)/P(A) = P(B|A)/P(B)
```

In other words, how knowledge of B changes the probability of A is the same as how knowledge of A changes the probability of B, at least as a ratio. If A and B are independent, this ratio is 1.
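As a tiny illustration, Bayes' theorem is a one-liner in R (the function name and the probabilities below are made up for this sketch):

```r
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
bayes_flip <- function(p_b_given_a, p_a, p_b) {
  p_b_given_a * p_a / p_b
}
# toy numbers: P(B|A) = 0.4, P(A) = 0.3, P(B) = 0.2
bayes_flip(0.4, 0.3, 0.2)  # 0.6
```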

### Disease Testing

Suppose we have a test for the flu that is positive 90% of the time when tested on a flu patient (`P(test + | flu) = 0.9`), and is negative 95% of the time when tested on a healthy person (`P(test - | no flu) = 0.95`). We also know that the flu is affecting about 1% of the population (`P(flu) = 0.01`). You go to the doctor and test positive. What is the chance that you truly have the flu?

You can answer this question directly using Bayes' theorem, but we'll tackle this a bit differently. We'll create a hypothetical population of 100,000 people, and see if we can figure this out.

```
# simulate flu status: on average, 1% of the population gets the flu
flu <- sample(c('No','Yes'), size=100000, replace=TRUE, prob=c(0.99,0.01))
test <- rep(NA, 100000) # create a dummy variable first
# healthy people: P(test - | no flu) = 0.95
test[flu=='No'] <- sample(c('Neg','Pos'), size=sum(flu=='No'), replace=TRUE,
                          prob=c(0.95,0.05))
# flu patients: P(test + | flu) = 0.9
test[flu=='Yes'] <- sample(c('Neg','Pos'), size=sum(flu=='Yes'), replace=TRUE,
                           prob=c(0.1,0.9))
```

In the above code we first simulate who has the flu, given that on average 1% of the population gets the flu. We then determine who among those without the flu would test positive, based on `P(test - | no flu) = 0.95`. Recall that when considering a conditioning event, the conditioning event is treated as the sample space, and so all the laws of probability hold within that space.

In particular:

```
P(test + | no flu) = 1 - P(test - | no flu) = 0.05
```

We do a similar computation for the people with flu.

The question we are asking, what is the chance that you have the flu given that you tested positive, can then be directly answered as:

```
mean(flu[test=='Pos']=='Yes')
0.1567
```

Wow! Even though the test is pretty good, the chance that we truly have the flu when we test positive is still pretty small. This is because the chance of getting the flu in the first place is small. However, if we look at how much our chance of having the flu changed with a positive test, the change is quite large:

```
mean(flu[test=='Pos']=='Yes')/mean(flu=='Yes')
15.3152
```

That is, the knowledge that we tested positive increased our chance of truly having the flu 15-fold! A constant issue in medicine is whether we should address the absolute increase in risk (1% to 15%) or the relative risk (15-fold) when deciding on best clinical practice.

If we calculate the probability using Bayes' theorem, we get a very similar result:

```
P(flu=Yes | test+) = P(test+ | flu=Yes) P(flu=Yes) / P(test+)
                   = P(test+ | flu=Yes) P(flu=Yes) /
                     [P(test+ | flu=Yes) P(flu=Yes) + P(test+ | flu=No) P(flu=No)]
                   = 0.9 x 0.01 / (0.9 x 0.01 + 0.05 x 0.99)
                   = 0.1538
```
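The same arithmetic can be done directly in R (the variable names here are ours, chosen for this sketch):

```r
p_flu <- 0.01   # P(flu)
sens  <- 0.90   # P(test+ | flu), the test's sensitivity
spec  <- 0.95   # P(test- | no flu), the test's specificity
# total probability of a positive test, by the law of total probability
p_pos <- sens * p_flu + (1 - spec) * (1 - p_flu)
sens * p_flu / p_pos  # about 0.1538
```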

### Conclusion

Conditional probabilities and Bayes' theorem have many everyday applications, such as determining the risk of our investments, what the weather will be like this weekend, and what our medical test results mean. These concepts are central to understanding the consequences of our actions and how relationships between entities can affect outcomes. With recent increases in the amount and availability of data, understanding these concepts becomes essential for making informed, data-driven decisions. In this post, we reviewed how to formally look at conditional probabilities, what rules they follow, how to use those rules along with Bayes' theorem to figure out the conditional probabilities of events, and even how to "flip" them.

