7.6.1: The Birthday Problem
Suppose that it is your birthday, and your friends take you out to a restaurant to celebrate. During the evening you are brought a small cake, and perhaps your server will sing the birthday song to you. You are enjoying the evening, and then from another part of the restaurant, you hear your server singing the same song to another person. You have now realized that you share the same birth date with someone else in that restaurant!
Sharing a birthday with another person may seem surprising at first. After all, birthdays feel like very special and personal days that you would not want to share with someone else. But what is the probability that, if you gather a group of individuals together, there is at least one pair of people who have the same birthday? As we shall see below, the probability that this occurs might be higher than you think.
Before we begin thinking about computing this probability, we need to make some assumptions. To begin, we will exclude leap years from our discussion. The problem when leap years are included is that birthdays on February 29th are noticeably less common than other birthdays, and this would violate our second assumption. Next, we will assume that if we randomly select a person from the general population, then each of the 365 possible birthdays under consideration is equally likely. This assumption is not true, as birth dates follow seasonal patterns (Nunnikhoven 1992). However, these assumptions will make it easier to compute the associated probability and demonstrate the method without additional complications. We will discuss the results of a more comprehensive and realistic approach at the end of this section.
To begin understanding how to compute this probability, first consider selecting two people at random. The first person selected will have a birthday that we can represent by a number \(b_1\), which is some integer between 1 and 365. Hence, 1 will represent the outcome that the person’s birthday is on January 1st, and 2 represents the outcome that the person’s birthday is on January 2nd. This continues until we get to 365, which represents the outcome that the person’s birthday is on December 31st. When the second person is selected, their birthday will be represented by a number \(b_2\), with the same representation as shown above. Using the classical method for computing probabilities, the probability that \(b_1\) and \(b_2\) are the same is \(1/365\), since there are 365 equally likely possible birthdays for the second person and only one of these matches the birthday of the first person. Hence, the probability that two people have a matching birthday is \(1/365\).
Now consider three people, whose birthdays will be represented by the numbers \(b_1\), \(b_2\), and \(b_3\). In this case there are many ways in which at least two of the individuals share a birthday. Either \(b_1=b_2\), \(b_2=b_3\), \(b_1=b_3\), or \(b_1=b_2=b_3\). To compute the probability that one of these outcomes occurs, we would need to list out all possible birthday combinations for the three people, which corresponds to \(365\times 365\times 365=48,627,125\) possibilities, so that \(n=48,627,125\). Next, we would need to count how many of these possibilities correspond to one of the conditions shown above. Because of the four possible ways that this outcome can occur, this calculation can be rather complicated. Alternatively, we could count the outcomes in which all three birthdays are different, which are the remaining possibilities. We can then subtract this count from \(n\) to get the number \(m\) that we need to compute the probability.
Suppose that the first sampled person’s birthday is \(b_1\). There are 365 possibilities for their birthday. When we sample the second person, there are 364 possibilities for their birthday to be different from the first person's, that is, \(b_1\neq b_2\), for each of the 365 possibilities for \(b_1\). Therefore, there are a total of \(365\times 364=132,860\) outcomes where \(b_1\neq b_2\). When we sample the third person, there are 363 possibilities for their birthday to be different from those of the first two people, that is, \(b_3\neq b_1\) and \(b_3\neq b_2\), for each of the 132,860 possibilities where \(b_1\neq b_2\). Therefore, there are a total of \(132,860\times 363=48,228,180\) outcomes where all three birthdays are different, that is, \(b_1\neq b_2\), \(b_1\neq b_3\), and \(b_2\neq b_3\). In all the remaining outcomes, at least two people share a birthday. Therefore, there are \(m=48,627,125-48,228,180=398,945\) outcomes where at least two people share a birthday. Hence, the probability that at least two of the three people share a birthday is
\[p=\frac{398,945}{48,627,125}=\frac{1093}{133,225}\approx 0.008204, \nonumber \]
or about a 0.8% chance. Hence there is a little less than a 1% chance that at least two people will share a birthday among three randomly selected people. This is unlikely but is larger than the chance for two people, which is \(1/365\times 100\%\approx 0.3\%\).
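The counting argument for three people can be checked with exact integer arithmetic. The following is a minimal sketch, not part of the original text: the variable names are chosen here for illustration, and the 10-day "year" used for the brute-force check is a hypothetical shortcut so that every outcome can be listed quickly.

```python
from fractions import Fraction
from itertools import product

DAYS = 365  # possible birthdays, leap days excluded

# n: every ordered triple (b1, b2, b3) of birthdays
total = DAYS ** 3
# outcomes in which all three birthdays are different
all_different = DAYS * (DAYS - 1) * (DAYS - 2)
# m: outcomes in which at least two people share a birthday
shared = total - all_different

p = Fraction(shared, total)  # Fraction reduces to lowest terms
print(total, all_different, shared)  # 48627125 48228180 398945
print(p, float(p))

# Sanity check of the "subtract the all-different outcomes" idea on a
# hypothetical 10-day year, where listing every outcome is cheap.
small = 10
brute_force = sum(
    1 for triple in product(range(small), repeat=3) if len(set(triple)) < 3
)
by_complement = small ** 3 - small * (small - 1) * (small - 2)
print(brute_force, by_complement)  # 280 280
```

Listing all 48,627,125 outcomes for a full year would give the same count as the subtraction, only much more slowly.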
This argument can be used for any number of people, though the calculations become difficult to compute due to the large numbers involved. For four randomly selected people, the probability that at least two of them share a birthday is
\[ p = 1- \frac{365\times 364\times 363\times 362}{365\times 365\times 365\times 365} = \frac{290,299,465}{17,748,900,625} = \frac{795,341}{48,627,125} \approx 0.016356, \nonumber \]
which corresponds to about a 1.6% chance. For five randomly selected people, the probability that at least two of them share a birthday is
\[ p = 1- \frac{365\times 364\times 363\times 362\times 361}{365\times 365\times 365\times 365\times 365} = \frac{175,793,709,365}{6,478,348,728,125} = \frac{481,626,601}{17,748,900,625} \approx 0.027136, \nonumber \]
which corresponds to about a 2.7% chance.
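The calculations for two, three, four, and five people follow one pattern, which can be written for a group of any size. As a sketch, write \(k\) for the number of people and \(p_k\) for the probability that at least two of them share a birthday. This notation is introduced here only for illustration, and it keeps the assumption of 365 equally likely birthdays:
\[ p_k = 1-\frac{365\times 364\times \cdots \times (365-k+1)}{365^k} = 1-\prod_{i=0}^{k-1}\frac{365-i}{365}, \qquad 1\le k\le 365. \nonumber \]
Setting \(k=2\) gives \(1-364/365=1/365\), and \(k=3\), \(4\), and \(5\) reproduce the three fractions computed above.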
The probabilities are increasing, and this is reasonable because the more people observed, the more chances there are for birthdays to overlap. In fact, if there are 366 people in the group, there must be at least one pair who share a birthday as there are only 365 birthdays to choose from. So, somewhere between 5 people and 366 people, the probability increases considerably. What surprises most people is that it does not take too many people before the probability gets quite large. Table 7.5 shows these probabilities for between 10 and 70 people. For example, from the table we can observe that if there are as few as 23 people in a group, the chance that there are at least two people who share a birthday is above 50%. When you get 50 people in a room together, there is a 97% chance that at least two people in the room share a birthday. That means if you are sitting in a class with at least 50 other students, it is almost certain that there are at least two people in the room who share a birthday!
Table 7.5 The probability that at least two people in a group share a birthday assuming each person is equally likely to have one of 365 birthdays in the year excluding leap days.
| Number of People | Probability at Least One Pair Shares a Birthday |
| --- | --- |
| 10 | 0.1169 |
| 11 | 0.1411 |
| 12 | 0.1670 |
| 13 | 0.1944 |
| 14 | 0.2231 |
| 15 | 0.2529 |
| 16 | 0.2836 |
| 17 | 0.3150 |
| 18 | 0.3469 |
| 19 | 0.3791 |
| 20 | 0.4114 |
| 21 | 0.4437 |
| 22 | 0.4757 |
| 23 | 0.5073 |
| 24 | 0.5383 |
| 25 | 0.5687 |
| 26 | 0.5982 |
| 27 | 0.6269 |
| 28 | 0.6545 |
| 29 | 0.6810 |
| 30 | 0.7063 |
| 31 | 0.7305 |
| 32 | 0.7533 |
| 33 | 0.7750 |
| 34 | 0.7953 |
| 35 | 0.8144 |
| 36 | 0.8322 |
| 37 | 0.8487 |
| 38 | 0.8641 |
| 39 | 0.8782 |
| 40 | 0.8912 |
| 50 | 0.9704 |
| 60 | 0.9941 |
| 70 | 0.9992 |
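Entries such as those in Table 7.5 can be recomputed by multiplying the "all different" fractions one person at a time, which avoids the very large integers that appear in the exact calculation. The sketch below assumes 365 equally likely birthdays; the function name `shared_birthday_probability` is chosen here for illustration.

```python
DAYS = 365  # equally likely birthdays, leap days excluded


def shared_birthday_probability(group_size):
    """Probability that at least two of group_size people share a birthday."""
    p_all_different = 1.0
    for already_chosen in range(group_size):
        # The next person must avoid every birthday already taken.
        p_all_different *= (DAYS - already_chosen) / DAYS
    return 1.0 - p_all_different


for size in (10, 23, 50):
    print(size, round(shared_birthday_probability(size), 4))
# 10 0.1169
# 23 0.5073
# 50 0.9704

# Smallest group for which a shared birthday is more likely than not.
smallest = next(
    k for k in range(1, DAYS + 2) if shared_birthday_probability(k) > 0.5
)
print(smallest)  # 23
```

For a group of 366 the loop reaches a factor of zero, so the function returns exactly 1, in agreement with the observation that 366 people must include a matching pair.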
The rapid increase of this probability is sometimes called a paradox because it does not seem possible that the probability would be so large for so few people. The main thing to remember is that this is not the probability that someone has the same birthdate as yours; this is the probability that at least two people in the room have a matching birthday.
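The difference between "someone shares *your* birthday" and "some pair shares a birthday" can be made concrete with a short calculation. This is a sketch under the same assumption of 365 equally likely birthdays. The function names are illustrative, and the formula \(1-(364/365)^{j}\) for a match with one fixed birthday among \(j\) other people is a standard result that is not derived in this section.

```python
DAYS = 365  # equally likely birthdays, leap days excluded


def prob_any_pair(group_size):
    """Probability that at least one pair in the group shares a birthday."""
    p_all_different = 1.0
    for already_chosen in range(group_size):
        p_all_different *= (DAYS - already_chosen) / DAYS
    return 1.0 - p_all_different


def prob_matches_you(others):
    """Probability that at least one of `others` people has your birthday."""
    # Each other person misses your birthday with probability 364/365.
    return 1.0 - ((DAYS - 1) / DAYS) ** others


# A room of 23 people: you plus 22 others.
print(round(prob_any_pair(23), 4))     # 0.5073
print(round(prob_matches_you(22), 4))  # 0.0586

# Number of other people needed before a match with YOUR birthday
# becomes more likely than not.
needed = next(j for j in range(1, 1000) if prob_matches_you(j) > 0.5)
print(needed)  # 253
```

In a room of 23 there are 253 distinct pairs of people but only 22 pairs that include you, which is one way to see why the two probabilities differ so much.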
What about the assumptions that we made at the beginning of this section? As we stated earlier, the distribution of birthdays is not uniform throughout the year, and hence there are some months of the year where birthdays are more common than others. In fact, recent birthdate data (1994–2014) compiled by the U.S. Centers for Disease Control and Prevention’s National Center for Health Statistics (NCHS) and the Social Security Administration show that September 9 is the most common birth date in the United States. September is a very common month for birthdays as nine of the top ten birth dates occur in September, though the most common month for a birthday is August (Cahn 2024). How does the nonuniformity of birth dates throughout the year affect the calculations?
It turns out that the probability of at least one match increases when the distribution of birth dates is not uniform (Bloom 1973; Munford 1977; Rust 1976). This makes sense because if there are some birthdays that are more common than others, we would expect that there would be a better chance of at least two individuals having those birthdays. Some researchers have used the actual distribution of birthdays in the United States to determine how much these probabilities change. What they found is that the probabilities are relatively insensitive to the moderate variation in birthrates observed in the United States (Nunnikhoven 1992). For example, when there are 10 people in the group, the probability of at least one match is 0.116948 when birthdays are assumed to be uniformly distributed throughout the year, whereas this probability is 0.117100 when the actual distribution of birthdays is used.
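The effect of a nonuniform distribution can be explored with a small simulation. The sketch below is only an illustration: the `skewed` weights are invented for this example and are far more uneven than real birth data, and the trial count and seed are arbitrary choices. It estimates the match probability by repeatedly drawing random groups and checking for a repeated birthday.

```python
import random

DAYS = 365  # leap days excluded


def simulate_match_probability(group_size, weights, trials=20_000, seed=0):
    """Estimate P(at least one shared birthday) by simulation.

    weights[d] is the relative frequency of day d; they need not sum to 1.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    days = range(DAYS)
    matches = 0
    for _ in range(trials):
        birthdays = rng.choices(days, weights=weights, k=group_size)
        if len(set(birthdays)) < group_size:  # a repeated day means a match
            matches += 1
    return matches / trials


uniform = [1.0] * DAYS
# HYPOTHETICAL weights: the first 30 days are three times as likely as the
# rest. This is an exaggeration for illustration, not real birth data.
skewed = [3.0 if day < 30 else 1.0 for day in range(DAYS)]

print(simulate_match_probability(23, uniform))
print(simulate_match_probability(23, skewed))
```

With weights this exaggerated, the skewed estimate should come out noticeably higher than the uniform one. The much milder variation in actual birth data shifts the probability only slightly, as the figures for 10 people quoted above indicate.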

