1.3: Descriptive Statistics
Learning Objectives
- Define "descriptive statistics"
- Distinguish between descriptive statistics and inferential statistics
Descriptive statistics are numbers that are used to summarize and describe data. The word "data" refers to the information that has been collected from an experiment, a survey, a historical record, etc. (By the way, "data" is plural. One piece of information is called a "datum.") If we are analyzing birth certificates, for example, a descriptive statistic might be the percentage of certificates issued in New York State, or the average age of the mother. Any other number we choose to compute also counts as a descriptive statistic for the data from which the statistic is computed. Several descriptive statistics are often used at one time to give a full picture of the data.
Descriptive statistics are just descriptive. They do not involve generalizing beyond the data at hand. Generalizing from our data to another set of cases is the business of inferential statistics, which you'll be studying in another section. Here we focus on (mere) descriptive statistics. Some descriptive statistics are shown in Table \(\PageIndex{1}\). The table shows the average salaries for various occupations in the United States in \(1999\).
Salary | Occupation |
---|---|
$112,760 | pediatricians |
$106,130 | dentists |
$100,090 | podiatrists |
$76,140 | physicists |
$53,410 | architects |
$49,720 | school, clinical, and counseling psychologists |
$47,910 | flight attendants |
$39,560 | elementary school teachers |
$38,710 | police officers |
$18,980 | floral designers |
Descriptive statistics like these offer insight into American society. It is interesting to note, for example, that we pay the people who educate our children and who protect our citizens a great deal less than we pay people who take care of our feet or our teeth.
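Each figure in the salary table is already a descriptive statistic (an average), and the column as a whole can be summarized further. The following is a minimal Python sketch, written only for illustration, that computes the mean, median, and range of the ten salaries listed, plus the dentist-to-teacher ratio behind the comparison above. The variable names are hypothetical; the numbers are copied from Table \(\PageIndex{1}\).

```python
from statistics import mean, median

# Average 1999 U.S. salaries (in dollars), copied from Table 1.
salaries = {
    "pediatricians": 112_760,
    "dentists": 106_130,
    "podiatrists": 100_090,
    "physicists": 76_140,
    "architects": 53_410,
    "school, clinical, and counseling psychologists": 49_720,
    "flight attendants": 47_910,
    "elementary school teachers": 39_560,
    "police officers": 38_710,
    "floral designers": 18_980,
}

values = list(salaries.values())

# Summaries of the salary column as a whole.
summary = {
    "mean": mean(values),
    "median": median(values),
    "range": max(values) - min(values),
}

# How many times more dentists earn than elementary school teachers.
ratio = salaries["dentists"] / salaries["elementary school teachers"]

for name, value in summary.items():
    print(f"{name}: ${value:,.0f}")
print(f"dentist/teacher salary ratio: {ratio:.2f}")
```

For these ten occupations the mean ($64,341) sits well above the median ($51,565) because a few very high salaries pull the mean upward, and dentists earn about 2.68 times what elementary school teachers earn. Choosing which summary to report is itself part of describing the data.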
For more descriptive statistics, consider Table \(\PageIndex{2}\), which shows the number of unmarried men per \(100\) unmarried women in U.S. metro areas in \(1990\). From this table we see that men outnumber women most in Jacksonville, NC, and women outnumber men most in Sarasota, FL. You can see that descriptive statistics can be useful if we are looking for an opposite-sex partner! (These data come from the Information Please Almanac.)
Rank | Cities with mostly men | Men per 100 women | Cities with mostly women | Men per 100 women |
---|---|---|---|---|
1 | Jacksonville, NC | 224 | Sarasota, FL | 66 |
2 | Killeen-Temple, TX | 123 | Bradenton, FL | 68 |
3 | Fayetteville, NC | 118 | Altoona, PA | 69 |
4 | Brazoria, TX | 117 | Springfield, IL | 70 |
5 | Lawton, OK | 116 | Jacksonville, TN | 70 |
6 | State College, PA | 113 | Gadsden, AL | 70 |
7 | Clarksville-Hopkinsville, TN-KY | 113 | Wheeling, WV | 70 |
8 | Anchorage, Alaska | 112 | Charleston, WV | 71 |
9 | Salinas-Seaside-Monterey, CA | 112 | St. Joseph, MO | 71 |
10 | Bryan-College Station, TX | 111 | Lynchburg, VA | 71 |
NOTE: Unmarried includes never-married, widowed, and divorced persons, \(15\) years or older.
These descriptive statistics may make us ponder why the numbers are so disparate in these cities. For instance, one potential explanation for why there are more women than men in Florida is that elderly individuals tend to move to the Sarasota region and that women tend to outlive men. Thus, more women than men might live in Sarasota. However, in the absence of proper data, this is only speculation.
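Statements such as "men outnumber women most in Jacksonville, NC" rest on two simple descriptive statistics: the maximum and the minimum of the ratio column. The sketch below (Python, for illustration only) finds both from the values in Table \(\PageIndex{2}\) and re-expresses each ratio as women per 100 men. The helper `women_per_100_men` is a hypothetical name introduced here, and it assumes the table's ratio is simply 100 × (unmarried men / unmarried women).

```python
# Unmarried men per 100 unmarried women in 1990, copied from Table 2.
men_per_100_women = {
    "Jacksonville, NC": 224,
    "Killeen-Temple, TX": 123,
    "Fayetteville, NC": 118,
    "Brazoria, TX": 117,
    "Lawton, OK": 116,
    "State College, PA": 113,
    "Clarksville-Hopkinsville, TN-KY": 113,
    "Anchorage, Alaska": 112,
    "Salinas-Seaside-Monterey, CA": 112,
    "Bryan-College Station, TX": 111,
    "Sarasota, FL": 66,
    "Bradenton, FL": 68,
    "Altoona, PA": 69,
    "Springfield, IL": 70,
    "Jacksonville, TN": 70,
    "Gadsden, AL": 70,
    "Wheeling, WV": 70,
    "Charleston, WV": 71,
    "St. Joseph, MO": 71,
    "Lynchburg, VA": 71,
}


def women_per_100_men(men_per_100: float) -> float:
    """Invert a men-per-100-women ratio (hypothetical helper)."""
    return 100 * 100 / men_per_100


# The extremes of the ratio column.
most_men = max(men_per_100_women, key=men_per_100_women.get)
most_women = min(men_per_100_women, key=men_per_100_women.get)

for city in (most_men, most_women):
    ratio = men_per_100_women[city]
    print(f"{city}: {ratio} men per 100 women, "
          f"about {women_per_100_men(ratio):.0f} women per 100 men")
```

Seen from the other direction, Sarasota has about 152 unmarried women per 100 unmarried men, and Jacksonville, NC has about 45. Because the table lists only the ten most extreme cities in each direction, a mean or median of these twenty values would not describe U.S. metro areas in general.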
You probably know that descriptive statistics are central to the world of sports. Every sporting event produces numerous statistics such as the shooting percentage of players on a basketball team. For the Olympic marathon (a foot race of \(26.2\) miles), we possess data that cover more than a century of competition. (The first modern Olympics took place in \(1896\).) Table \(\PageIndex{3}\) shows the winning times for both men and women (the latter have only been allowed to compete since \(1984\)).
Women | |||
---|---|---|---|
Year | Winner | Country | Time |
1984 | Joan Benoit | USA | 2:24:52 |
1988 | Rosa Mota | POR | 2:25:40 |
1992 | Valentina Yegorova | UT | 2:32:41 |
1996 | Fatuma Roba | ETH | 2:26:05 |
2000 | Naoko Takahashi | JPN | 2:23:14 |
2004 | Mizuki Noguchi | JPN | 2:26:20 |
Men | |||
Year | Winner | Country | Time |
1896 | Spiridon Louis | GRE | 2:58:50 |
1900 | Michel Theato | FRA | 2:59:45 |
1904 | Thomas Hicks | USA | 3:28:53 |
1906 | Billy Sherring | CAN | 2:51:23 |
1908 | Johnny Hayes | USA | 2:55:18 |
1912 | Kenneth McArthur | S. Afr. | 2:36:54 |
1920 | Hannes Kolehmainen | FIN | 2:32:35 |
1924 | Albin Stenroos | FIN | 2:41:22 |
1928 | Boughra El Ouafi | FRA | 2:32:57 |
1932 | Juan Carlos Zabala | ARG | 2:31:36 |
1936 | Sohn Kee-Chung | JPN | 2:29:19 |
1948 | Delfo Cabrera | ARG | 2:34:51 |
1952 | Emil Zátopek | CZE | 2:23:03 |
1956 | Alain Mimoun | FRA | 2:25:00 |
1960 | Abebe Bikila | ETH | 2:15:16 |
1964 | Abebe Bikila | ETH | 2:12:11 |
1968 | Mamo Wolde | ETH | 2:20:26 |
1972 | Frank Shorter | USA | 2:12:19 |
1976 | Waldemar Cierpinski | E.Ger | 2:09:55 |
1980 | Waldemar Cierpinski | E.Ger | 2:11:03 |
1984 | Carlos Lopes | POR | 2:09:21 |
1988 | Gelindo Bordin | ITA | 2:10:32 |
1992 | Hwang Young-Cho | S. Kor | 2:13:23 |
1996 | Josia Thugwane | S. Afr. | 2:12:36 |
2000 | Gezahegne Abera | ETH | 2:10:10 |
2004 | Stefano Baldini | ITA | 2:10:55 |
There are many descriptive statistics that we can compute from the data in the table. To gain insight into the improvement in speed over the years, let us divide the men's times into two groups: the first \(13\) races (up to \(1952\)) and the last \(13\) (from \(1956\) on). The mean winning time for the first \(13\) races is \(2\) hours, \(44\) minutes, and \(22\) seconds (written \(2:44:22\)). The mean winning time for the last \(13\) races is \(2:13:19\). This is quite a difference (over half an hour). Does this prove that the fastest men are running faster? Or is the difference just due to chance, no larger than the random variation in performance we often see from year to year? We can't answer this question with descriptive statistics alone. All we can affirm is that the two means are "suggestive."
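The two means are ordinary arithmetic once the times are converted to a single unit. A minimal Python sketch, assuming the men's times exactly as listed in Table \(\PageIndex{3}\) (with the 2000 time written 2:10:10), converts each h:mm:ss time to seconds, averages each group of 13 races, and converts back. `to_seconds` and `to_hms` are hypothetical helper names.

```python
from statistics import mean

# Men's Olympic marathon winning times (h:mm:ss), copied from Table 3.
men_times = {
    1896: "2:58:50", 1900: "2:59:45", 1904: "3:28:53", 1906: "2:51:23",
    1908: "2:55:18", 1912: "2:36:54", 1920: "2:32:35", 1924: "2:41:22",
    1928: "2:32:57", 1932: "2:31:36", 1936: "2:29:19", 1948: "2:34:51",
    1952: "2:23:03", 1956: "2:25:00", 1960: "2:15:16", 1964: "2:12:11",
    1968: "2:20:26", 1972: "2:12:19", 1976: "2:09:55", 1980: "2:11:03",
    1984: "2:09:21", 1988: "2:10:32", 1992: "2:13:23", 1996: "2:12:36",
    2000: "2:10:10", 2004: "2:10:55",
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a whole number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def to_hms(seconds: float) -> str:
    """Round to the nearest second and format as 'h:mm:ss'."""
    total = round(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


years = sorted(men_times)
early, late = years[:13], years[13:]  # 1896-1952 and 1956-2004

early_mean = mean(to_seconds(men_times[y]) for y in early)
late_mean = mean(to_seconds(men_times[y]) for y in late)

print("mean, first 13 races:", to_hms(early_mean))
print("mean, last 13 races: ", to_hms(late_mean))
print("difference:          ", to_hms(early_mean - late_mean))
```

With the times as listed, this gives 2:44:22 and 2:13:19, a difference of 31 minutes and 3 seconds. Working in whole seconds avoids base-60 arithmetic on hours and minutes, and rounding happens only once, at the end.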
Examining Table \(\PageIndex{3}\) leads to many other questions. We note that Takahashi (the winning female runner in \(2000\)) would have beaten the male winner in \(1956\) and all male winners in the first \(12\) marathons. This fact leads us to ask whether the gender gap will close or remain constant. When we look at the times within each gender, we also wonder how much they will decrease (if at all) in the next century of the Olympics. Might we one day witness a sub-\(2\)-hour marathon? The study of statistics can help you make reasonable guesses about the answers to these questions.
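Both observations about Takahashi and the gender gap can be checked with the same conversion to seconds. The Python sketch below, again only an illustration with hypothetical names, lists the men's races whose winning time was slower than Takahashi's 2:23:14. It then computes the women's-minus-men's gap for each Olympics since 1984 in which both marathons appear in Table \(\PageIndex{3}\).

```python
# Winning times (h:mm:ss) copied from Table 3.
men_times = {
    1896: "2:58:50", 1900: "2:59:45", 1904: "3:28:53", 1906: "2:51:23",
    1908: "2:55:18", 1912: "2:36:54", 1920: "2:32:35", 1924: "2:41:22",
    1928: "2:32:57", 1932: "2:31:36", 1936: "2:29:19", 1948: "2:34:51",
    1952: "2:23:03", 1956: "2:25:00", 1960: "2:15:16", 1964: "2:12:11",
    1968: "2:20:26", 1972: "2:12:19", 1976: "2:09:55", 1980: "2:11:03",
    1984: "2:09:21", 1988: "2:10:32", 1992: "2:13:23", 1996: "2:12:36",
    2000: "2:10:10", 2004: "2:10:55",
}
women_times = {
    1984: "2:24:52", 1988: "2:25:40", 1992: "2:32:41",
    1996: "2:26:05", 2000: "2:23:14", 2004: "2:26:20",
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a whole number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


# Men's races whose winning time was slower than Takahashi's 2000 time.
takahashi = to_seconds(women_times[2000])
beaten = [y for y in sorted(men_times) if to_seconds(men_times[y]) > takahashi]
print("slower men's winners:", beaten)

# Gender gap (women's winning time minus men's) in each shared year, in seconds.
gaps = {
    y: to_seconds(women_times[y]) - to_seconds(men_times[y])
    for y in sorted(women_times)
}
for year, gap in gaps.items():
    minutes, seconds = divmod(gap, 60)
    print(f"{year}: women's winner {minutes}:{seconds:02d} slower")
```

With the listed times, `beaten` holds the twelve races from 1896 to 1948 plus 1956. The six gaps run from 13:04 (2000) to 19:18 (1992) without a steady trend. Deciding whether the gap is really changing is a question for inferential statistics.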
- Mikki Hebl