1.3: Two Realms of Statistics - Descriptive and Inferential
- Define data
- Define descriptive statistics
- Distinguish between a sample and a population
- Define biased sample
- Introduce sample statistics and population parameters
- Define inferential statistics
Descriptive Statistics and Inferential Statistics
As we have discussed, both our everyday lives and the scientific method involve a lot of observation and experimentation. During these processes, we collect information. Data refers to the information gathered from observations, experiments, surveys, historical records, and other sources. We study data to better understand what is happening around us.
Finding patterns in raw data can be difficult, especially if we have many observations and measurements. In this course, we will learn different ways to organize, summarize, and display raw data so it is easier to understand. The numbers that are used to summarize and describe data are called descriptive statistics. To fully understand a set of data, we usually need more than one descriptive statistic. This idea will be explored further in Chapter \(2.\)
Data and descriptive statistics are very similar to each other and are sometimes confused. For each of the following claims, identify the data and any descriptive statistics. Note that sometimes the data is implied instead of given directly.
- The average score on Exam \(3\) was \(76\%\) for this statistics course last semester.
- Answer
-
The data is not specifically listed, so it is implied. Since we are talking about the average score on Exam \(3\) from last semester, the data would consist of all scores on Exam \(3\) from this course last semester. \(76\%\) provides a summary of the data so it is a descriptive statistic.
- We spent \(\$4,160\) on groceries and household goods last year.
- Answer
-
Again the data is not specifically listed. Our data would include the money spent on groceries and household goods last year, which could be found on receipts, bank records, credit card statements, etc. The value of \(\$4,160\) summarizes the data so it is a descriptive statistic.
- We have four children, ages \(2,\) \(4,\) \(6,\) and \(9.\) The oldest is \(9\) years old, and the average age is \(5.25\) years.
- Answer
-
The data is the listed ages of the four children: \(\{2, 4, 6, 9\}.\) There are two descriptive statistics given: the oldest being \(9\) (the maximum value), and the average being \(5.25\) years. Notice that descriptive statistics can be values in the original data but do not have to be.
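The last example can be checked with a few lines of Python. This is only a minimal sketch: the variable names are chosen for illustration, and the ages are the ones listed in the claim.

```python
from statistics import mean

# The data: the ages of the four children from the example.
ages = [2, 4, 6, 9]

# Two descriptive statistics that summarize the data.
oldest = max(ages)        # the maximum value, which appears in the data
average_age = mean(ages)  # the mean, which does not appear in the data

print(oldest)       # 9
print(average_age)  # 5.25
```

The maximum is one of the original values, while the mean is not, which matches the observation that a descriptive statistic can be a value in the data but does not have to be.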
We collect data every day to help us understand the world around us and make decisions, but most of the time our goal is to learn more than what the data directly shows. We want to make general conclusions on a larger scale using the data we have. This generalization process of using data from a smaller group to make claims about a larger group is called inferential statistics. In other words, we use information from a small group selected from a larger group to make conclusions about the large group that we cannot fully observe.
Inferential statistics is important because it is usually impossible to collect data from everyone or everything we want to study. Even when it is possible, doing so can be too expensive or time-consuming. Using inferential statistics helps us save time and money while still making accurate conclusions.
What difficulties would you face when trying to collect the necessary data in the following situation? Why would we possibly need to generalize our findings?
The National Election Commission has hired us to take a look at how U.S. citizens feel about the fairness of the voting procedures in the U.S.
- Answer
-
It would be nearly impossible to ask every U.S. citizen how they feel about the fairness of voting procedures. U.S. citizens live all over the world, and even if we had everyone's contact information, there is no guarantee they would respond to visits, emails, or phone calls. The time and cost required would be too great, and people's opinions might change before the data collection was finished. Because of this, inferential statistics are needed. We must choose a smaller group of U.S. citizens to survey and then use that information to estimate the views of the entire country.
Populations and Samples
In inferential statistics, we make inferences (conclusions) about a large group by using data from a smaller group. The full group of people or events we want to study is called the population. A smaller group taken from that population is called the sample. We use samples to learn about the larger population.
In the previous example, the population includes all U.S. citizens, which consists of hundreds of millions of people. The people who were actually surveyed would make up our sample. Instead of asking everyone, we might survey a few thousand citizens chosen from the entire population. When choosing a sample, it is important to make sure no one group is overrepresented. For example, there would be a problem if everyone in our sample happened to be a Florida resident. A sample made up only of Floridians would not accurately represent all U.S. citizens. The same problem would arise if the sample were made up only of Republicans or only men. When these types of situations happen, our sample is called biased, meaning it does not fairly represent the population.
Inferential statistics uses math to turn information from a sample into reasonable estimates about the population. How accurate these estimates are depends on how well the sample represents the population. While it's impossible to create a perfect sample, we can reduce personal bias by using random sampling. A large, randomly chosen sample is more likely to include different types of people in the right proportions, and any differences that remain between the sample and the population are due to chance rather than poor sampling.
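The difference between a biased sample and a random sample can be illustrated with a small simulation. The sketch below is built entirely on assumptions: the population of 100,000 people, the labels "A" and "B", and the 30% share are invented for illustration and do not come from real data.

```python
import random

random.seed(0)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 100,000 people, 30% of whom live in "Region A".
population = ["A"] * 30_000 + ["B"] * 70_000


def share_of_a(group):
    """Return the proportion of a group that lives in Region A."""
    return group.count("A") / len(group)


# Biased sample: the first 1,000 people on the list, all from Region A.
biased_sample = population[:1_000]

# Random sample: every person has the same chance of being chosen.
random_sample = random.sample(population, 1_000)

print(share_of_a(population))     # 0.3
print(share_of_a(biased_sample))  # 1.0
print(share_of_a(random_sample))  # usually close to 0.3
```

The biased sample overrepresents one group completely, while the random sample tends to land near the true proportion, with a small difference caused by chance.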
What difficulties would you face when trying to collect the necessary data in the following situation? Why would we possibly need to generalize our findings? How could we construct a random sample and estimate the value of interest?
We are interested in examining the average number of math classes taken by current graduating seniors at U.S. colleges and universities during their four years in college.
- Answer
-
Our population consists of the graduating seniors throughout the country. This is still a large set since there are thousands of colleges and universities, each enrolling many students. (In \(2022,\) over \(2\) million bachelor's degrees were granted in the United States.) The cost to examine the transcript of every college senior would be too great. We must construct a sample of college seniors and then make inferences to the entire population based on what we find.
To make a sample, we might first choose some public and private colleges and universities across the United States. Then we might sample \(50\) students from each of these institutions. Suppose that the average number of math classes taken by the people in our sample was \(3.2.\) Then we might speculate that \(3.2\) approximates the number we would find if we had the resources to examine every senior in the entire population. However, we must be careful about the possibility that our sample is not a good representation of the population. What if we chose a lot of math majors or chose too many technical institutions that have heavy math requirements? Bad sampling like this would make our sample a poor representation of the population of all seniors.
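The two-step sampling plan described in this answer (choose institutions, then choose \(50\) students from each) might be sketched as follows. Everything about the data here is hypothetical: the \(200\) schools, the \(500\) seniors per school, the choice of \(20\) schools, and the randomly generated class counts exist only to make the sketch runnable.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 200 schools, each with 500 graduating seniors.
# Each senior is stored as the number of math classes taken (0 to 8),
# generated at random purely for illustration.
schools = {
    f"school_{i}": [random.randint(0, 8) for _ in range(500)]
    for i in range(200)
}

# Step 1: randomly choose 20 schools from the full list.
chosen_schools = random.sample(sorted(schools), 20)

# Step 2: randomly choose 50 seniors from each chosen school.
sample = []
for name in chosen_schools:
    sample.extend(random.sample(schools[name], 50))

# The sample average is our estimate of the population average.
sample_mean = mean(sample)

print(len(sample))   # 1000
print(sample_mean)   # depends on the randomly generated data
```

Choosing both the schools and the students at random is what guards against accidentally picking too many math majors or technical institutions.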
Building from this example, we mentioned that over \(2\) million bachelor's degrees were awarded in the United States in \(2022\). Since this value describes the entire group, or population, it is called a parameter. We also collected data from a smaller group, or sample, and found that the average number of math classes was \(3.2\) per student. Because this value describes a sample, it is called a statistic.
In short, populations are described using parameters, while samples are described using statistics.
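To make the distinction concrete, the sketch below computes the same kind of summary twice: once for an entire population (a parameter) and once for a sample drawn from it (a statistic). The class of \(30\) students and their scores are invented for illustration.

```python
import random
from statistics import mean

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population: a test score for every student in a class of 30.
population_scores = [random.randint(50, 100) for _ in range(30)]

# Parameter: a number that describes the whole population.
population_mean = mean(population_scores)

# Statistic: a number that describes a sample, here 10 randomly chosen students.
sample_scores = random.sample(population_scores, 10)
sample_mean = mean(sample_scores)

print(population_mean)  # the parameter
print(sample_mean)      # the statistic, an estimate of the parameter
```

In practice the population is usually too large to measure, so only the statistic can be computed and the parameter remains unknown.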
Identify the population and the sample, then reflect on whether the sample gives good information about the population.
- A substitute teacher wants to know how students in the class did on their last test. The teacher asks the \(10\) students sitting in the front row to state their latest test score. He concludes from their report that the class did extremely well.
- Answer
-
The population consists of all students in the class. The sample is the \(10\) students sitting in the front row. The sample is not likely to represent the whole class. Students who sit in the front row often pay closer attention and do better on tests, so their scores may be higher than the actual class average.
- A coach is interested in how many cartwheels the average college freshman at his university can do. Eight volunteers from the freshman class stepped forward. After observing their performance, the coach concluded that college freshmen can do an average of \(16\) cartwheels in a row without stopping.
- Answer
-
The population is all freshmen at the coach's university. The sample is the \(8\) students who volunteered. This is not a good sample because students who volunteered are more likely to be good at cartwheels. Freshmen who cannot do cartwheels probably did not volunteer! This would cause the coach's results to be misleading.
Determine when descriptive and inferential statistics are being used, and think about how good the conclusion is.
A quick Google Maps search showed that there were \(20\) Chick-fil-A restaurants open in Kansas in May of \(2024.\) Based on this, someone claims that there must be about \(20\) per state for a total of \(1000\) Chick-fil-A restaurants in the United States.
- Answer
-
The goal is to estimate how many Chick-fil-A restaurants there are in the United States. To do this, Kansas is used as a sample. Stating that Kansas has \(20\) Chick-fil-A locations is an example of descriptive statistics, because it describes the data that was collected. Estimating that there are \(1000\) Chick-fil-A restaurants nationwide is an example of inferential statistics, because it uses sample data to make a guess about a larger group.
However, this conclusion is not very reliable. Kansas has a smaller population than many other states, so it may not represent the country well. A better approach would be to look at several randomly chosen states. In this case, inferential statistics are not even necessary, since the actual number of Chick-fil-A restaurants is publicly available and is over \(3000\).
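The arithmetic behind the claim, and how far it falls short, can be laid out in a short sketch. The variable names are chosen for illustration, and \(3000\) is used only as the lower bound stated above, not as an exact count.

```python
# Descriptive statistic: the count observed in the one-state sample (Kansas).
kansas_count = 20

# The claim assumes every one of the 50 states looks like Kansas.
num_states = 50
estimated_total = kansas_count * num_states

# Lower bound on the actual number of restaurants, as stated in the text.
actual_lower_bound = 3000

print(estimated_total)                       # 1000
print(actual_lower_bound / estimated_total)  # 3.0
```

The actual total is at least three times the estimate, which shows how much a single unrepresentative state can distort an inference.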

