1.1: What is Statistics?
- Define key statistical terms such as population, sample, parameter, and statistic.
- Distinguish between quantitative and qualitative data.
- Differentiate between continuous and discrete variables.
- Understand the four levels of measurement: nominal, ordinal, interval, and ratio.
Introduction
You are exposed to statistics regularly. If you are a sports fan, then you have the statistics for your favorite player. If you are interested in politics, then you look at the polls to see how people feel about certain issues or candidates. If you are an environmentalist, then you research arsenic levels in the water of a town or analyze global temperatures. If you are in the business profession, then you may track the monthly sales of a store or use quality control processes to monitor the number of defective parts manufactured. If you are in the health profession, then you may look at how successful a procedure is or the percentage of people infected with a disease. There are many other examples from other areas. To understand how to collect data and analyze it, you need to understand what the field of statistics is and the basic definitions.
Statistics is the study of how to collect, organize, analyze, and interpret data collected from a group.
Exploring Descriptive and Inferential Statistics
There are two branches of statistics. One is called descriptive statistics, which is where you collect and organize data. The other is called inferential statistics, which is where you analyze and interpret data. First, you need to look at descriptive statistics since you will use the descriptive statistics when making inferences.
To understand how to create descriptive statistics and then conduct inferences, there are a few definitions that you need to look at. Note, many of the words that are defined have common definitions that are used in non-statistical terminology. In statistics, some have slightly different definitions. You must notice the difference and utilize the statistical definitions.
The first thing to decide in a statistical study is who you want to measure and what you want to measure. You always want to make sure that you can answer the question of whom you measured and what you measured. The who is known as the individual, and the what is the variable.
Individual – a person or object that you are interested in finding out information about.
Variable – the measurement or observation of the individual.
If you put the individual and the variable into one statement, then you obtain a population.
Population – set of all values of the variable for the entire group of individuals.
Notice, the population answers who you want to measure and what you want to measure. Make sure that your population always answers both of these questions. If it doesn’t, then you haven’t given someone who is reading your study the entire picture. As an example, if you just say that you are going to collect data from the senators in the U.S. Congress, you haven’t told your reader what you are going to collect. Do you want to know their income, their highest degree earned, their voting record, their age, their political party, their gender, their marital status, or how they feel about a particular issue? Without telling what you want to measure, your reader has no idea what your study is actually about.
Sometimes the population is very easy to collect. For example, if you are interested in finding the average age of all of the current senators in the U.S. Congress, there are only 100 senators, so this wouldn’t be hard to find. However, if instead you were interested in knowing the average age at which a senator first took office, for all senators who ever served in the U.S. Congress, then this would be a bit more work. It is still doable, but it would take a bit of time to collect. But what if you are interested in finding the average diameter at breast height of all of the Ponderosa Pine trees in the Coconino National Forest? This would be practically impossible to collect. What do you do in these cases? Instead of collecting the entire population, you take a smaller group of the population, kind of a snapshot of the population. This smaller group is called a sample.
Sample – a subset from the population. It looks just like the population, but contains less data. Ideally, every individual has the same chance of being picked for the sample.
How you collect your sample can determine how accurate the results of your study are. There are many ways to collect samples, and no sampling method is perfect, but some create better samples than others. Sampling techniques will be discussed later. For now, realize that the sample should mirror the population as much as possible, and that every time you take a sample, you will find different data values. For example, assume we are trying to measure the average age of students in a classroom. If we collect two different samples of size 5, then most likely the average age will be different for the two samples.
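The classroom example can be sketched in a few lines of Python. This is only an illustration: the list `ages` is made-up data standing in for a classroom, and the seed is fixed so that the run is repeatable.

```python
import random
from statistics import mean

# Hypothetical population: ages of 30 students in one classroom (made-up values).
ages = [18, 19, 19, 20, 20, 20, 21, 21, 22, 22,
        23, 23, 24, 25, 25, 26, 27, 28, 30, 31,
        18, 19, 20, 21, 22, 24, 26, 29, 33, 40]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Draw two simple random samples of size 5 (without replacement within each sample).
sample_a = rng.sample(ages, 5)
sample_b = rng.sample(ages, 5)

print("Sample A:", sample_a, "mean =", mean(sample_a))
print("Sample B:", sample_b, "mean =", mean(sample_b))
print("Whole class mean =", mean(ages))
```

Changing the seed changes which students are picked, so the two sample averages will usually differ from each other and from the average of the whole class.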
Once you have your data, either from a population or a sample, you need to know how you want to summarize the data. As an example, suppose you are interested in finding the proportion of people who like a candidate, the average height a plant grows to using a new fertilizer, or the variability of the test scores. Understanding how you want to summarize the data helps to determine the type of data you want to collect. Since the population is what we are interested in, you want to calculate a number from the population. This is known as a parameter. As mentioned already, you often can’t collect the entire population. Even though the parameter is the number you are interested in, you can’t really calculate it. Instead, you use the number calculated from the sample, called a statistic, to estimate the parameter. Since no two samples are exactly the same, the statistics are going to be different from sample to sample. They estimate the value of the parameter, but again, you do not know for sure if your answer is correct.
Numeric Summaries
Parameter – a number calculated from the population. Usually denoted with a Greek letter. This number is a fixed, unknown number that you want to find.
Statistic – a number calculated from the sample. Usually denoted with letters from the Latin alphabet, though sometimes there is a Greek letter with a ^ (called a hat) above it. Since you can collect a sample, the statistic can be calculated directly, though it changes depending on the sample taken. It is used to estimate the parameter value.
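To make the two definitions concrete, here is a minimal sketch that computes a proportion both ways. The population of 1,000 yes/no responses is made up for illustration, and the names `p` and `p_hat` are simply labels chosen here for the parameter and the statistic.

```python
import random

# Hypothetical population: 1,000 yes/no responses (made-up; 700 are "yes").
population = ["yes"] * 700 + ["no"] * 300

# Parameter: calculated from the whole population. It is a fixed number,
# but in a real study it is usually unknown.
p = population.count("yes") / len(population)

# Statistic: calculated from one sample. It changes from sample to sample.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, 50)
p_hat = sample.count("yes") / len(sample)

print(f"parameter p     = {p:.2f}")
print(f"statistic p_hat = {p_hat:.2f}")
```

In practice only the sample would be available, so `p_hat` would be reported as an estimate of the unknown `p`.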
One last concept to mention is that there are two different types of variables – qualitative and quantitative. Each type of variable has different parameters and statistics that you find. It is important to know the difference between them.
Types of Data Variables
Qualitative or categorical variable – answer is a word or name that describes a quality of the individual.
Quantitative or numerical variable – answer is a number, something that can be counted or measured from the individual.
There are different types of quantitative variables, called discrete or continuous. The difference is in how many values the data can have. If you can count the number of data values (even if you are counting to infinity), then the variable is called discrete. If it is not possible to count the number of data values, then the variable is called continuous.
Different Types of Quantitative Variables
Discrete data can only take on particular values, like whole numbers. Discrete data are usually things that can be counted with specific values.
Continuous data can take on any value. Continuous data are usually things you measure.
Classify the quantitative variable as discrete or continuous.
- The weight of a cat.
- The number of fleas on a cat.
- The size of a shoe.
Solution
- This is continuous since it is something you measure.
- This is discrete since it is something you count.
- This is discrete since shoe sizes can only be certain values, such as \(7, 7.5, 8, 8.5, 9\). You can't buy a \(9.73\) shoe.
There are also four measurement scales for different types of data, with each building on the ones below it. They are:
Measurement Scales
Nominal – data is just a name or category. There is no order to any data, and since there are no numbers, you cannot do any arithmetic on this level of data. Examples of this are gender, car name, ethnicity, and race.
Ordinal – data that is nominal, but you can now put the data in order, since one value is more or less than another value. You cannot do arithmetic on this data, because the size of the difference between two values is not measurable. Examples of this are grades (A, B, C, D, F), place in a race (1st place, 2nd place, 3rd place), and size of a drink (small, medium, large).
Interval – data that is ordinal, but you can now subtract one value from another, and that subtraction makes sense. You can do arithmetic on this data, but only addition and subtraction. Examples of this are temperature (in degrees Fahrenheit or Celsius) and time on a clock. The zero value is a marker and not an absence of measurement.
Ratio – data that is interval, but you can now divide one value by another, and that ratio makes sense. You can now do all the arithmetic on this data. Examples of this are height, weight, distance, and length of time. The zero value is an absence of measurement.
Nominal and ordinal data come from qualitative variables. Interval and ratio data come from quantitative variables.
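One way to see why division only makes sense for ratio data is to change units and check whether the ratio survives. The sketch below uses standard unit conversions with arbitrarily chosen example values (10 and 20 degrees Celsius, 100 and 200 cm); the function and variable names are made up for this illustration.

```python
import math

def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def cm_to_inches(cm):
    """Convert centimeters to inches."""
    return cm / 2.54

# Interval data: temperature. Zero degrees is only a marker on the scale.
cool_c, warm_c = 10.0, 20.0
cool_f, warm_f = celsius_to_fahrenheit(cool_c), celsius_to_fahrenheit(warm_c)
temp_ratio_c = warm_c / cool_c   # 2.0
temp_ratio_f = warm_f / cool_f   # 68 / 50 = 1.36, so "twice as hot" is not meaningful

# Ratio data: height. Zero means no height at all.
short_cm, tall_cm = 100.0, 200.0
height_ratio_cm = tall_cm / short_cm
height_ratio_in = cm_to_inches(tall_cm) / cm_to_inches(short_cm)

print("Temperature ratio:", temp_ratio_c, "in Celsius vs", temp_ratio_f, "in Fahrenheit")
print("Height ratio:", height_ratio_cm, "in cm vs", round(height_ratio_in, 6), "in inches")
print("Temperature difference:", warm_c - cool_c, "C is", warm_f - cool_f, "F")
```

The temperature ratio depends on the unit, while the height ratio is 2 in both units. Differences in temperature still convert consistently (10 Celsius degrees is always 18 Fahrenheit degrees), which is why subtraction is meaningful for interval data.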
Most people have a hard time deciding if the data are nominal, ordinal, interval, or ratio. First, if the variable is qualitative (words instead of numbers), then it is either nominal or ordinal. Now, ask yourself if you can put the data in a particular order. If you can, it is ordinal. Otherwise, it is nominal. If the variable is quantitative (numbers), then it is either interval or ratio. For ratio data, a value of \(0\) means there is none of the quantity being measured. This is known as an absolute zero. If there is an absolute zero in the data, then the data are ratio. If there is no absolute zero, then the data are interval. An example of an absolute zero is your bank balance: if you have $\(0\) in your bank account, then you have no money. The amount of money in your bank account is ratio data. A word of caution: sometimes ordinal data is displayed using numbers, such as \(5\) being strongly agree and \(1\) being strongly disagree. These numbers are not measurements. Instead, they are labels assigned to ordinal categories. In reality, you should not perform any computations on this data, though many people do. If there are numbers, make sure the numbers are inherent, and not numbers that were assigned.
Examples of Measurement Scales
State whether the measurement scale of each variable is nominal, ordinal, interval, or ratio.
- Time of first class
- Hair color
- Length of time to take a test
- Age groupings (baby, toddler, adolescent, teenager, adult, elderly)
Solution
- This is interval since it is a number, but \(0\) o'clock means midnight and not the absence of time.
- This is nominal since it is not a number, and there is no specific order for hair color.
- This is ratio since it is a number, and if you take \(0\) minutes to take a test, it means you didn't take any time to complete it.
- This is ordinal since it is not a number, but you could put the data in order from youngest to oldest or the other way around.
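The decision steps described earlier for choosing a measurement scale can be sketched as a small function. This is only an illustration: the function name and its three yes/no arguments are made up here, and answering those questions for a real variable still takes judgment.

```python
def measurement_scale(is_inherent_number, can_be_ordered=False, has_absolute_zero=False):
    """Return the measurement scale by following the decision steps in the text."""
    if is_inherent_number:
        # Quantitative data: an absolute zero separates ratio from interval.
        return "ratio" if has_absolute_zero else "interval"
    # Qualitative data (including categories that were merely assigned number codes):
    # a meaningful order separates ordinal from nominal.
    return "ordinal" if can_be_ordered else "nominal"

# The four variables from the example above.
examples = {
    "Time of first class": measurement_scale(True, has_absolute_zero=False),
    "Hair color": measurement_scale(False, can_be_ordered=False),
    "Length of time to take a test": measurement_scale(True, has_absolute_zero=True),
    "Age groupings": measurement_scale(False, can_be_ordered=True),
}

for name, scale in examples.items():
    print(f"{name}: {scale}")
```

Note that a survey response coded from \(1\) to \(5\) would be passed in with `is_inherent_number=False`, since those numbers are assigned rather than measured.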
Examples of Descriptive Statistics
In 2010, the Pew Research Center questioned \(1500\) adults in the U.S. to estimate the proportion of the population favoring marijuana use for medical purposes. It was found that \(73\)% are in favor of using marijuana for medical purposes. State the individual, variable, population, and sample.
Solution
Individual – a U.S. adult
Variable – the response to the question “Should marijuana be used for medical purposes?” This is qualitative data since you are recording a person’s response – yes or no.
Population – set of all responses of adults in the U.S.
Sample – set of 1500 responses of U.S. adults who were questioned.
Parameter – proportion of those who favor marijuana for medical purposes calculated from the population
Statistic – proportion of those who favor marijuana for medical purposes calculated from the sample, which is \(73\)% here
A parking control officer records the manufacturer of every \(5^{th}\) car in the college parking lot to guess the most common manufacturer. State the individual, variable, population, and sample.
Solution
Individual – a car in the college parking lot.
Variable – the name of the manufacturer. This is qualitative data since you are recording a name.
Population – set of the manufacturers of all the cars in the college parking lot.
Sample – set of the manufacturers of every \(5^{th}\) car in the college parking lot.
Parameter – proportion of each manufacturer calculated from the population.
Statistic – proportion of each manufacturer calculated from the sample.
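Recording every \(5^{th}\) car can be sketched with a list slice. The list `lot` below is made-up data standing in for the parking lot, and the helper name `proportions` is hypothetical; the point is only to show the sample and the resulting statistic next to the population and its parameter.

```python
from collections import Counter

# Hypothetical parking lot: manufacturers listed in parking order (made-up data).
lot = ["Toyota", "Ford", "Honda", "Toyota", "Ford",
       "Honda", "Toyota", "Chevrolet", "Ford", "Toyota",
       "Honda", "Ford", "Toyota", "Honda", "Honda",
       "Toyota", "Ford", "Chevrolet", "Toyota", "Toyota"]

# Every 5th car: positions 5, 10, 15, 20 (indices 4, 9, 14, 19).
sample = lot[4::5]

def proportions(values):
    """Return the proportion of each distinct value in the list."""
    counts = Counter(values)
    return {name: count / len(values) for name, count in counts.items()}

population_props = proportions(lot)   # parameters (known here only because the lot is tiny)
sample_props = proportions(sample)    # statistics

print("Sample:", sample)
print("Most common in sample:", Counter(sample).most_common(1)[0][0])
print("Sample proportions:", sample_props)
print("Population proportions:", population_props)
```

With this made-up lot the sample picks out the same most common manufacturer as the population, but the sample proportion (0.5) differs from the population proportion (0.4), which is the usual gap between a statistic and the parameter it estimates.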
A biologist wants to estimate the average height of a plant that is given a new plant food. She gives \(10\) plants the new plant food. State the individual, variable, population, and sample.
Solution
Individual – a plant given the new plant food
Variable – the height of the plant (Note: it is not the average height since you cannot measure an average – it is calculated from data.) This is quantitative data since you will have a number.
Population – set of all possible heights of plants, provided they were given the new plant food
Sample – set of \(10\) heights of plants when the new plant food is used
Parameter – average height of all plants given the new plant food
Statistic – average height of the \(10\) plants given the new plant food
A doctor wants to see if a new treatment for cancer extends the life expectancy of a patient versus the old treatment. She gives one group of \(25\) cancer patients the new treatment and another group of \(25\) the old treatment. She then measures the life expectancy of each of the patients. State the individuals, variables, populations, and samples.
Solution
In this example there are two individuals, two variables, two populations, and two samples.
Individual 1: Cancer patient given new treatment
Individual 2: Cancer patient given old treatment
Variable 1: life expectancy when given the new treatment. This is quantitative data since you will have a number.
Variable 2: life expectancy when given the old treatment. This is quantitative data since you will have a number.
Population 1: set of all life expectancies of cancer patients given the new treatment
Population 2: set of all life expectancies of cancer patients given the old treatment
Sample 1: set of \(25\) life expectancies of cancer patients given the new treatment
Sample 2: set of \(25\) life expectancies of cancer patients given the old treatment
Parameter 1: average life expectancy of all cancer patients given the new treatment
Parameter 2: average life expectancy of all cancer patients given the old treatment
Statistic 1: average life expectancy of the \(25\) cancer patients given the new treatment
Statistic 2: average life expectancy of the \(25\) cancer patients given the old treatment
Authors
"1.1: What is Statistics?" by Toros Berberyan, Tracy Nguyen, and Alfie Swan is licensed under CC BY-SA 4.0
Attributions
"1.1: What is Statistics?" by Kathryn Kozak is licensed under CC BY-SA 4.0


