1.3: Populations, Samples, and Sampling Methods
In many cases, we’re interested in learning something about a large group: all the voters in a country, all the trees in a national forest, or all the customers who bought a certain product last year. But collecting data from every member of a big group isn’t always realistic. Instead, we collect information from a subset and use that data to draw conclusions about the larger group.
Populations and Samples
In statistics, we use precise terms to talk about who or what we’re studying:
- Population: The entire group we want to understand or describe.
- Sample: A smaller group selected from the population that we actually observe or measure.
The population might be enormous (e.g., all registered voters in a state), while the sample might include only a few hundred of them. If the sample is chosen well, insights from the sample can be extended to the entire population.
Example: Tree Study Revisited
Let’s return to our earlier tree health project in Central City Park:
- Population: All mature trees in Central City Park
- Sample: The 200 trees the researchers chose to monitor from 2013–2023
It would be difficult and costly to regularly check every single tree in the park. Instead, researchers track a smaller number of trees through time. Their goal is to learn something about the entire forest by closely watching just a portion of it.
Why the Sample Matters
For conclusions drawn from a sample to be useful for the population, the sample must be representative. This means:
- The sample includes a mix of individuals or objects that resemble the full group in key ways
- The method used to choose those individuals is fair and not systematically biased
If the sample is too small or poorly chosen, it may not reflect the diversity or characteristics of the population, and conclusions based on that sample can be misleading or just plain wrong.
Example: A Non-Representative Sample
Suppose our tree study only included trees located near park entrances because they were easier to reach. If these areas receive more pollution or foot traffic, then our sample may overestimate tree health problems across the whole park. We’d be studying a biased slice of the forest.
Alternatively, suppose that researchers chose only to measure smaller trees because it was easier to check their leaves and canopy sizes. This would not be representative of the diversity of trees in the park.
The Role of Randomness
To guard against bias and help ensure a sample is representative, statisticians often rely on random sampling.
A random sample is one where each member of the population has a known and typically equal chance of being included. This helps eliminate systematic overrepresentation or underrepresentation of certain types of individuals.
Although random selection doesn’t guarantee a perfect sample, it does help control for unseen influences and makes it more reasonable to generalize from the sample to the population.
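The contrast between a convenience sample and a random sample can be sketched with a small simulation. This is only an illustration: the park below is invented (200 trees near entrances, half of them unhealthy, and 800 interior trees, one in ten unhealthy), and names such as `park` and `unhealthy_rate` are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# Invented population: trees near entrances are unhealthy far more often
# than interior trees. None of these numbers come from a real study.
near_entrance = [{"near_entrance": True, "unhealthy": i < 100} for i in range(200)]
interior = [{"near_entrance": False, "unhealthy": i < 80} for i in range(800)]
park = near_entrance + interior

def unhealthy_rate(trees):
    """Fraction of the given trees that are flagged unhealthy."""
    return sum(t["unhealthy"] for t in trees) / len(trees)

true_rate = unhealthy_rate(park)               # 180 of 1000 trees
convenience = rng.sample(near_entrance, 100)   # only the easy-to-reach trees
random_sample = rng.sample(park, 100)          # every tree is eligible

print(f"population:  {true_rate:.2f}")
print(f"convenience: {unhealthy_rate(convenience):.2f}")
print(f"random:      {unhealthy_rate(random_sample):.2f}")
```

Under these assumptions, the convenience estimate should land near the entrance-only rate of about one half, well above the park-wide rate, while the random estimate should fall much closer to it. The random estimate still varies from run to run, which is the sense in which random selection does not guarantee a perfect sample.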
Common Sampling Methods
There are several techniques used to select a sample. Each has strengths and weaknesses depending on the goal and constraints of a study.
- Simple Random Sample (SRS): Every individual in the population has an equal chance of being selected, and every possible group of the chosen size is equally likely to end up as the sample. Often done via random numbers or software.
- Stratified Sample: The population is divided into meaningful subgroups (strata), and a random sample is taken from each group. Helps ensure representation across key characteristics like species, age, or region.
- Cluster Sample: The population is divided into naturally occurring groups (such as zones, classrooms, or plots). A random selection of these entire groups is observed.
- Systematic Sample: Select every nth individual from an ordered list (e.g., every 10th tree along a trail).
- Convenience Sample: Individuals are chosen based on ease of access. This type is not random and often introduces bias. It should be used with caution.
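The four random methods in this list can be sketched in a few lines of Python using only the standard library. The population here is hypothetical: 120 trees, each tagged with a species (used as the stratum) and a park zone (used as the cluster). The function names are illustrative, not a standard API.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 120 trees, 3 species, 6 zones of 20 trees each.
species = ["oak", "maple", "pine"]
trees = [
    {"id": i, "species": species[i % 3], "zone": i // 20}
    for i in range(120)
]

def simple_random_sample(population, n):
    """Draw n distinct members; every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, key, n_per_stratum):
    """Group by `key`, then take a simple random sample inside each group."""
    strata = {}
    for item in population:
        strata.setdefault(item[key], []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(population, key, n_clusters):
    """Randomly pick whole clusters and keep every member of those clusters."""
    cluster_ids = sorted({item[key] for item in population})
    chosen = set(rng.sample(cluster_ids, n_clusters))
    return [item for item in population if item[key] in chosen]

def systematic_sample(population, step):
    """Take every `step`-th member after a random starting position."""
    start = rng.randrange(step)
    return population[start::step]

srs = simple_random_sample(trees, 12)
strat = stratified_sample(trees, "species", 4)
clus = cluster_sample(trees, "zone", 2)
syst = systematic_sample(trees, 10)

print(len(srs), len(strat), len(clus), len(syst))
```

Note the difference in what gets randomized: the stratified sample draws individuals from every species, while the cluster sample draws whole zones and then keeps all 20 trees in each chosen zone. A convenience sample has no counterpart here because it involves no random step at all.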
In practice, the method chosen depends on available resources, population structure, and the goals of the study. The important thing is to avoid unintentionally favoring certain individuals or groups in the selection process. It is also possible to combine these methods into more complex sampling designs, such as starting with clusters and then stratifying them.
| Sampling Method | Pros | Cons |
|---|---|---|
| Simple Random Sample (SRS) | Every individual has an equal chance of being selected; easy to understand; minimizes bias. | Can be difficult and time-consuming to implement for large or geographically dispersed populations. |
| Stratified Sample | Ensures representation across key subgroups; increases precision if groups differ. | Requires knowledge of population strata; more complex to design and implement. |
| Cluster Sample | More practical when population is spread out; reduces cost and time. | May be less representative if clusters differ greatly from one another (higher sampling error). |
| Systematic Sample | Simple to execute and can work well if there’s no underlying pattern. | Risk of bias if there’s a hidden pattern in the data that aligns with the interval. |
| Convenience Sample | Easy, quick, and inexpensive to gather. | Often biased; not generalizable to the population; may exclude important subgroups. |
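The risk listed for systematic sampling, a hidden pattern that lines up with the sampling interval, can be made concrete with a small deterministic sketch. The trail below is invented: 100 trees in a row, with every 10th tree (positions 0, 10, 20, and so on) damaged, so the true damage rate is 10%. The helper `systematic_rate` is hypothetical.

```python
# Invented trail: position p is damaged exactly when p is a multiple of 10.
damaged = [position % 10 == 0 for position in range(100)]
true_rate = sum(damaged) / len(damaged)

def systematic_rate(flags, step, start):
    """Damage rate among every `step`-th tree, beginning at index `start`."""
    picked = flags[start::step]
    return sum(picked) / len(picked)

aligned = systematic_rate(damaged, step=10, start=0)    # lands only on damaged trees
offset = systematic_rate(damaged, step=10, start=3)     # skips every damaged tree
unaligned = systematic_rate(damaged, step=7, start=0)   # interval does not match the pattern

print(true_rate, aligned, offset, unaligned)
```

With an interval of 10, the estimate is either 1.0 or 0.0 depending on the starting tree, and neither is close to the true rate of 0.1. An interval of 7 does not line up with the pattern and gives an estimate of 2 out of 15 trees, which is much nearer the truth.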
Looking Ahead
A carefully selected sample is an essential part of any good statistical study. But even a well-planned sampling method can still be vulnerable to problems, especially if we’re not careful about how the data is collected or who is represented.
In the next section, we will explore the idea of bias (systematic errors in data collection or sampling) and learn how to spot problems that could affect the accuracy and fairness of our conclusions.
Quick Reflection
Think about a group you care about: your school, your city, or a community that you are a member of. How could you gather a sample to understand something about that group?
What might go wrong if your sample misses certain types of individuals?