8.5: More Complicated Sampling Methods
While simple random sampling is both powerful and useful, it is not always practical to implement. The problem is that in some cases the researcher is not exactly sure who the members of the population are, and in these cases it can be difficult to ensure that all the members have an equal chance of being included in the sample. For example, suppose that a researcher is interested in taking a simple random sample of residents in a small town. A list of residents in the town is not available, but the researcher may have a map that details the location of each residence in the town. In theory, they could take a simple random sample of these homes and talk to the residents. But how do they decide who to speak with in each home? If they randomly choose one resident in each home, then the sample is no longer a simple random sample, because some homes have fewer residents than others: a person who lives alone is certain to be chosen once their home is selected, while a person in a large household is not. What about apartments that are not included on the map? What about people who are homeless, or those who live in nonstandard housing? Without a list of residents in the town, along with their locations, implementing a simple random sample can be very difficult.
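To see why choosing one resident per sampled home breaks the equal-chance requirement, consider a rough sketch with assumed notation (none of these symbols or numbers come from the example above). Suppose the map shows \(H\) homes, the researcher takes a simple random sample of \(m\) homes, and home \(h\) has \(n_h\) residents, one of whom is then chosen at random. Under these assumptions, the chance that a particular resident of home \(h\) ends up in the sample is

\[ P(\text{resident of home } h \text{ is sampled}) = \frac{m}{H} \cdot \frac{1}{n_h}. \]

With the hypothetical values \(H = 500\) and \(m = 50\), a person who lives alone has a chance of \(0.1\) of being sampled, while each person in a four-person home has a chance of only \(0.1 / 4 = 0.025\). The chances are equal only if every home has the same number of residents.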
To make sampling easier, researchers can divide a population into groups called clusters. The researcher then takes a simple random sample of these clusters. Once the clusters have been sampled, the researcher can either take a census within each sampled cluster, if it is small enough, or take a simple random sample within each sampled cluster.
A sample from a population is a cluster sample if the population is first divided into groups called clusters. A simple random sample of these clusters is then taken, and individuals within each cluster are observed either through a census or through an additional simple random sample.
The advantage of using a cluster sample is that the population can be divided into smaller and easier-to-manage groups. There are two disadvantages to using a cluster sample. The first is that the potential for statistical errors can be larger when using a cluster sample. The second is that the statistical techniques used by researchers must be modified to account for the fact that this type of sampling method was used.
Cluster samples work best when each cluster is itself a small-scale representation of the population. In that case it does not matter so much that not all the clusters are observed. If instead the clusters are formed in such a way that they reflect important differences in the population, then not observing some of these clusters will likely result in a nonrepresentative sample. So, for example, dividing a city by neighborhoods may inadvertently create clusters based on race, which could result in a sample that is not representative of the population.
As another example, suppose researchers are planning a cluster sample of a city that is based on census blocks. For simplicity, suppose that these census blocks are known to be approximately representative of the entire population of the city and are appropriate for using as clusters. There are twenty of these blocks numbered 1–20, as indicated in Figure \(\PageIndex{1}\). A simple random sample of the clusters chooses the blocks numbered 2, 8, 10, 11, and 17. Each block contains roughly 100 residents. Because these blocks are relatively small, the researchers can study the blocks closely and determine who the residents are in each of the blocks, obtaining enough information to take a simple random sample of five residents in each block. This creates a total sample of twenty-five residents from the city based on a cluster sample.
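The two-stage procedure in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cluster_sample`, the resident labels, and the fixed seed are hypothetical, and each block is assumed to hold exactly 100 residents, so the blocks drawn here will generally differ from those listed above.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sample: a simple random sample of clusters,
    then a simple random sample of individuals within each chosen cluster."""
    rng = random.Random(seed)
    # Stage 1: choose which clusters to observe.
    chosen = sorted(rng.sample(sorted(clusters), n_clusters))
    # Stage 2: sample individuals only inside the chosen clusters.
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical city: 20 census blocks with 100 residents each,
# where each resident is labelled (block number, resident number).
blocks = {b: [(b, r) for r in range(1, 101)] for b in range(1, 21)}

sample = cluster_sample(blocks, n_clusters=5, n_per_cluster=5, seed=0)
total = sum(len(residents) for residents in sample.values())
print(sorted(sample))  # the five block numbers that were drawn
print(total)           # 25
```

Note that residents of the fifteen blocks that were not drawn have no chance of appearing in the final sample, which is why the clusters need to be representative of the city.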
An alternative to cluster sampling and simple random sampling is stratified sampling. Stratified sampling may look a lot like cluster sampling at first, but there are two major differences, and the purpose of stratified sampling is somewhat different. Like cluster sampling, stratified sampling divides the population into groups, which in this case are called strata. In contrast to cluster sampling, in which each cluster should be representative of the population, each stratum should be as homogeneous as possible. Hence, when using stratified sampling it is advantageous to divide the population into groups containing individuals who are as alike as possible. The other main difference from cluster sampling comes in the second phase of the sampling process. In stratified sampling, a simple random sample is taken within every stratum, whereas in cluster sampling only the randomly selected clusters are observed.
A sample from a population is a stratified sample if the population is first divided into groups called strata. Individuals within each stratum are observed through a simple random sample.
As with cluster sampling, the corresponding statistical methods need to be modified to take advantage of this methodology, but the resulting conclusions can often be more reliable than even those from simple random sampling. The other advantage of stratified sampling is that, as with cluster sampling, it is often easier to take a simple random sample from smaller groups than from the entire population.
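As a hedged illustration of what such a modification can look like, one standard adjustment for estimating a population mean is sketched below. The notation is assumed here and does not come from this section: there are \(L\) strata, stratum \(h\) contains \(N_h\) of the \(N\) individuals in the population, and \(\bar{y}_h\) is the mean of the values sampled from stratum \(h\). The stratified estimate weights each stratum's sample mean by that stratum's share of the population:

\[ \bar{y}_{\text{st}} = \sum_{h=1}^{L} \frac{N_h}{N} \, \bar{y}_h. \]

When the strata differ in size but the same number of individuals is sampled from each, the plain average of all sampled values would give small strata too much influence; the weights \(N_h / N\) correct for this.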
For example, suppose researchers are planning a stratified sample of a city. Suppose that there are five residential neighborhoods in the city that are highly segregated, so that those living in each neighborhood tend to be of the same race or ethnicity and economic class. These neighborhoods are numbered 1–5, as indicated in Figure \(\PageIndex{2}\). Within these neighborhoods the researchers can determine who the residents are, obtaining enough information to take a simple random sample of five residents in each neighborhood. This creates a total sample of twenty-five residents from the city based on a stratified sample.
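A minimal Python sketch of this stratified design follows. The function name `stratified_sample`, the neighborhood sizes, the resident labels, and the seed are all hypothetical choices made for illustration; the example above does not say how many residents live in each neighborhood.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Stratified sample: a simple random sample is taken within EVERY stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical neighborhood sizes (not given in the text).
sizes = {1: 400, 2: 250, 3: 300, 4: 150, 5: 500}
# Each resident is labelled (neighborhood number, resident number).
neighborhoods = {k: [(k, r) for r in range(1, n + 1)] for k, n in sizes.items()}

sample = stratified_sample(neighborhoods, n_per_stratum=5, seed=0)
total = sum(len(residents) for residents in sample.values())
print(sorted(sample))  # [1, 2, 3, 4, 5]
print(total)           # 25
```

Unlike the cluster design, every neighborhood contributes residents to the sample, so no group defined by the strata can be missed.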
There are many other methods used for sampling, including combinations of cluster sampling, stratified sampling, and simple random sampling. These methods can be very effective if they are implemented with care and the corresponding statistical methods are adjusted to reflect how the data were observed. Many surveys, however, are implemented without any specific plan for how the sample will be taken or how the sample relates to the corresponding population. Lack of care in the sampling phase of a study can yield results that are very difficult to interpret. Furthermore, applying statistical methods to data that have not been observed according to a well-understood random sampling method can be very dangerous. As we shall see in later chapters, statistical methods allow one to assess the risk that one takes from observing a sample instead of taking a census of a population. These risk assessments will not be valid if the correct sampling method has not been used.

