8.2: Motivation

    An empirical study of a problem begins with identifying a hypothesis. Once this has been done, the population corresponding to the hypothesis needs to be carefully identified, and the characteristics of the individuals that should be observed are determined. At this point, the researcher is ready to observe data. The question of how this is done can be the most difficult part of any study. Earlier we considered the nature of data, with an emphasis on what can be observed and how measurements can be taken from an individual in a population. The question considered in this chapter is how researchers decide who in the population is observed.

    To serve as a hypothetical example, consider the problem of attempting to determine if there is a difference in perceived barriers to voting between the racial groups in a small city. As a working hypothesis, the researcher may feel that populations of color may perceive larger barriers to voting than the white population in the city. With this working hypothesis in mind, the researcher must now carefully define the population of interest while at the same time refining the working hypothesis of the study.

    It is immediately apparent that there is a dynamic property to the hypothesis. Are the perceived barriers based on previous experiences, that is, participation in previous elections, or are they based on what is perceived for an upcoming election? This potential difference in viewpoint may be important to the interpretability of the conclusions of the study, and may have a fundamental effect on how the population is defined. For the sake of this example, suppose that the researcher has decided to focus on the most recent previous election. The researcher must also decide whether to focus on people who voted, people who attempted to vote, registered voters, or everyone who is eligible to vote, whether registered or not. This is an important question because a person who did not attempt to vote may have stayed away because they perceived significant barriers to voting. Further, a person might not even register to vote if they perceive significant barriers to voting. In this case the researcher may decide to consider all voters who were registered to vote at the time of the previous election, while acknowledging that people who did not register to vote will be excluded. The hypothesis is now refined to consider the question of whether there are differences in the perceived barriers to voting between a specified set of racial groups in a small city, among those who were registered to vote in the previous election. This process may continue through several more iterations until a very specific population has been identified.

    Now that this population has been identified for our example, how does the researcher observe data in the population? There are several important issues with observing data from a population that must be considered by researchers. How do researchers know who is in the population? How do they contact and communicate with the individuals in the population? What if the population is too large for them to handle? What do they do about people whom they cannot contact or who do not want to participate in the study?

    While several of these questions are considered in this chapter, the major question we will tackle is: What if the population is too large to contact everyone? Considering the example as a backdrop, suppose that the small city has around 15,000 registered voters. If the researcher could contact each of these people, and if they all decided to participate in the study, then the researcher could get a complete view of how the registered voters in the city perceived voting barriers for the election of interest.

    A few calculations will show that this task is daunting even if everyone is accessible and cooperates. If the researcher has thirty days to complete the data collection part of the study, then they would need to talk to \(15{,}000\div 30=500\) registered voters per day to complete the study. If the survey takes fifteen minutes to complete, that corresponds to \(500\times 0.25=125\) hours of interviews per day.

    If the interviewers work in eight-hour shifts, the researcher would need to hire about sixteen interviewers to work every day for 30 days, assuming no down time between interviews. If an interviewer makes $15 per hour, the 3,750 total hours of interviews (125 hours per day for 30 days) would cost the researcher $56,250 just to pay the interviewers. This cost does not include materials, computer software and hardware, analysts, and the many other costs associated with conducting a survey.
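
    The arithmetic above can be reproduced with a short script. This is only a sketch under the same assumptions as the text (everyone is reachable and cooperates, and there is no down time between interviews); the function name `census_cost` and its parameter names are hypothetical and chosen here for illustration.

```python
import math


def census_cost(population, days, minutes_per_survey, shift_hours, hourly_wage):
    """Rough cost of interviewing every member of a population.

    Hypothetical helper: assumes full cooperation and no down time
    between interviews, as in the worked example in the text.
    """
    # Interviews that must be completed each day to finish on time.
    interviews_per_day = population / days
    # Hours of interviewing required each day.
    interview_hours_per_day = interviews_per_day * minutes_per_survey / 60
    # Whole interviewers needed per day, rounding up partial shifts.
    interviewers = math.ceil(interview_hours_per_day / shift_hours)
    # Wages are computed on hours actually spent interviewing.
    total_hours = interview_hours_per_day * days
    wages = total_hours * hourly_wage
    return {
        "interviews_per_day": interviews_per_day,
        "interview_hours_per_day": interview_hours_per_day,
        "interviewers": interviewers,
        "total_hours": total_hours,
        "wages": wages,
    }


# Figures from the example: 15,000 registered voters, 30 days,
# 15-minute surveys, 8-hour shifts, $15 per hour.
result = census_cost(15_000, 30, 15, 8, 15)
for name, value in result.items():
    print(f"{name}: {value}")
```

    Note that the wage total is based on the hours spent interviewing. If all sixteen interviewers were instead paid for full eight-hour shifts on each of the 30 days, the figure would be slightly higher (\(16\times 8\times 30\times 15=57{,}600\) dollars).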

    This small example demonstrates only one of the problems with attempting to observe everyone in a population. There is a myriad of other potential problems. Some people will have moved, some will not be at home when you try to contact them, and many will not want to bother with taking a survey. When we cannot observe an entire population, we are obligated to observe only some of the population and hope that what we observe is representative of the entire population. This concept is called sampling.


    This page titled 8.2: Motivation is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by .
