
7.1: The Purpose of Sampling


    Why do we think our samples represent the population? Answering that question is the goal of statistics. We have a sample, and we want the results from that sample to hold for the population. But how do we know that they do?

    One of the hallmarks of statistics is estimation. We are always estimating, or making an informed guess about, the truth. We want to describe the variation in height in the U.S. We want to describe the variation in salaries in a given state. We want to describe the variation of standardized test scores in a given school district. We want to describe the variation of alcohol consumption across universities in the U.S. We want to describe the variation of depression among young people in the U.S.

    In each case, we want to estimate characteristics of the population: its mean, its spread (the standard deviation), and the shape of its distribution (for example, whether it is approximately normal). We also want the standard error of the mean, which tells us how precisely a sample mean estimates the population mean. What is the average height in the U.S.? (BTW, you may think that is an innocuous example, but I listened to a podcast about how differences in average height between nations reflect the quality of each population’s access to health care. Better health care leads to taller citizens, and America’s average height has declined over the years compared to other countries. Interesting.) We want to know the standard deviation, that is, the spread of salaries around the average salary in the U.S. We want to know the standard error of the mean, which tells us how accurately the sample mean of standardized test scores reflects the true mean test score in a given state. We want to know the average amount of alcohol consumed across universities. We want to know the average and spread of depression severity across young people in the U.S.

    Every time we state we want to know something, we want to know the population parameter. We want to know the truth about something. But we cannot know the truth with certainty. All we can do is estimate it from a sample, and the value we compute from that sample is called a sample statistic.

    We use sample statistics to estimate the population parameter. Parameters are characteristics of a variable’s distribution in the population, such as its mean or its spread, that we think represent how the phenomenon varies across everyone.
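As a minimal sketch of the difference between a parameter and a statistic, the Python below simulates a population of heights, computes its mean and standard deviation (the parameters), and then draws random samples of increasing size to compute the sample mean, sample standard deviation, and standard error of the mean (the statistics). The population values (mean 170 cm, standard deviation 10 cm, 100,000 people), the sample sizes, and the helper name `describe_sample` are all assumptions made up for illustration, not real data. The standard error is computed here as the sample standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics


def describe_sample(sample):
    """Return (mean, standard deviation, standard error of the mean) for a sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(n)            # standard error of the mean
    return mean, sd, sem


# Hypothetical population of heights in cm (simulated, not real data).
rng = random.Random(42)                # fixed seed so the run is repeatable
population = [rng.gauss(170, 10) for _ in range(100_000)]

# Parameters: computed from everyone, which is rarely possible in practice.
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)
print(f"population mean = {population_mean:.2f}, SD = {population_sd:.2f}")

# Statistics: computed from random samples of increasing size.
results = {}
for n in (25, 400, 10_000):
    sample = rng.sample(population, n)
    results[n] = describe_sample(sample)
    mean, sd, sem = results[n]
    print(f"n = {n:>6}: mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.3f}")
```

In a run of this sketch, the standard error should shrink as the sample size grows. That is one reason larger samples are preferred: on average, the sample mean lands closer to the population mean.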

    Here are some questions we want to address about the sample and the population:

    1. How do we know that our samples do represent the population?
    2. Why does everyone say we need a large sample size?
    3. Why do we want to establish the population?

    Let us start with the last question. We want to establish the population because that population is the basis for determining what observations are expected or unexpected. We can establish which observations have a high or low probability of occurring. This process helps us establish what observations and findings are significant or not significant.
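To make "expected or unexpected" concrete, here is a small sketch that assumes a baseline population of depression scores that is normally distributed with a mean of 50 and a standard deviation of 10. Those values, the function name `upper_tail_probability`, and the 0.05 cutoff for calling an observation unexpected are illustrative assumptions, not figures from any real instrument. The sketch asks how probable it is to see a score at least as high as the one observed if the baseline population is correct.

```python
from statistics import NormalDist

# Hypothetical baseline population of depression scores (assumed, not real data).
baseline = NormalDist(mu=50, sigma=10)


def upper_tail_probability(observation, population=baseline):
    """Probability of a value at least this large under the assumed population."""
    return 1 - population.cdf(observation)


# Label each observation using an assumed, conventional 0.05 cutoff.
for score in (52, 65, 80):
    p = upper_tail_probability(score)
    label = "unexpected" if p < 0.05 else "expected"
    print(f"score {score}: P(X >= {score}) = {p:.4f} -> {label}")
```

A score close to the assumed baseline mean has a high probability and would not surprise us, while a score far above it has a low probability. Without a baseline population, there would be nothing to compare the observation against.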

    An important concept for establishing the population is that we need to know the baseline of an issue. One example of a baseline would be this question: What would occur if nothing happened? If we did nothing, what would be the level of depression among young men? Obtaining the population value is one way of establishing what things are like now. Then we can ask: what if we did something? Would it change the population value? If we implemented a community outreach program encouraging young men to seek treatment for depression, could we change the level of depression? To answer that, we first have to establish the population value.

    Let us address the first question: How do we know our samples represent the population? Recall that the goal is to estimate the population parameter. I consider the population to be the truth. The truth of what is really going on involves everyone. I like to think of the population as the answer key. How do we know if we have the correct result unless we have an answer key containing the correct result? We need a way of knowing if we are right or wrong. However, you may recall that it is impossible to measure the entire population directly.

    Logistically, it is impossible to do so because you would have to measure everyone, everywhere, at every point in time.

    Conceptually, we do not have the measures to establish a population value for many psychological constructs. We do not have a single definitive measure of depression; although the MMPI, the Beck Depression Inventory, and the CES-D are good measures, they each measure depression from a different viewpoint. We also do not have an established protocol for determining whether the measures of depression are invariant across different contexts, especially across gender, racial, and ethnic groups. These problems persist when we try to measure constructs such as mindfulness, racism, social desirability, violence, and attachment. These constructs are simply harder to measure because it is hard, rightfully so, to establish a single definition of the construct and how it varies.

    Methodologically, it is difficult to obtain measurements from the population for issues that people generally do not disclose. It is hard to get a prevalence rate for sexual assault, PTSD, and ADHD because it is hard for people to come forward and provide data on these difficult issues. We don’t have a census of certain populations, such as adolescents with eating disorders, so we don’t know who to ask. For these and many other reasons, it is difficult to obtain a population value for many issues.


    This page titled 7.1: The Purpose of Sampling is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Peter Ji.