Statistics LibreTexts

1.4: Sources of Bias and Randomness


    Bias and the Importance of Randomness

    Now that we understand how we often use a sample to learn about a larger population, let’s talk about a key idea that can interfere with our conclusions: bias.

    What Is Bias?

    Bias occurs when the method used to collect data causes certain individuals or outcomes to be favored or excluded, making the sample systematically different from the population.

    Bias can lead us to incorrect conclusions about a population because the information in our sample doesn’t fairly or fully represent the entire group. While some variability is natural in any sample, biased samples introduce systematic errors that can’t be explained by random chance.
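
    One common way to make "systematic error" precise is sketched below. The notation is an assumption added here for illustration and does not come from the text above: \( \mu \) is the population mean, \( \bar{x} \) is the sample mean a given sampling method produces, and \( E[\bar{x}] \) is the long-run average of \( \bar{x} \) if the same method were repeated many times.

    \[ \text{Bias}(\bar{x}) = E[\bar{x}] - \mu \]

    Under this definition, a method with no bias has \( E[\bar{x}] = \mu \): individual samples still land above or below \( \mu \) by chance, but they do not lean in one direction. A biased method leans the same way every time, so collecting a larger sample with the same method does not remove the error.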

    Examples of Bias in Sampling

    • Convenience Bias: A teacher wants to know how much students are studying on campus. She surveys the students in the library at 9am. But those students might be more prepared or studious than the general student population.
    • Nonresponse Bias: A health department emails a survey on exercise habits, but only 20% of people respond. If more active people are more likely to answer, the final summary may overestimate how much exercise the population gets.
    • Undercoverage Bias: A city survey collects responses entirely via smartphones. This may exclude older adults or those with limited internet access—leaving out important parts of the population.
    • Voluntary Response Bias: An online poll asks if customers are “satisfied with service.” People who feel extremely positive or negative are more likely to respond, skewing the results.

    In each case, the problem is that the method causes misrepresentation. If we base policies or decisions on biased data, we may end up hurting the very communities we aim to serve.
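
    The nonresponse example above can be explored with a small simulation. This is a minimal sketch under invented assumptions: the population's weekly exercise hours are uniform between 0 and 10, and the chance of answering the survey rises from 5% to 35% as activity increases (about 20% overall). The function name `simulate_nonresponse` and every number in it are hypothetical.

```python
import random

def simulate_nonresponse(population_size=10_000, seed=0):
    """Compare the population mean with the mean among survey respondents."""
    rng = random.Random(seed)
    # Hypothetical population: weekly exercise hours, uniform between 0 and 10.
    hours = [rng.uniform(0, 10) for _ in range(population_size)]
    # Assumed response model: the chance of answering rises with activity,
    # from 5% at 0 hours to 35% at 10 hours (about 20% overall).
    respondents = [h for h in hours if rng.random() < 0.05 + 0.03 * h]
    true_mean = sum(hours) / len(hours)
    respondent_mean = sum(respondents) / len(respondents)
    response_rate = len(respondents) / len(hours)
    return true_mean, respondent_mean, response_rate

true_mean, respondent_mean, response_rate = simulate_nonresponse()
print(f"response rate:   {response_rate:.1%}")
print(f"population mean: {true_mean:.2f} hours")
print(f"respondent mean: {respondent_mean:.2f} hours")
```

    Under these assumptions the respondents average noticeably more exercise than the population as a whole (about 6.25 versus 5 hours in expectation), even though nobody was deliberately excluded. The gap comes entirely from who chose to answer.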

    How Randomness Helps

    Random sampling is one of the most important defenses against bias.

    By using a random process to select individuals from the population, giving each one a known and fair chance of being included, we reduce the influence of personal judgment, patterns, convenience, or external factors.

    Randomness also lets us make statistical claims later, because it gives us a defensible reason to treat our sample as representative of the full population within the limits of chance.
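
    The idea of "representative within the limits of chance" can be illustrated with a short simulation. This sketch assumes a made-up population of 5,000 weekly study-time values and uses simple random sampling (every individual equally likely, drawn without replacement), which is only one of several random sampling designs. The helper name `simple_random_sample` is hypothetical.

```python
import random
import statistics

def simple_random_sample(population, n, rng):
    """Draw n individuals without replacement; each is equally likely."""
    return rng.sample(population, n)

rng = random.Random(1)

# Hypothetical population: 5,000 weekly study times between 0 and 20 hours.
population = [rng.uniform(0, 20) for _ in range(5000)]
population_mean = statistics.mean(population)

# Repeat the sampling many times to see how the sample means behave.
sample_means = [
    statistics.mean(simple_random_sample(population, 50, rng))
    for _ in range(1000)
]
average_of_sample_means = statistics.mean(sample_means)
spread_of_sample_means = statistics.stdev(sample_means)

print(f"population mean:         {population_mean:.2f}")
print(f"average of sample means: {average_of_sample_means:.2f}")
print(f"spread of sample means:  {spread_of_sample_means:.2f}")
```

    Any single sample of 50 will miss the population mean by some amount, but across many random samples the misses fall on both sides and roughly cancel. That balance is what random selection provides and what a convenience sample cannot promise.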

    Example: Comparing Two Approaches

    A researcher wants to know the average commuting time for people in a city. They consider two approaches:

    • Approach A: They survey 300 people in the downtown train station at 8:30am on a weekday.
    • Approach B: They randomly select 300 residents from city voter rolls and call them at home.

    Approach A is likely to lead to bias, because people passing through the station during rush hour may have different commuting patterns than city residents as a whole. Some might not even live in the city. Approach B has a better chance of reaching people across neighborhoods, schedules, and lifestyles, especially if nonresponse is addressed properly. Approach B is not automatically free of bias, though: voter rolls leave out residents who are not registered, which is a form of undercoverage.
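
    The two approaches can be compared in a rough simulation. Everything about the city below is invented for illustration: 30% of residents are assumed to commute by train through the downtown station with commutes averaging 45 minutes, and the rest are assumed to average 20 minutes. The sketch also simplifies both approaches by ignoring non-residents at the station and by assuming everyone selected in Approach B responds.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical city of 20,000 residents. Each entry records whether the
# person passes through the downtown station and their commute in minutes.
residents = []
for _ in range(20_000):
    uses_station = rng.random() < 0.30
    mean_minutes = 45 if uses_station else 20
    minutes = max(0.0, rng.gauss(mean_minutes, 8))
    residents.append((uses_station, minutes))

all_minutes = [m for _, m in residents]
true_mean = statistics.mean(all_minutes)

# Approach A: only people found at the station can end up in the sample.
station_crowd = [m for at_station, m in residents if at_station]
approach_a = statistics.mean(rng.sample(station_crowd, 300))

# Approach B: every resident on the list has the same chance of selection.
approach_b = statistics.mean(rng.sample(all_minutes, 300))

print(f"true average commute: {true_mean:.1f} minutes")
print(f"Approach A estimate:  {approach_a:.1f} minutes")
print(f"Approach B estimate:  {approach_b:.1f} minutes")
```

    Both samples contain 300 people, so the difference in accuracy is not about sample size. In this setup Approach A overestimates the average commute because it can only reach train commuters, while Approach B lands near the true value with only chance error.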

    Beyond Sampling: Measurement and Design Bias

    Bias isn't limited to sampling. It can also creep in when:

    • Survey questions are worded in leading or confusing ways.
    • Instruments (like scales or tests) aren’t well-calibrated or fair.
    • Researchers influence responses consciously or unconsciously.

    We’ll talk more about these issues when we learn how to design good studies and interpret study results fairly.

    Can We Avoid Bias Completely?

    Bias is often invisible unless we actively look for it. Even with good intentions, human judgments and systems can introduce unfair patterns. While we may not eliminate all sources of bias, awareness and thoughtful design give us the best possible chance to reduce it.

    Reflection: Can You Spot the Bias?

    • A school district sends out a digital survey asking about lunch program satisfaction. Who might be unknowingly excluded?
    • A company lets employees “opt in” to a survey on job satisfaction. What kind of responses might they get?
    • A sports article collects Twitter replies about a controversial call in a playoff game. Is that representative of all fans?

    Quick Quiz: Identify the Type of Bias

    For each example below, choose the type of sampling bias that is most likely present.

    1. A news website runs an online poll asking readers if they're satisfied with government policy. Only extremely positive and negative responses pour in.


    2. A public health survey is distributed only through a smartphone app. Many older adults never see it and aren’t included in the data.


    3. A school researcher surveys students who are already standing in front of the cafeteria to get quick answers about lunch preferences.


    4. A market research firm emails a survey to 1,000 customers, but only 210 respond — most of whom are satisfied repeat buyers.


    What's Next?

    Now that we understand the importance of fair sampling and the dangers of bias, we’re ready to think about how to ask good statistical questions and design reliable studies. In the next section, we’ll learn how to plan an investigation that collects data in a way that’s focused, useful, and ethical from the start.


    This page titled 1.4: Sources of Bias and Randomness is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Mathematics Department.
