
5.1: Point Estimates and Sampling Variability


    Companies such as Pew Research frequently conduct polls as a way to understand the state of public opinion or knowledge on many topics, including politics, scientific understanding, brand recognition, and more. The ultimate goal in taking a poll is generally to use the responses to estimate the opinion or knowledge of the broader population.

    Point estimates and error

    Suppose a poll suggested the US President’s approval rating is 45%. We would consider 45% to be a point estimate of the approval rating we might see if we collected responses from the entire population. This entire-population response proportion is generally referred to as the parameter of interest. When the parameter is a proportion, it is often denoted by \(p\), and we often refer to the sample proportion as \(\hat{p}\) (pronounced p-hat). Unless we collect responses from every individual in the population, \(p\) remains unknown, and we use \(\hat{p}\) as our estimate of \(p\). The difference we observe from the poll versus the parameter is called the error in the estimate. Generally, the error consists of two aspects: sampling error and bias.

    Sampling error, sometimes called sampling uncertainty, describes how much an estimate will tend to vary from one sample to the next. For instance, the estimate from one sample might be 1% too low while in another it may be 3% too high. Much of statistics, including much of this book, is focused on understanding and quantifying sampling error, and we will find it useful to consider a sample’s size to help us quantify this error; the sample size is often represented by the letter \(n\).

    Bias describes a systematic tendency to over- or under-estimate the true population value. For example, if we were taking a student poll asking about support for a new college stadium, we’d probably get a biased estimate of the stadium’s level of student support by wording the question as, “Do you support your school by supporting funding for the new stadium?” We try to minimize bias through thoughtful data collection procedures, which were discussed in Chapter [ch_intro_to_data] and are the topic of many other books.

    Understanding the variability of a point estimate

    Suppose the proportion of American adults who support the expansion of solar energy is \(p = 0.88\), which is our parameter of interest. If we were to take a poll of 1000 American adults on this topic, the estimate would not be perfect, but how close might we expect the sample proportion in the poll to be to 88%? We want to understand how the sample proportion \(\hat{p}\) behaves when the true population proportion is 0.88. Let’s find out! We can simulate responses we would get from a simple random sample of 1000 American adults, which is only possible because we know the actual support for expanding solar energy is 0.88. Here’s how we might go about constructing such a simulation:

    1. There were about 250 million American adults in 2018. On 250 million pieces of paper, write “support” on 88% of them and “not” on the other 12%.
    2. Mix up the pieces of paper and pull out 1000 pieces to represent our sample of 1000 American adults.
    3. Compute the fraction of the sample that say “support”.

    Any volunteers to conduct this simulation? Probably not. Running this simulation with 250 million pieces of paper would be time-consuming and very costly, but we can simulate it using computer code; we’ve written a short program in Figure [solarPollSimulationCodeR] in case you are curious what the computer code looks like. In this simulation, the sample gave a point estimate of \(\hat{p}_1 = 0.894\). We know the population proportion for the simulation was \(p = 0.88\), so we know the estimate had an error of \(0.894 - 0.88 = +0.014\).

    # 1. Create a set of 250 million entries, where 88% of them are "support"
    #    and 12% are "not".
    popsize <- 250000000
    possible_entries <- c(rep("support", 0.88 * popsize), rep("not", 0.12 * popsize))

    # 2. Sample 1000 entries without replacement.
    sampled_entries <- sample(possible_entries, size = 1000)

    # 3. Compute p-hat: count the number that are "support", then divide by
    #    the sample size.
    sum(sampled_entries == "support") / 1000

    One simulation isn’t enough to get a great sense of the distribution of estimates we might expect in the simulation, so we should run more simulations. In a second simulation, we get \(\hat{p}_2 = 0.885\), which has an error of +0.005. In another, \(\hat{p}_3 = 0.878\) for an error of -0.002. And in another, an estimate of \(\hat{p}_4 = 0.859\) with an error of -0.021. With the help of a computer, we’ve run the simulation 10,000 times and created a histogram of the results from all 10,000 simulations in Figure [sampling_10k_prop_88p]. This distribution of sample proportions is called a sampling distribution. We can characterize this sampling distribution as follows:
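    The repeated simulation described above can be sketched in a few lines of code. The sketch below uses Python (rather than the R shown in Figure [solarPollSimulationCodeR]) so the logic is self-contained; the variable names and the choice of 2,000 repetitions (fewer than the book's 10,000, for speed) are our own.

```python
import random

random.seed(1)

p, n, reps = 0.88, 1000, 2000  # population proportion, sample size, repetitions
p_hats = []
for _ in range(reps):
    # Each simulated respondent "supports" with probability 0.88, independently.
    support = sum(1 for _ in range(n) if random.random() < p)
    p_hats.append(support / n)

# Summarize the simulated sampling distribution: its center and spread.
center = sum(p_hats) / reps
spread = (sum((x - center) ** 2 for x in p_hats) / (reps - 1)) ** 0.5
print(round(center, 3), round(spread, 4))
```

    Running this, the center lands very near the true proportion 0.88 and the spread near 0.010, matching the Center and Spread observations that follow.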

    Center.

    The center of the distribution is \(\bar{x}_{\hat{p}} = 0.880\), which is the same as the parameter. Notice that the simulation mimicked a simple random sample of the population, which is a straightforward sampling strategy that helps avoid sampling bias.

    Spread.

    The standard deviation of the distribution is \(s_{\hat{p}} = 0.010\). When we’re talking about a sampling distribution or the variability of a point estimate, we typically use the term standard error rather than standard deviation, and the notation \(SE_{\hat{p}}\) is used for the standard error associated with the sample proportion.

    Shape.

    The distribution is symmetric and bell-shaped, and it resembles a normal distribution.

    These findings are encouraging! When the population proportion is \(p = 0.88\) and the sample size is \(n = 1000\), the sample proportion \(\hat{p}\) tends to give a pretty good estimate of the population proportion. We also have the interesting observation that the histogram resembles a normal distribution.

    [sampling_10k_prop_88p]

    Sampling distributions are never observed, but we keep them in mind. In real-world applications, we never actually observe the sampling distribution, yet it is useful to always think of a point estimate as coming from such a hypothetical distribution. Understanding the sampling distribution will help us characterize and make sense of the point estimates that we do observe.

    If we used a much smaller sample size of \(n = 50\), would you guess that the standard error for \(\hat{p}\) would be larger or smaller than when we used \(n = 1000\)? [smallerSampleWhatHappensToPropErrorExercise] Intuitively, it seems like more data is better than less data, and generally that is correct! The typical error when \(p = 0.88\) and \(n = 50\) would be larger than the error we would expect when \(n = 1000\).

    Example [smallerSampleWhatHappensToPropErrorExercise] highlights an important property we will see again and again: a bigger sample tends to provide a more precise point estimate than a smaller sample.

    Central Limit Theorem

    The distribution in Figure [sampling_10k_prop_88p] looks an awful lot like a normal distribution. That is no anomaly; it is the result of a general principle called the Central Limit Theorem.

    Central Limit Theorem and the success-failure condition When observations are independent and the sample size is sufficiently large, the sample proportion \(\hat{p}\) will tend to follow a normal distribution with the following mean and standard error:

    \[\begin{aligned} \mu_{\hat{p}} &= p &SE_{\hat{p}} &= \sqrt{\frac{p (1 - p)}{n}} \end{aligned}\]

    In order for the Central Limit Theorem to hold, the sample size is typically considered sufficiently large when \(np \geq 10\) and \(n(1-p) \geq 10\), which is called the success-failure condition.

    The Central Limit Theorem is incredibly important, and it provides a foundation for much of statistics. As we begin applying the Central Limit Theorem, be mindful of the two technical conditions: the observations must be independent, and the sample size must be sufficiently large such that \(np \geq 10\) and \(n(1-p) \geq 10\).
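    The two technical conditions translate directly into a quick check. Below is a minimal Python sketch of such a check; the function name `success_failure_met` is our own, not from the text.

```python
def success_failure_met(p, n):
    """Success-failure condition: both np and n(1-p) must be at least 10.

    Independence must still be verified separately (e.g., a simple random
    sample); this function only checks the sample-size condition.
    """
    return n * p >= 10 and n * (1 - p) >= 10

# The solar-energy setting from the text: np = 880 and n(1-p) = 120.
print(success_failure_met(0.88, 1000))
# A small sample where the condition fails: np = 10 * 0.25 = 2.5.
print(success_failure_met(0.25, 10))
```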

    Earlier we estimated the mean and standard error of \(\hat{p}\) using simulated data when \(p = 0.88\) and \(n = 1000\). Confirm that the Central Limit Theorem applies and the sampling distribution is approximately normal.[sample_p88_n1000_confirm_normal]

    Independence.

    There are \(n = 1000\) observations for each sample proportion \(\hat{p}\), and each of those observations is an independent draw. The most common way for observations to be considered independent is if they are from a simple random sample.

    Success-failure condition.

    We can confirm the sample size is sufficiently large by checking the success-failure condition and confirming the two calculated values are greater than 10:

    \[\begin{aligned} np &= 1000 \times 0.88 = 880 \geq 10 &n(1-p) &= 1000 \times (1 - 0.88) = 120 \geq 10 \end{aligned}\]

    The independence and success-failure conditions are both satisfied, so the Central Limit Theorem applies, and it’s reasonable to model \(\hat{p}\) using a normal distribution.

    How to verify sample observations are independent. Subjects in an experiment are considered independent if they undergo random assignment to the treatment groups.

    If the observations are from a simple random sample, then they are independent.

    If a sample is from a seemingly random process, e.g. an occasional error on an assembly line, checking independence is more difficult. In this case, use your best judgement.

    An additional condition that is sometimes added for samples from a population is that the sample be no larger than 10% of the population. When the sample exceeds 10% of the population size, the methods we discuss tend to overestimate the sampling error slightly versus what we would get using more advanced methods. This is very rarely an issue, and when it is an issue, our methods tend to be conservative, so we consider this additional check as optional.

    Compute the theoretical mean and standard error of \(\hat{p}\) when \(p = 0.88\) and \(n = 1000\), according to the Central Limit Theorem.[sample_p88_n1000_mean_se] The mean of the \(\hat{p}\)’s is simply the population proportion: \(\mu_{\hat{p}} = 0.88\).

    The calculation of the standard error of \(\hat{p}\) uses the following formula:

    \[\begin{aligned} SE_{\hat{p}} = \sqrt{\frac{p (1 - p)}{n}} = \sqrt{\frac{0.88 (1 - 0.88)}{1000}} = 0.010\end{aligned}\]
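    As a quick numerical check of the formula above, here is a small Python sketch (the helper name `standard_error` is ours):

```python
import math

def standard_error(p, n):
    # SE of a sample proportion under the Central Limit Theorem.
    return math.sqrt(p * (1 - p) / n)

se = standard_error(0.88, 1000)
print(round(se, 4))  # about 0.0103, i.e., 0.010 to three decimal places
```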

    Estimate how frequently the sample proportion \(\hat{p}\) should be within 0.02 (2%) of the population value, \(p = 0.88\). Based on Examples [sample_p88_n1000_confirm_normal] and [sample_p88_n1000_mean_se], we know that the distribution is approximately \(N(\mu_{\hat{p}} = 0.88, SE_{\hat{p}} = 0.010)\). [sampling_10k_prop_887p-prop_from_867_to_907] After so much practice in Section [normalDist], this normal distribution example will hopefully feel familiar! We would like to understand the fraction of \(\hat{p}\)’s between 0.86 and 0.90:

    With \(\mu_{\hat{p}} = 0.88\) and \(SE_{\hat{p}} = 0.010\), we can compute the Z-score for both the left and right cutoffs:

    \[\begin{aligned} Z_{0.86} &= \frac{0.86 - 0.88}{0.010} = -2 &Z_{0.90} &= \frac{0.90 - 0.88}{0.010} = 2\end{aligned}\]

    We can use either statistical software, a graphing calculator, or a table to find the areas to the tails, and in any case we will find that they are each 0.0228. The total tail areas are \(2 \times 0.0228 = 0.0456\), which leaves the shaded area of 0.9544. That is, about 95.44% of the sampling distribution in Figure [sampling_10k_prop_88p] is within \(\pm0.02\) of the population proportion, \(p = 0.88\).
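    The same tail-area calculation can be done in code rather than with a table. The sketch below uses Python's standard-library error function to build a normal CDF; the helper name `normal_cdf` is our own.

```python
import math

def normal_cdf(z):
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, se = 0.88, 0.010           # values from the text (SE rounded to 0.010)
z_lo = (0.86 - mu) / se        # = -2
z_hi = (0.90 - mu) / se        # = +2
prob = normal_cdf(z_hi) - normal_cdf(z_lo)
print(round(prob, 4))          # close to the 0.9544 found using table areas
```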

    In Example [smallerSampleWhatHappensToPropErrorExercise] we discussed how a smaller sample would tend to produce a less reliable estimate. Explain how this intuition is reflected in the formula for \(SE_{\hat{p}} = \sqrt{\frac{p (1 - p)}{n}}\).

    Applying the Central Limit Theorem to a real-world setting

    We do not actually know the population proportion unless we conduct an expensive poll of all individuals in the population. Our earlier value of \(p = 0.88\) was based on a poll conducted by Pew Research of 1000 American adults that found \(\hat{p} = 0.887\) of them favored expanding solar energy. The researchers might have wondered: does the sample proportion from the poll approximately follow a normal distribution? We can check the conditions from the Central Limit Theorem:

    Independence.

    The poll is a simple random sample of American adults, which means that the observations are independent.

    Success-failure condition.

    To check this condition, we need the population proportion, \(p\), to check if both \(np\) and \(n(1-p)\) are greater than 10. However, we do not actually know \(p\), which is exactly why the pollsters would take a sample! In cases like these, we often use \(\hat{p}\) as our next best way to check the success-failure condition:

    \[\begin{aligned} n\hat{p} &= 1000 \times 0.887 = 887 &n (1 - \hat{p}) &= 1000 \times (1 - 0.887) = 113 \end{aligned}\]

    The sample proportion \(\hat{p}\) acts as a reasonable substitute for \(p\) during this check, and each value in this case is well above the minimum of 10.

    This technique of using \(\hat{p}\) in place of \(p\) is also useful when computing the standard error of the sample proportion:

    \[\begin{aligned} SE_{\hat{p}} = \sqrt{\frac{p (1 - p)}{n}} \approx \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}} = \sqrt{\frac{0.887 (1 - 0.887)}{1000}} = 0.010\end{aligned}\]

    This substitution technique is sometimes referred to as the “plug-in principle”. In this case, \(SE_{\hat{p}}\) didn’t change enough to be detected using only 3 decimal places versus when we completed the calculation with 0.88 earlier. The computed standard error tends to be reasonably stable even when observing slightly different proportions in one sample or another.
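    We can verify this stability numerically. The short Python sketch below (helper name `se_prop` is ours) computes the standard error both with the parameter \(p = 0.88\) and with the observed \(\hat{p} = 0.887\) plugged in:

```python
import math

def se_prop(p, n):
    # Standard error of a sample proportion.
    return math.sqrt(p * (1 - p) / n)

se_true = se_prop(0.88, 1000)     # using the (usually unknown) parameter
se_plugin = se_prop(0.887, 1000)  # plug-in: substitute the sample proportion
print(round(se_true, 3), round(se_plugin, 3))
```

    Both round to 0.010, illustrating why the substitution is harmless here.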

    More details regarding the Central Limit Theorem

    We’ve applied the Central Limit Theorem in numerous examples so far this chapter:

    When observations are independent and the sample size is sufficiently large, the distribution of \(\hat{p}\) resembles a normal distribution with

    \[\begin{aligned} \mu_{\hat{p}} &= p &SE_{\hat{p}} &= \sqrt{\frac{p (1 - p)}{n}}\end{aligned}\]

    The sample size is considered sufficiently large when \(n p \geq 10\) and \(n (1 - p) \geq 10\).

    In this section, we’ll explore the success-failure condition and seek to better understand the Central Limit Theorem.

    An interesting question to answer is, what happens when \(np < 10\) or \(n(1-p) < 10\)? As we did in Section 1.2, we can simulate drawing samples of different sizes where, say, the true proportion is \(p = 0.25\). Here’s a sample of size 10:

    no, no, yes, yes, no, no, no, no, no, no

    In this sample, we observe a sample proportion of yeses of \(\hat{p} = \frac{2}{10} = 0.2\). We can simulate many such proportions to understand the sampling distribution of \(\hat{p}\) when \(n = 10\) and \(p = 0.25\), which we’ve plotted in Figure [sampling_10_prop_25p] alongside a normal distribution with the same mean and variability. These distributions have a number of important differences.
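    The small-sample simulation just described can be sketched in Python as well (our own variable names; the book's figures were generated differently). Each simulated sample of size 10 can only produce the eleven proportions 0.0, 0.1, ..., 1.0, which is why the resulting distribution is discrete rather than smooth.

```python
import random

random.seed(7)
n, p, reps = 10, 0.25, 5000
# Each entry is a simulated sample proportion of "yes" responses.
p_hats = [sum(1 for _ in range(n) if random.random() < p) / n
          for _ in range(reps)]

distinct = sorted(set(p_hats))
print(distinct)  # only multiples of 0.1 can appear
```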

    [sampling_10_prop_25p]

    [clt_prop_grid_1]

    [clt_prop_grid_2]

                                     Unimodal?   Smooth?   Symmetric?
    Normal: \(N(0.25, 0.14)\)           Yes        Yes        Yes
    \(n = 10\), \(p = 0.25\)            Yes        No         No

    Notice that the success-failure condition was not satisfied when \(n = 10\) and \(p = 0.25\):

    \[\begin{aligned} n p = 10 \times 0.25 = 2.5 && n (1 - p) = 10 \times 0.75 = 7.5\end{aligned}\]

    This single sampling distribution does not show that the success-failure condition is a perfect guideline, but in this case the guideline did correctly identify that a normal distribution might not be appropriate.

    We can complete several additional simulations, shown in Figures [clt_prop_grid_1] and [clt_prop_grid_2], and we can see some trends:

    1. When either \(np\) or \(n(1 - p)\) is small, the distribution is more discrete, i.e. not continuous.
    2. When \(np\) or \(n(1-p)\) is smaller than 10, the skew in the distribution is more noteworthy.
    3. The larger both \(np\) and \(n(1 - p)\), the more normal the distribution. This may be a little harder to see for the larger sample size in these plots as the variability also becomes much smaller.
    4. When \(np\) and \(n(1 - p)\) are both very large, the distribution’s discreteness is hardly evident, and the distribution looks much more like a normal distribution.

    So far we’ve only focused on the skew and discreteness of the distributions. We haven’t considered how the mean and standard error of the distributions change. Take a moment to look back at the graphs, and pay attention to three things:

    1. The centers of the distribution are always at the population proportion, \(p\), that was used to generate the simulation. Because the sampling distribution of \(\hat{p}\) is always centered at the population parameter \(p\), the sample proportion \(\hat{p}\) is an unbiased estimate of \(p\) when the data are independent and drawn from such a population.
    2. For a particular population proportion \(p\), the variability in the sampling distribution decreases as the sample size \(n\) becomes larger. This will likely align with your intuition: an estimate based on a larger sample size will tend to be more precise.
    3. For a particular sample size, the variability will be largest when \(p = 0.5\). The differences may be a little subtle, so take a close look. This reflects the role of the proportion \(p\) in the standard error formula: \(SE = \sqrt{\frac{p (1 - p)}{n}}\). The standard error is largest when \(p = 0.5\).
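    The third observation follows directly from the formula: \(p(1-p)\) is largest at \(p = 0.5\). A minimal Python check (helper name `se_prop` is ours):

```python
import math

def se_prop(p, n):
    # Standard error of a sample proportion.
    return math.sqrt(p * (1 - p) / n)

n = 100
# p(1 - p) peaks at p = 0.5, so the standard error does too.
ses = {p: round(se_prop(p, n), 4) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(ses)
```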

    At no point will the distribution of \(\hat{p}\) look perfectly normal, since \(\hat{p}\) will always take discrete values (\(x / n\)). It is always a matter of degree, and we will use the standard success-failure condition with minimums of 10 for \(np\) and \(n (1 - p)\) as our guideline within this book.

    Extending the framework for other statistics

    The strategy of using a sample statistic to estimate a parameter is quite common, and it’s a strategy that we can apply to other statistics besides a proportion. For instance, if we want to estimate the average salary for graduates from a particular college, we could survey a random sample of recent graduates; in that example, we’d be using a sample mean \(\bar{x}\) to estimate the population mean \(\mu\) for all graduates. As another example, if we want to estimate the difference in product prices for two websites, we might take a random sample of products available on both sites, check the prices on each, and then compute the average difference; this strategy certainly would give us some idea of the actual difference through a point estimate.

    While this chapter emphasizes a single proportion context, we’ll encounter many different contexts throughout this book where these methods will be applied. The principles and general ideas are the same, even if the details change a little. We’ve also sprinkled some other contexts into the exercises to help you start thinking about how the ideas generalize.


    This page titled 5.1: Point Estimates and Sampling Variability is shared under a CC BY-SA 3.0 license and was authored, remixed, and/or curated by David Diez, Christopher Barr, & Mine Çetinkaya-Rundel via source content that was edited to the style and standards of the LibreTexts platform.
