
Lab 3: Parameter Estimation


Objective

Explore properties of estimators and understand what makes an estimator preferred.

Definitions

  • estimator vs. estimate
  • maximum likelihood estimation: likelihood function, log-likelihood
  • method of moments estimation
  • bias, unbiased estimator
  • efficiency of estimators
  • mean square error (MSE)
  • bias-variance trade-off

Introduction

In class this week, we went over two procedures for estimating parameters: maximum likelihood estimation and the method of moments. Other estimation methods may also be available in a given context, including "plug-in" estimators. This raises the question of which method is best. In this lab, you will explore properties of estimators and use those properties to learn what criteria we think a good estimator should satisfy. Each property provides a "sniff test": an estimator that fails one just doesn't smell right.

Activities

Getting Started: Navigate to your class folder structure. Within your "Labs" folder make a subfolder called "Lab3". Next, download the lab notebook .Rmd file for this lab from Blackboard and save it in your "Lab3" folder. There are no datasets used in this lab.

Within RStudio, navigate to your "Lab3" folder via the file browser in the lower right pane and then click "More > Set as working directory". Get set to write your observations and R commands in an R Markdown file by opening the "lab3_notebook.Rmd" file in RStudio. Remember to add your name as the author in line 3 of the document. For this lab, enter all of your commands into code chunks in the lab notebook. You can still experiment with code in an R script, if you want. To set up an R Script in RStudio, in the upper left corner click “File > New File > R script”. A new tab should open up in the upper left pane of RStudio.

 

Maximum Likelihood Estimation: Before we get into the properties, let's revisit maximum likelihood estimation.

Likelihood Function, Maximum Likelihood Estimate

Suppose \(X_1, \ldots, X_n\) represent a random sample from a probability distribution with associated parameter \(\theta\) and pmf/pdf given by \(f(x;\theta)\).  The likelihood function \(L(\theta) = L(\theta \mid x_1, \ldots, x_n)\) gives the likelihood of \(\theta\), given the observed sample values \(x_1, \ldots, x_n\), and is calculated as follows:

\[L(\theta) = \prod_{i=1}^{n} f(x_i;\theta) = f(x_1;\theta)\, f(x_2;\theta) \cdots f(x_n;\theta)\]

The maximum likelihood estimate (MLE) \(\hat{\theta}_{MLE}\) is a value of \(\theta\) that maximizes the likelihood function, or equivalently that maximizes the log-likelihood function \(\ln L(\theta)\).

So, the likelihood function \(L(\theta)\) is a function of the unknown parameter \(\theta\), and we estimate \(\theta\) by maximizing \(L(\theta)\). Remember that we can use calculus to find the value of \(\theta\) that maximizes the likelihood by setting the derivative equal to 0, \(L'(\theta) = 0\), and then solving for \(\theta\) to find the MLE.

In practice, we usually maximize the log-likelihood, because taking the logarithm of a product results in a sum:

\[\ln L(\theta) = \ln\left[f(x_1;\theta)\, f(x_2;\theta) \cdots f(x_n;\theta)\right] = \ln f(x_1;\theta) + \ln f(x_2;\theta) + \cdots + \ln f(x_n;\theta) = \sum_{i=1}^{n} \ln f(x_i;\theta)\]
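For a quick illustration of these steps in a case where a closed form exists (this example is not part of the lab materials), suppose \(X_1, \ldots, X_n\) are a random sample from an exponential distribution with pdf \(f(x;\lambda) = \lambda e^{-\lambda x}\) for \(x > 0\). Then

\[\ln L(\lambda) = \sum_{i=1}^{n} \ln\left(\lambda e^{-\lambda x_i}\right) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i, \qquad \frac{d}{d\lambda} \ln L(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 \quad \Longrightarrow \quad \hat{\lambda}_{MLE} = \frac{1}{\bar{x}}.\]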

In most cases, we are able to find a closed-form expression for the MLE. However, this is not always possible, as the following example demonstrates.

 

Example: Suppose \(X_1, \ldots, X_n\) are a random sample from the Cauchy distribution, which has pdf given by \(f(x;\theta) = \dfrac{1}{\pi\left(1 + (x-\theta)^2\right)}\), for \(x, \theta \in \mathbb{R}\).  The likelihood function for \(\theta\) is

\[L(\theta) = \frac{1}{\pi^{n} \prod_{i=1}^{n}\left[1 + (X_i-\theta)^2\right]}. \tag{1}\]

Thus, \(L(\theta)\) will be maximized when \(\prod_{i=1}^{n}\left[1 + (X_i-\theta)^2\right]\) is minimized, or equivalently when \(\sum_{i=1}^{n} \ln\left(1 + (X_i-\theta)^2\right)\) is minimized.  The value of \(\theta\) that minimizes this expression must be determined by numerical methods.

_____________________________________________________

Pause for Reflection #1:

  1. On a separate piece of paper, write out the details for deriving the likelihood function \(L(\theta)\) in Equation (1).
  2. Next, explain why \(L(\theta)\) will be maximized when the expression \(\sum_{i=1}^{n} \ln\left(1 + (X_i-\theta)^2\right)\) is minimized. Type your response directly into your lab notebook in RStudio.
  3. Finally, on the same piece of paper you used for step 1, take the derivative (with respect to \(\theta\)) of the sum expression given in step 2 to see why the mathematical approach of setting the derivative equal to 0 will not work in this example.

Take a picture of your written work for steps 1 and 3 and upload it to your Lab3 folder so that you can include it in your lab notebook.

_____________________________________________________

Continuing with the Cauchy distribution, suppose you make the following observations for a random sample of size four: \(x_1 = 1\), \(x_2 = 2\), \(x_3 = 2\), and \(x_4 = 3\).  Then, to maximize \(L(\theta)\), you need to minimize


\[\ln\left(1 + (1-\theta)^2\right) + \ln\left(1 + (2-\theta)^2\right) + \ln\left(1 + (2-\theta)^2\right) + \ln\left(1 + (3-\theta)^2\right).\]

Since we cannot find the minimum of the above analytically, you will use the optimize() function in R. Type the following in a code chunk in your Lab 3 notebook and run each line:

x = c(1, 2, 2, 3)
g = function(theta) sum(log(1 + (x-theta)^2))
optimize(g, interval = c(0,4))

_____________________________________________________

Pause for Reflection #2

  1. In your lab notebook, describe what each of the three lines of code above is doing. You may find it helpful to type ?c, help("function"), and ?optimize one at a time in the console window to pull up info in the Help window (lower right pane) for each of these commands.
  2. Based on the results of this code, what is the maximum likelihood estimate of \(\theta\) based on the given data?

_____________________________________________________
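If you would like a visual check on what optimize() found, the following optional snippet (not part of the provided notebook) plots the objective function g over the same interval; sapply() is used because g() is written to take one value of theta at a time.

theta.grid = seq(0, 4, by = 0.01)                 # grid of candidate theta values
plot(theta.grid, sapply(theta.grid, g), type = "l",
     xlab = "theta", ylab = "g(theta)")           # the minimum should agree with the optimize() output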

Unbiasedness: The first property of estimators that we will consider is bias. An estimator \(\hat{\theta}\) is biased if, on average, it tends to be too high or too low, relative to the true value of \(\theta\). Formally, this is defined using expected values:

Definition 3.1

The bias of an estimator \(\hat{\theta}\) is given by

\[\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.\]

In other words, Definition 3.1 states that a statistic used to estimate a parameter is biased when the mean of its sampling distribution is not equal to the true value of the parameter. We will explore sampling distributions in more depth next week. For now, we will use R to approximate sampling distributions.

We like an estimator to be, on average, equal to the parameter it is estimating. That is, we like estimators that are unbiased, or equivalently, for which \(\text{Bias}(\hat{\theta}) = 0\). You will show in the homework that the sample mean is always an unbiased estimator of the population mean \(\mu\). It can be shown that the sample proportion is also an unbiased estimator of the population proportion.

The case of the sample variance is less straightforward. Given a sample of values \(x_1, x_2, \ldots, x_n\), the "plug-in" estimator of the population variance \(\sigma^2\) is

\[\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.\]

However, in Lab 1, we defined the sample variance as

\[s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,\]

which is computed in R using the function var(). Notice the difference between the two estimators, namely, the division by \(n\) versus \(n-1\). It turns out that the plug-in estimator \(\hat{\sigma}^2\) is biased, but the sample variance \(s^2\) is unbiased.
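As an optional sanity check (the numbers below are just an arbitrary example vector, not lab data), you can verify the relationship between the two estimators directly in R: multiplying var() by \((n-1)/n\) gives the plug-in estimate.

x = c(2, 4, 4, 7, 9)              # any small example vector
n = length(x)
mean((x - mean(x))^2)             # plug-in estimate: divide by n
((n - 1)/n) * var(x)              # same value, obtained from var()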

Let's explore this in the context of the standard normal distribution: \(N(\mu = 0, \sigma = 1)\). In this context, we know the value of the parameter we are estimating, namely \(\sigma^2 = 1\). So we know that in order for an estimator of \(\sigma^2\) to be unbiased, its expected value needs to equal 1. You will run a simulation in R to see how the two estimators, \(\hat{\sigma}^2\) and \(s^2\), perform. With the following code (already added to the lab notebook for you), you will draw random samples of size 15 from \(N(0,1)\). For each sample, you will compute \(\hat{\sigma}^2\) and \(s^2\) and record the values. You will repeat this 1000 times.

sample.var = numeric(1000)         # object to store sample variances
plugin = numeric(1000)             # object to store plug-in estimates
n = 15                             # set sample size
for (i in 1:1000)
{
  x = rnorm(n)                     # draw a random sample of size n from N(0,1) population
  sample.var[i] = var(x)           # compute and store sample variance of ith sample
  plugin[i] = ((n-1)/n)*var(x)     # compute and store plug-in estimate from ith sample
}

We can now investigate the results of the simulation by finding the mean of the estimates of \(\sigma^2\) based on the two estimators \(\hat{\sigma}^2\) (plugin) and \(s^2\) (sample.var).  We can also visualize the results using histograms.
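A minimal sketch of what that summary could look like (the chunk already provided in your notebook may differ):

mean(plugin)                      # average of the 1000 plug-in estimates
mean(sample.var)                  # average of the 1000 sample variances
hist(plugin, xlab = "plug-in estimate of variance")
hist(sample.var, xlab = "sample variance")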

_____________________________________________________

Pause for Reflection #3

  1. In your lab notebook, calculate the respective means for the 1000 samples of \(\hat{\sigma}^2\) and \(s^2\) you found with the simulation.
  2. Code has been provided in your lab notebook to create histograms of the simulated values for \(\hat{\sigma}^2\) and \(s^2\). Run the code chunk and answer the following: Do the results you found support the claim that \(\hat{\sigma}^2\) is a biased estimator of \(\sigma^2\) and \(s^2\) is unbiased? Why or why not?

_____________________________________________________

 

Efficiency: What happens when you have two estimators that are both unbiased? Which one should you use? The next property we consider, efficiency, provides a criterion for comparing unbiased estimators that depends on their variance.

Definition 3.2

If \(\hat{\theta}_1\) and \(\hat{\theta}_2\) are both unbiased estimators of \(\theta\) and \(\text{Var}(\hat{\theta}_1) < \text{Var}(\hat{\theta}_2)\), then \(\hat{\theta}_1\) is said to be more efficient than \(\hat{\theta}_2\).

We will again explore this property with a simulation, this time in the context of the uniform distribution on the closed interval \([0, \beta]\).

At the start of lab, we will find the method of moments estimator for \(\beta\) to be \(\hat{\beta}_1 = 2\bar{X}\), i.e., twice the mean of a given sample. It can be shown that this estimator is unbiased (a fact we will prove later). Maximum likelihood estimation leads, after a bias correction, to another unbiased estimator of \(\beta\) given by \(\hat{\beta}_2 = \frac{n+1}{n} X_{\max}\), where \(X_{\max}\) denotes the largest value in a random sample (it is referred to as the largest order statistic).

Use the following code (already provided in your lab notebook) to run a simulation to see how these two estimators perform in the specific context of drawing random samples of size 25 from the uniform distribution on \([0, 12]\).

beta.1hat = numeric(1000)
beta.2hat = numeric(1000)
for (i in 1:1000)
{
  x = runif(25, 0, 12)            # draw a random sample of size 25 from uniform[0,12]
  beta.1hat[i] = 2 * mean(x)
  beta.2hat[i] = ((25 + 1)/25) * max(x)
}
# descriptive statistics
mean(beta.1hat)
sd(beta.1hat)
mean(beta.2hat)
sd(beta.2hat)
# graphical comparison
hist(beta.1hat, xlim = c(8,16), ylim = c(0,650), xlab = "2*mean")
hist(beta.2hat, xlim = c(8,16), ylim = c(0,650), xlab = "((25+1)/25)*max")

_____________________________________________________

Pause for Reflection #4

  1. Do the results support the claim that both \(\hat{\beta}_1\) (beta.1hat) and \(\hat{\beta}_2\) (beta.2hat) are unbiased estimators for \(\beta\)? Why or why not?
  2. Which estimator is more efficient, i.e., which estimator exhibits a smaller amount of variability?
  3. Given these results, which estimator do you think you should use?

_____________________________________________________

 

Mean Square Error: The final criterion we consider combines both bias and variance. This is useful for comparing estimators that are not both unbiased. We may prefer an estimator with small bias and small variance over one that is unbiased but with large variance. The following definition provides a way to quantify the preference.

Definition 3.3

The mean square error (MSE) of an estimator is \(\text{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]\).

MSE measures the average squared difference between the estimator and the parameter; it takes both the variability and bias of the estimator into account, as the following proposition shows.

Proposition 3.1

\[\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + \left[\text{Bias}(\hat{\theta})\right]^2\]

It follows from Proposition 3.1 that if \(\hat{\theta}\) is unbiased, then \(\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta})\). So, for unbiased estimators, one is more efficient than a second if and only if its MSE is smaller. But, in general, when comparing two estimators \(\hat{\theta}_1\) and \(\hat{\theta}_2\) of \(\theta\), we are faced with a trade-off between variability and bias.
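If you want to see Definition 3.3 in action (optional), you can approximate the MSE of the two uniform-distribution estimators from the simulation above, since the true value \(\beta = 12\) is known there; the average squared distance of the simulated estimates from 12 approximates the MSE.

mean((beta.1hat - 12)^2)          # approximate MSE of beta.1hat (2*mean)
mean((beta.2hat - 12)^2)          # approximate MSE of beta.2hat (((25+1)/25)*max)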

Example: Let's explore this bias-variance trade-off in the context of the binomial distribution, where the number of trials \(n\) is known but the probability of "success" \(p\) is unknown. Let \(X \sim \text{binomial}(n, p)\). The sample proportion \(X/n\) (the proportion of "successes" in \(n\) observed trials) is an unbiased estimator of \(p\). Denote this estimator by \(\hat{p}_1\); then


\[E[\hat{p}_1] = E\left[\frac{X}{n}\right] = \frac{np}{n} = p.\]

Furthermore, the mean square error of the sample proportion is

\[\text{MSE}(\hat{p}_1) = \text{Var}(\hat{p}_1) = \text{Var}\left(\frac{X}{n}\right) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}.\]

Consider the alternative estimator of \(p\) given by

\[\hat{p}_2 = \frac{X+1}{n+2},\]

which adds one artificial success and one failure to the real data.
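Before working through the algebra in the next reflection, you can get a rough feel for how the two estimators compare with a quick simulation (optional; the values n = 16 and p = 0.2 below are arbitrary choices for illustration):

n = 16; p = 0.2                   # illustrative values of the binomial parameters
x = rbinom(1000, n, p)            # 1000 simulated binomial counts
p.1hat = x / n                    # sample proportion
p.2hat = (x + 1) / (n + 2)        # alternative estimator
mean((p.1hat - p)^2)              # approximate MSE of p.1hat
mean((p.2hat - p)^2)              # approximate MSE of p.2hat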

_____________________________________________________

Pause for Reflection #5

  1. On a piece of paper, write out the details to derive the following:

    \(E[\hat{p}_2] = \dfrac{np+1}{n+2}\) and \(\text{Var}(\hat{p}_2) = \dfrac{np(1-p)}{(n+2)^2}\).

  2. Then, using Proposition 3.1, show that the mean square error for \(\hat{p}_2\) is given by

    \(\text{MSE}(\hat{p}_2) = \dfrac{np(1-p) + (1-2p)^2}{(n+2)^2}\).

Take a picture of your written work for steps 1 and 2 and upload it to your Lab3 folder so that you can include it in your lab notebook. Refer to the code provided in the lab notebook for Reflection #1.

_____________________________________________________

We can compare the two estimators \(\hat{p}_1\) and \(\hat{p}_2\) for \(p\) by comparing their mean square errors. Note that we have the MSE for both estimators as a function of \(p\). Thus, we can graphically compare the MSE for \(\hat{p}_1\) and \(\hat{p}_2\) by plotting curves in R using the following code. Note that we use \(n = 16\) just to have a specific example to work with.

n = 16
curve(x*(1-x)/n, from=0, to=1, xlab="p", ylab="MSE")                  # MSE of the sample proportion; here x plays the role of p
curve((n*x*(1-x)+(1-2*x)^2)/(n+2)^2, add=TRUE, col="blue", lty=2)     # MSE of the alternative estimator, added as a dashed blue curve

The MSE for \(\hat{p}_1\) is the solid black curve, and the MSE for \(\hat{p}_2\) is the dashed blue curve.

_____________________________________________________

Pause for Reflection #6

Inspect the graphs of the MSE curves and answer the following:

  1. For approximately what values of \(p\) does \(\hat{p}_2\) have smaller MSE than \(\hat{p}_1\)?
  2. For the values identified in step 1, even though \(\hat{p}_2\) is biased, it has a smaller MSE than \(\hat{p}_1\). Comment on why \(\hat{p}_2\) may be preferred over \(\hat{p}_1\) as an estimator of \(p\) for these values.
  3. Now alter the code above to recreate the MSE graphs for the following sample sizes: \(n = 30\), \(n = 50\), \(n = 100\), and \(n = 200\). What do you see is the effect of increasing the sample size?

_____________________________________________________

*Optional* Reflection #7

Prove Proposition 3.1.

_____________________________________________________

 


Lab 3: Parameter Estimation is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
