13.8: Effect Size

Last updated
Save as PDF

Page ID: 8266

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\dsum}{\displaystyle\sum\limits} \)

\( \newcommand{\dint}{\displaystyle\int\limits} \)

\( \newcommand{\dlim}{\displaystyle\lim\limits} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)

The most commonly used measure of effect size for a t-test is Cohen’s d (Cohen 1988). It’s a very simple measure in principle, with quite a few wrinkles when you start digging into the details. Cohen himself defined it primarily in the context of an independent samples t-test, specifically the Student test. In that context, a natural way of defining the effect size is to divide the difference between the means by an estimate of the standard deviation. In other words, we’re looking to calculate something along the lines of this:

\(d=\dfrac{(\text { mean } 1)-(\text { mean } 2)}{\text { std dev }}\)

and he suggested a rough guide for interpreting d in Table ??. You’d think that this would be pretty unambiguous, but it’s not; largely because Cohen wasn’t too specific on what he thought should be used as the measure of the standard deviation (in his defence, he was trying to make a broader point in his book, not nitpick about tiny details). As discussed by McGrath and Meyer (2006), there are several different version in common usage, and each author tends to adopt slightly different notation. For the sake of simplicity (as opposed to accuracy) I’ll use d to refer to any statistic that you calculate from the sample, and use δ to refer to a theoretical population effect. Obviously, that does mean that there are several different things all called d. The cohensD() function in the lsr package uses the method argument to distinguish between them, so that’s what I’ll do in the text.

My suspicion is that the only time that you would want Cohen’s d is when you’re running a t-test, and if you’re using the oneSampleTTest, independentSamplesTTest and pairedSamplesTTest() functions to run your t-tests, then you don’t need to learn any new commands, because they automatically produce an estimate of Cohen’s d as part of the output. However, if you’re using t.test() then you’ll need to use the cohensD() function (also in the lsr package) to do the calculations.

d-value	rough interpretation
about 0.2	small effect
about 0.5	moderate effect
about 0.8	large effect

Cohen’s d from one sample

The simplest situation to consider is the one corresponding to a one-sample t-test. In this case, the one sample mean \(\ \bar{X}\) and one (hypothesised) population mean μ_oto compare it to. Not only that, there’s really only one sensible way to estimate the population standard deviation: we just use our usual estimate \(\ \hat{\sigma}\). Therefore, we end up with the following as the only way to calculate d,

\(d=\dfrac{\bar{X}-\mu_{0}}{\hat{\sigma}}\)

When writing the cohensD() function, I’ve made some attempt to make it work in a similar way to t.test(). As a consequence, cohensD() can calculate your effect size regardless of which type of t-test you performed. If what you want is a measure of Cohen’s d to accompany a one-sample t-test, there’s only two arguments that you need to care about. These are:

x. A numeric vector containing the sample data.
mu. The mean against which the mean of x is compared (default value is mu = 0).

We don’t need to specify what method to use, because there’s only one version of d that makes sense in this context. So, in order to compute an effect size for the data from Dr Zeppo’s class (Section 13.2), we’d type something like this:

cohensD( x = grades,    # data are stored in the grades vector
          mu = 67.5      # compare students to a mean of 67.5
 )

## [1] 0.5041691

and, just so that you can see that there’s nothing fancy going on, the command below shows you how to calculate it if there weren’t no fancypants cohensD() function available:

( mean(grades) - 67.5 ) / sd(grades)

## [1] 0.5041691

Yep, same number. Overall, then, the psychology students in Dr Zeppo’s class are achieving grades (mean = 72.3%) that are about .5 standard deviations higher than the level that you’d expect (67.5%) if they were performing at the same level as other students. Judged against Cohen’s rough guide, this is a moderate effect size.

Cohen’s d from a Student t test

The majority of discussions of Cohen’s d focus on a situation that is analogous to Student’s independent samples t test, and it’s in this context that the story becomes messier, since there are several different versions of d that you might want to use in this situation, and you can use the method argument to the cohensD() function to pick the one you want. To understand why there are multiple versions of d, it helps to take the time to write down a formula that corresponds to the true population effect size δ. It’s pretty straightforward,

\(\delta=\dfrac{\mu_{1}-\mu_{2}}{\sigma}\)

where, as usual, μ1 and μ2 are the population means corresponding to group 1 and group 2 respectively, and σ is the standard deviation (the same for both populations). The obvious way to estimate δ is to do exactly the same thing that we did in the t-test itself: use the sample means as the top line, and a pooled standard deviation estimate for the bottom line:

\(d=\dfrac{\bar{X}_{1}-\bar{X}_{2}}{\hat{\sigma}_{p}}\)

where \(\ \hat{\sigma_p}\) is the exact same pooled standard deviation measure that appears in the t-test. This is the most commonly used version of Cohen’s d when applied to the outcome of a Student t-test ,and is sometimes referred to as Hedges’ g statistic (Hedges 1981). It corresponds to method = "pooled" in the cohensD() function, and it’s the default.

However, there are other possibilities, which I’ll briefly describe. Firstly, you may have reason to want to use only one of the two groups as the basis for calculating the standard deviation. This approach (often called Glass’ Δ) only makes most sense when you have good reason to treat one of the two groups as a purer reflection of “natural variation” than the other. This can happen if, for instance, one of the two groups is a control group. If that’s what you want, then use method = "x.sd" or method = "y.sd" when using cohensD(). Secondly, recall that in the usual calculation of the pooled standard deviation we divide by N−2 to correct for the bias in the sample variance; in one version of Cohen’s d this correction is omitted. Instead, we divide by N. This version (method = "raw") makes sense primarily when you’re trying to calculate the effect size in the sample; rather than estimating an effect size in the population. Finally, there is a version based on Hedges and Olkin (1985), who point out there is a small bias in the usual (pooled) estimation for Cohen’s d. Thus they introduce a small correction (method = "corrected"), by multiplying the usual value of d by (N−3)/(N−2.25).

In any case, ignoring all those variations that you could make use of if you wanted, let’s have a look at how to calculate the default version. In particular, suppose we look at the data from Dr Harpo’s class (the harpo data frame). The command that we want to use is very similar to the relevant t.test() command, but also specifies a method

cohensD( formula = grade ~ tutor,  # outcome ~ group
          data = harpo,             # data frame 
          method = "pooled"         # which version to calculate?
)

## [1] 0.7395614

This is the version of Cohen’s d that gets reported by the independentSamplesTTest() function whenever it runs a Student t-test.

Cohen’s d from a Welch test

Suppose the situation you’re in is more like the Welch test: you still have two independent samples, but you no longer believe that the corresponding populations have equal variances. When this happens, we have to redefine what we mean by the population effect size. I’ll refer to this new measure as δ′, so as to keep it distinct from the measure δ which we defined previously. What Cohen (1988) suggests is that we could define our new population effect size by averaging the two population variances. What this means is that we get:

\(\delta^{\prime}=\dfrac{\mu_{1}-\mu_{2}}{\sigma^{\prime}}\)

where

\(\sigma^{\prime}=\sqrt{\dfrac{\sigma_{1}^{2}+\sigma_{2}^{2}}{2}}\)

This seems quite reasonable, but notice that none of the measures that we’ve discussed so far are attempting to estimate this new quantity. It might just be my own ignorance of the topic, but I’m only aware of one version of Cohen’s d that actually estimates the unequal-variance effect size δ′ rather than the equal-variance effect size δ. All we do to calculate d for this version (method = "unequal") is substitute the sample means \(\ \bar{X_1}\) and \(\ \bar{X_2}\) and the corrected sample standard deviations \(\ \hat{\sigma_1}\) and \(\ \hat{\sigma_2}\) into the equation for δ′. This gives us the following equation for d,

\(d=\dfrac{\bar{X}_{1}-\bar{X}_{2}}{\sqrt{\dfrac{\hat{\sigma}_{1}\ ^{2}+\hat{\sigma}_{2}\ ^{2}}{2}}}\)

as our estimate of the effect size. There’s nothing particularly difficult about calculating this version in R, since all we have to do is change the method argument:

cohensD( formula = grade ~ tutor, 
          data = harpo,
          method = "unequal" 
 )

## [1] 0.7244995

This is the version of Cohen’s d that gets reported by the independentSamplesTTest() function whenever it runs a Welch t-test.

Cohen’s d from a paired-samples test

Finally, what should we do for a paired samples t-test? In this case, the answer depends on what it is you’re trying to do. If you want to measure your effect sizes relative to the distribution of difference scores, the measure of d that you calculate is just (method = "paired")

\(d=\dfrac{\bar{D}}{\hat{\sigma}_{D}}\)

where \(\ \hat{\sigma_D}\) is the estimate of the standard deviation of the differences. The calculation here is pretty straightforward

cohensD( x = chico$grade_test2, 
          y = chico$grade_test1,
          method = "paired" 
 )

## [1] 1.447952

This is the version of Cohen’s d that gets reported by the pairedSamplesTTest() function. The only wrinkle is figuring out whether this is the measure you want or not. To the extent that you care about the practical consequences of your research, you often want to measure the effect size relative to the original variables, not the difference scores (e.g., the 1% improvement in Dr Chico’s class is pretty small when measured against the amount of between-student variation in grades), in which case you use the same versions of Cohen’s d that you would use for a Student or Welch test. For instance, when we do that for Dr Chico’s class,

cohensD( x = chico$grade_test2, 
          y = chico$grade_test1,
          method = "pooled" 
 )

## [1] 0.2157646

what we see is that the overall effect size is quite small, when assessed on the scale of the original variables.

Search

Text Color

Text Size

Margin Size

Font Type