9.2: What Are Effect Sizes?
Effect sizes are expressed in standard deviation units. Recall that a standard deviation establishes a metric for how far a score lies from the mean. When an observation is one standard deviation above the mean, we know both the location of the score and how far the score is from the mean. In this case, the score is higher than about 84% of observations: the 50% of scores below the mean, plus the 34% of scores that fall between the mean and one standard deviation above it. What is good about using a standard deviation unit is that it is "standard": no matter the range of the variable, the standard deviation unit means the same thing. Recall that one standard deviation unit on either side of the mean encompasses about 68% of scores, two standard deviation units encompass about 95% of scores, and three standard deviation units encompass about 99% of scores. These units and their corresponding percentages around the mean stay consistent no matter what the range of the variable is.
9.2.1: Different Effect Sizes
There are two types of effect sizes, corresponding to the two types of patterns we find with statistics. (These are the patterns associated with general linear model statistics; we will return to this concept when we turn to statistical tests.) The first pattern is where group A scores higher or lower than group B. We term this pattern "mean differences among groups." For this pattern, the effect size tells us how far apart the mean scores of the groups are. The greater the difference between the group means, that is, the greater the distance between the groups, the larger the effect size.
The second pattern is where increases in variable X are associated with increases or decreases in variable Y. We term this pattern “associations.” The stronger the association between the two variables, the larger the effect size is.
The following are the common effect size types. For mean differences between groups, we use the following effect sizes.
- Cohen's d: a statistical measure that compares the means of two groups by standardizing the difference onto a common scale (standard deviation units). Cohen's d is computed from sample statistics and tends to slightly overestimate the population effect size in small samples.
- Hedges' g: the same standardized mean difference, but with a small-sample bias correction applied. Hedges' g is the less biased estimate of the population effect size.
For associations between variables, we use the following effect sizes.
- Correlation coefficient
- Standardized beta coefficient
- Eta-squared: the proportion of variance in the dependent variable explained by a predictor. (The partial eta-squared variant describes the variance explained by a predictor while controlling for the other predictors.)
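As a minimal illustration of an association-type effect size, a Pearson correlation coefficient can be computed by hand. This is a sketch with made-up data; the variable names and numbers below are hypothetical, not from the text:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # covariance numerator and the two standard-deviation-like denominators
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6]
score = [55, 60, 58, 70, 72, 80]
r = pearson_r(hours, score)  # a strong positive association
```

The stronger the association, the closer r sits to 1 (or to -1 for a negative association), which is exactly the "larger effect size" reading described above.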
9.2.2: Calculating Effect Sizes
The basic formula for an effect size is \( d = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}} \), that is, the difference between the two group means divided by the pooled standard deviation. Of note in this formula is the denominator, the pooled standard deviation. Dividing by the standard deviation tells you how many standard deviation units apart the two means are. It is like saying you have a lot of dough to make lots of cookies. You may have a large pile of dough, but unless you know what size cookie you will make, you do not know how many cookies you can get out of that dough. With a large cookie cutter you get fewer cookies from the pile than with a smaller cookie cutter. The size of the cookie cutter is analogous to the standard deviation. We take our pile, the difference between the two means, then use the standard deviation as our cookie cutter to determine how many units wide that difference is.
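The formula above can be sketched in code. This is a minimal illustration with hypothetical scores; `cohens_d` is a name chosen here for the sketch, not a standard library function:

```python
import math

def cohens_d(group1, group2):
    """Standardized mean difference: (mean1 - mean2) / pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    # sample variances (n - 1 denominator)
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    # pooled standard deviation weights each variance by its degrees of freedom
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# hypothetical well-being scores for two groups
treatment = [8, 9, 7, 10, 9, 8]
control = [6, 7, 5, 8, 7, 6]
d = cohens_d(treatment, control)
```

The returned value reads directly in standard deviation units: a d of 1.0 would mean the group means sit one full standard deviation apart.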
With the advent of statistical programs, there is no need to calculate an effect size on your own. Most statistical programs and their associated analyses have options to click to obtain the effect size. If you see the option, click on it. If it does not have the effect size option, effect size calculations are available online.
Despite the importance of effect size, which we will discuss later, journal articles often do not report effect sizes. If no effect size is reported, you can make a rough estimate: subtract one group mean from the other, average the standard deviations of the two groups, and divide the mean difference by that average. You obtain a rough estimate of the effect size. Of note: if we really need to read about effect sizes, why then do journal articles not report them? Well, if the authors of the article do not report them, or the journal editors do not request that the authors report them, then they are simply ignoring best practice. APA and statistical organizations all but require effect sizes to be reported. Oh well.
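The rough estimate described above can be sketched as follows; the means and standard deviations here are hypothetical numbers standing in for values pulled from an article's table:

```python
# hypothetical summary statistics as reported in an article
m1, sd1 = 78.0, 10.0  # group 1 mean and standard deviation
m2, sd2 = 72.0, 12.0  # group 2 mean and standard deviation

avg_sd = (sd1 + sd2) / 2       # rough stand-in for the pooled standard deviation
rough_d = (m1 - m2) / avg_sd   # ≈ 0.55, a medium-ish effect
```

Averaging the two standard deviations is only an approximation of the pooled standard deviation (it ignores the group sizes), but it is usually close enough for a back-of-the-envelope read of an article.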
9.2.3: The Range of Effect Sizes
Effect sizes are expressed as standard deviation units. The word “standard” means that everyone knows what these sizes mean.
The range of effect sizes is usually 0 to ~1.5 or slightly above. One rough way to read this range is that 0 means 0 standard deviation units, and 1.5 means 1 ½ standard deviation units of difference.
The general guidelines for reading an effect size are as follows: small effect size = .2 to .3, medium = .5, large = .8. I like to think of effect size as a proportion of the standard deviation. I use these guides: .3 would be about 1/3 of a standard deviation, .5 is 1/2 of a standard deviation, and .8 is just over 3/4 of a standard deviation.
I like to think of the effect size as a measurement of the size of an ingredient. A small effect size of .3 is like putting 1/3 of a cup of sugar in your coffee. A medium effect size of .5 is like putting ½ of a cup of sugar in your coffee. A large effect size of .8 is like putting more than ¾ of a cup of sugar in your coffee. And I am sure you know people like this, probably even someone like you.
I like to think of effect sizes as how transformative the effect is. Small effect sizes produce real effects, but they are hard to notice. This metaphor is a bit of a stretch, but let's try it. Putting 1/3 of a cup of sugar in your coffee will have a small effect on you: you will get through the day, but no one will notice any visible change in you. Putting 1/2 of a cup of sugar in your coffee will have a moderate effect on you: you will get through the day with an extra spring in your step, and quite likely, a few people around you will notice the change in you. Putting 3/4 of a cup of sugar in your coffee will have a large effect on you: you will get through the day and probably through the night, and everyone around you will notice the change in you.
9.2.4: What Effect Sizes Do You Want to See?
Small, medium, and large effect sizes are simply descriptive. Based on the above examples, it would seem that large effect sizes are valued because we want to see notable impacts from our studies. But evaluating an effect size depends on the context of the study and the norms of the discipline. There is no hard and fast rule indicating that large effect sizes are good and small effect sizes are not worth noting.
Psychology as a social science is quite an open system. So many variables out there can influence an outcome, and so many confounding or alternative variables could adversely affect your conclusions. We also struggle to obtain precise measurements of our variables, because many of them are social constructs. In psychology, personality is a social construct: we all know it is there, but it is a latent construct, so we have to infer that there is such a thing as personality. Racism is a construct: we all know it is there, but we have different ways of conceptualizing how racism varies, so it is hard to create one measurement of it. Even variables that we think are easy to measure are hard to measure. Counting the number of alcoholic drinks a person consumes is difficult because of the multidimensionality of drinking. One glass of beer, one glass of wine, and one shot of whiskey could each count as one drink, but their alcohol content differs, so it is hard to state that they all count equally. To complicate matters, having one drink per night to relax after a long day, versus one drink per hour while you are at a watch party for your favorite sports team, would probably lead to different outcomes. In this case, drinking alcohol is compounded by two other variables, time frame and context. It is hard to say that we have good measurements of our variables. These factors, and others, make it difficult to find large effect sizes: it is difficult to find one or a few variables that have a notable impact on the outcome.
Any research discipline would covet large effect sizes, because a large effect size implies a large, notable impact. It is akin to finding the "magic cure" or panacea. In psychology, we certainly want to find those variables that have a large impact on our outcomes; nothing wrong with that desire. But I would offer that the field of psychology, with its theories of stages of change, proposes that there are no "magic pills," no single best treatment that will cure someone's mental health struggles. Fast, large effects tend to wear off quickly. It would be akin to a fad diet: you might lose weight very quickly, but our bodies just do not adapt well to sudden, rapid shocks to the system. Large effect sizes might show immediate effects, and obviously we must account for context, but large effect sizes might not be all that promising.
Ostensibly, a low effect size means that the effect of the independent variable on the dependent variable is weak. The weaker the effect, the less prominent that variable is in driving the outcome. In psychology, because so many variables impact an outcome, because our measurements are imprecise, and because no research design can possibly include every relevant variable, low effect sizes are bound to occur. Getting an effect size of .3, a small effect size, is good, although there is no industry standard or consensus indicating that a small effect size is good enough.
In psychology, low effect sizes might be and should be desired. The goal of psychology is to improve mental health. Change does not happen overnight. Changes happen slowly, and for good reasons. It takes time for our bodies to adapt to change, even for the better. In psychology, people resist change, even if it is for the better. It can take a long time for a change to take hold. Changes are not always noticeable at first. And we relapse. Some days we can handle everything, some days we cannot. In this context, a low effect size might reflect the realities or the difficulties of how people change. It is a slow process, but slow is probably better as we adapt to change.
Low effect sizes might align better with treatment planning. Small behavioral changes are victories. Sometimes, just getting out of bed is a victory. Sometimes, just avoiding an alcoholic drink for one night of the week is a victory. Sometimes, just reading a book for 30 minutes a week is a victory. These changes might not seem big, but they are big to us. Small changes, small effect sizes, might prove to be worthwhile over the long run for the lasting change we want to see as we recover from our mental health struggles. As they say in exercise, “progress, not perfection.”
Are medium effect sizes the best of both worlds? This author's position is to be agnostic. As always, default to your conceptualization of the issue and the variables under consideration, use a literature review to catalog the effect sizes similar studies have found, and let that conceptualization guide how you value a medium effect size for your study and its outcomes.
9.2.5: Effect Sizes and Other Considerations
Effect sizes are not directly impacted by other statistical considerations, such as the sample size or the p value. You might think they are related, but conceptually, the effect sizes are independent of other statistical considerations.
Effect size and sample size. A low sample size says nothing about the resulting effect size. Low sample sizes tend to produce unstable results, so it follows that the effect size estimates will be unstable as well. Effect sizes do not systematically decrease as sample sizes increase. You might think that more sampling means more variation, which means more possible error, but the opposite holds: larger samples produce more precise estimates. Do effect sizes increase if the sample size increases? That is not the case, either.
The only thing increasing the sample size will do is increase the chance of detecting an effect. Increasing the sample size is not solely responsible for a corresponding increase or decrease in the effect size itself. This issue is basically a power analysis. More on that later.
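The point that a larger sample buys detection, not a bigger effect, can be illustrated with a small simulation. This is a sketch under assumed conditions (two normal populations whose means truly differ by half a standard deviation); the function name and numbers are chosen here for illustration:

```python
import random
import statistics

def simulated_d(n, seed):
    """Draw two groups whose true means differ by 0.5 SD and return the observed
    standardized mean difference (a rough Cohen's d)."""
    rng = random.Random(seed)
    group1 = [rng.gauss(0.5, 1.0) for _ in range(n)]
    group2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # simple overall spread estimate; a sketch, not a textbook pooled SD
    spread = statistics.stdev(group1 + group2)
    return (statistics.mean(group1) - statistics.mean(group2)) / spread

d_small = simulated_d(25, seed=1)    # noisy estimate of an effect near 0.5
d_large = simulated_d(2500, seed=1)  # much more precise estimate near 0.5
```

With the larger sample the estimated effect size barely moves; what grows with n is the precision of the estimate (and therefore the power to detect the effect), not the effect itself.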
Effect sizes and the p value. A non-significant p value does not automatically mean the effect size is zero; it means no reliable pattern was detected, and with a small sample even a sizable effect estimate can fail to reach significance. Conversely, if there is a significant p value, a lower p value does not mean the result is "more significant," and the effect size does not correspondingly increase. Lower p values have no fixed relationship to effect sizes. We already established there is no such thing as "highly significant at p < .001"; consequently, there is no such thing as a high effect size when the p value is .001 versus a moderate effect size when the p value is .01.


