
6.3: Partitioning of the Variance


    You will see the phrase “partitioning of the variance.” The phrase means dividing the variance into parts. Understanding why variance is partitioned is important for understanding statistical significance. We always divide the variance into two main parts: true variance and error variance.

    6.3.1: True Variance

    The first source of variance is “true” variance: the scores representing the phenomenon of interest vary because the phenomenon itself varies. Take height. If we measure the height of 10 people, we get 10 different scores because each person’s height differs. In psychology, we measure depression. If we assess depression for 10 people, we get 10 different scores because each person’s level of depression differs; some people are more or less depressed than others. In both cases, the scores should vary because they reflect real differences in the phenomenon. That is the hallmark of true variance: the scores vary because the phenomenon varies. Looking ahead to the next section, the opposite situation is also possible: scores can vary because of something other than the phenomenon itself. That opposite of true variance is error variance, which means that something else is making the scores vary.

    True variance is otherwise known as actual variance, real effects, real differences between groups, or real relationships among variables. It is sometimes referred to as between-subjects variance, true signal, strong signal, or a detectable pattern. Think of true variance like a radio dial. Before internet streaming services, in my day, we had car radios. We had to tune past the static to find a clear signal. That clear signal is true variance, indicating that you found something. Everything else is noise, static, and error variance. How much true variance we can expect to obtain is discussed in later chapters.

    6.3.2: Error Variance

    In contrast with true variance, error variance means that the scores vary because something else is varying, not the phenomenon itself. Take height. We measure the height of one person and get a score that represents how tall that person is. However, we can take another measurement of that same person and get a slightly different height. How does this happen? We might not place the ruler exactly at the floor. We might or might not have the person take off their shoes. We might measure up to the top of their head, but if they have a lot of hair, finding the top of their head might be a challenge. We might or might not put their body against a wall to make sure they are standing up straight. These idiosyncrasies can cause measurements of the same height to produce slightly different numbers.

    In psychology, we measure depression. If we assess depression for one person at a given time, we get a score that represents their depression. But if we take another measurement of their depression on a different day, the person’s mood might have lifted, or the White Sox won a game and they are happy, or we just happened to catch them after a good night’s sleep. Or we give them a different measure of depression, and they obtain a different score because the items are different. Or we give them a measure of depression framed according to a non-Western rather than a Western perspective, and their depression might not appear as severe as in the original measurement. Or the person has a bad attitude toward taking a depression scale and provides random answers. Or the person misreads an item and does not answer as they normally would. Or two psychology interns rate the severity of the person’s depression and produce two different ratings, or scores, representing the same depression. There are myriad ways the scores can vary, and these variations have nothing to do with the construct of depression actually varying. Something else was responsible for the varying depression scores.

    With error variance, that “something else” can be random effects, mistakes, systematic errors, or unknown errors. In statistics, we try to determine how much of the variation in scores is due to true variance and how much is due to error variance. Then, we try to determine how much of that error variance is due to pure randomness and how much to something systematic. Dividing the variance among these various sources is the process of partitioning the variance.
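The partition described above can be sketched numerically. The snippet below uses made-up depression scores for two hypothetical groups (the numbers and group names are illustrative, not from the text) to show how the total variation in a set of scores splits exactly into a between-group (“true”) part and a within-group (error) part:

```python
# A minimal sketch of partitioning variance, using made-up depression
# scores for two hypothetical groups. The identity demonstrated:
# SS_total = SS_between ("true" variance) + SS_within (error variance).

groups = {
    "treatment": [8, 10, 9, 11, 12],
    "control":   [14, 15, 13, 16, 17],
}

all_scores = [s for scores in groups.values() for s in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Total variation: every score's squared distance from the grand mean.
ss_total = sum((s - grand_mean) ** 2 for s in all_scores)

# "True" (between-group) variation: group means vs. the grand mean.
ss_between = sum(
    len(scores) * ((sum(scores) / len(scores)) - grand_mean) ** 2
    for scores in groups.values()
)

# Error (within-group) variation: each score vs. its own group mean.
ss_within = sum(
    (s - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for s in scores
)

print(ss_total, ss_between, ss_within)  # ss_total equals ss_between + ss_within
```

With these particular numbers, most of the total variation comes from the difference between the two group means (the “true” part), and the rest is within-group scatter (the error part).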

    Random effects can refer to random chance, pure error, no pattern, or within-subjects variance. There will always be error, or “noise.” There will always be data or observations that we cannot account for because they happen due to random chance alone. In other words, people get lucky. The basketball luckily bounces into the net. The Cubs player happens to hit the ball.

    Error variance is also referred to as unknown error. Put differently, error is something we cannot predict. Remember that prediction always involves two variables: the independent variable (IV) and the dependent variable (DV). Prediction can be expressed as: “knowing the level of the IV, we can predict the level of the DV.” By knowing how many hours you study for a test, I can predict what grade you will get on the test. But not every outcome follows that prediction. Someone can study for one hour, or not at all, and get an “A” on the test. That observation is an error because our prediction failed to predict the grade the person actually received. So error in this case is not just random luck; it means we failed to make the right prediction. “Unknown” means something else is going on that accounts for the outcome. For a given outcome, we try to include all the variables that could predict it. We cannot account for everything, though. Sometimes we do not know what other variables could account for the outcome; sometimes we cannot collect the data for those variables; sometimes we do not have a way to measure them.
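Prediction error can also be sketched with a few lines of code. The example below uses made-up hours-and-grades data (the numbers are illustrative, not from the text), fits a least-squares line predicting grade from hours studied, and computes the residuals, the part of each grade the prediction fails to account for:

```python
# A sketch of prediction error (residuals), with made-up data:
# hours studied (IV) predicting test grade (DV) via a least-squares line.

hours  = [1, 2, 3, 4, 5]
grades = [65, 70, 78, 85, 88]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(grades) / n

# Ordinary least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, grades)) \
      / sum((x - mean_x) ** 2 for x in hours)
intercept = mean_y - slope * mean_x

# Residual = observed - predicted: the portion of each grade our
# prediction cannot account for (the error variance).
residuals = [y - (intercept + slope * x) for x, y in zip(hours, grades)]
print(residuals)
```

The residuals always sum to (essentially) zero for a least-squares line, but their squared magnitudes measure how much of the outcome our prediction left unexplained.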

    Think of this problem as trying to decipher a mysterious recipe. We taste a delicious dish and then work backwards to figure out what ingredients are in it. We could guess most of the ingredients, but we might not guess the secret ones. Or we might not know what cooking technique was used to create the dish. These represent the unknown. In psychology, we observe a client in mental health distress. We know about the usual stress inputs. We may not know about other stress inputs, such as cultural stigma, a history of past abuse, environmental stressors, or concerns about gender or sexual orientation identities. We cannot collect data on everything, and we are limited in how we measure these variables. Anything we cannot account for is unknown variance, or error variance.

    In psychology, error variance is abundant because so many variables affect a client’s well-being and progress toward recovery. This is nothing to fret about; it is normal and expected, given the current state of how we measure variables related to well-being. Later chapters discuss how much error variance we are willing to accept.

    The other source of error is mistakes, which are not random. Mistakes include miskeyed data, misreported data, or participants carelessly responding to survey questions. Some mistakes are akin to “Murphy’s Law,” where accidents happen and something goes awry in data collection. Other mistakes are errors in the data collection process, such as forgetting to include an item or forgetting to include a test in the test battery. Mistakes are less of a statistical issue and more of a research design issue, specifically a data collection issue. There is not much you can do about this error from a statistical standpoint; you are better off reviewing the data collection process to see whether the errors can be removed.

    Systematic error is another source of error. Systematic error is something that consistently goes wrong, and it is usually a research design or method issue. For example, a researcher who collects their own data may be biased toward seeing the results they want to see. Other examples are research design confounds, such as practice effects on an assessment. These issues are best addressed as research method issues. Here, too, there is not much you can do about this error from a statistical standpoint, and a review of the research method is warranted to address it.

    Recall that a hallmark of statistics is that a statistical test is basically a ratio. This ratio consists of truth over error: we want our true variance to be larger than the error variance. In the ratio, the true variance is always on top, as the numerator, and the error variance is always on the bottom, as the denominator. Basic math tells us that when the numerator is larger than the denominator, we get a number greater than one. For now, the lesson is that statistical tests, and the numbers they generate, are basically ratios of true variance (signal, pattern) over error variance (noise, static, randomness). Larger numbers mean more true variance and less error variance, and that result is good. How large that number must be, and whether it matters, are issues discussed in later chapters.
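One concrete version of this ratio is the F statistic from analysis of variance: each sum of squares is divided by its degrees of freedom to give a “mean square,” and the test statistic is the ratio of the true (between-group) mean square to the error (within-group) mean square. The sketch below uses made-up two-group data (illustrative numbers, not from the text):

```python
# A sketch of the "truth over error" ratio: an F-style statistic
# computed from made-up scores for two hypothetical groups.

treatment = [8, 10, 9, 11, 12]
control   = [14, 15, 13, 16, 17]

def mean(xs):
    return sum(xs) / len(xs)

grand = mean(treatment + control)

# Between-group (true) and within-group (error) sums of squares.
ss_between = len(treatment) * (mean(treatment) - grand) ** 2 \
           + len(control)   * (mean(control)   - grand) ** 2
ss_within = sum((x - mean(treatment)) ** 2 for x in treatment) \
          + sum((x - mean(control))   ** 2 for x in control)

# Divide each sum of squares by its degrees of freedom to get a
# mean square; the ratio of mean squares is the F statistic.
k, n = 2, 10                      # number of groups, total scores
ms_true  = ss_between / (k - 1)   # numerator: true variance
ms_error = ss_within  / (n - k)   # denominator: error variance
f_ratio  = ms_true / ms_error

print(f_ratio)  # well above 1: far more signal than noise
```

A ratio near 1 would mean the true variance is no larger than the error variance; a ratio well above 1, as here, means the signal dominates the noise.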


    This page titled 6.3: Partitioning of the Variance is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Peter Ji.