Statistics LibreTexts

11.8: Reading T-tests from Journal Articles

    Let us work through an example from a study by Conrad, S., Queenan, R., Brown, L., & Tolou-Shams, M. (2017). The research question for this study was: Are there gender differences in psychiatric symptoms and risk behaviors for non-incarcerated youth? The question was motivated by the disturbing trend of more young girls entering the juvenile system. The researchers attribute this trend to increased ease of access to marijuana and (unfortunately) sexual-risk behaviors. They wanted to know whether the girls entering the system were similar to boys, who are (also unfortunately) more likely than girls to be in juvenile systems, or whether girls were engaging in risk behaviors specific to their gender.

    Their hypothesis question was: Do girls have similar, lower, or higher levels of at-risk behavior or psychiatric symptoms than boys?

    Table 1 presents their t-test findings. Look for t-tests displayed in table format whenever possible, because they are much easier to read that way. Note that Table 1 combines t-tests with chi-square tests. This combination is typical. A t-test compares two groups on a continuous dependent variable, which the table reports as a mean and standard deviation. A chi-square test compares groups on a categorical dependent variable, which the table reports as a percentage and count. To be economical in presentation, the researchers put the results of both kinds of tests in one table.

    Table 1: Descriptive Statistics: Mean, Standard Deviation, Percentage, Chi-Square, and T-test By Gender

    | Variable | Total (N=60): Mean (SD) or % (N) | Male (N=42): Mean (SD) or % (N) | Female (N=18): Mean (SD) or % (N) | t or \(\chi^2\) | p-value |
    |---|---|---|---|---|---|
    | Demographics | | | | | |
    | Age | 15.54 (1.30) | 15.62 (1.34) | 15.40 (1.24) | .53 | .60 |
    | Race (Caucasian) | 73% (44) | 69% (29) | 80% (13) | .61 | .34 |
    | Ethnicity (non-Latino) | 79% (47) | 83% (34) | 80% (13) | .56 | .75 |
    | Mental Health | | | | | |
    | YSR Total Symptoms (t-score) | 55.69 (10.11) | 53.95 (10.39) | 60.29 (7.93) | -2.06 | .04* |
    | YSR Internalizing Symptoms (t-score) | 50.29 (8.82) | 48.70 (8.92) | 54.50 (8.65) | -2.14 | .03* |
    | YSR Externalizing Symptoms (t-score) | 60.69 (10.88) | 59.62 (11.73) | 63.50 (7.80) | -1.14 | .26 |
    | Childhood Trauma Questionnaire | | | | | |
    | Total Trauma Score | 48.93 (9.44) | 43.70 (9.35) | 49.13 (9.53) | -1.83 | .05* |
    | Sexual Abuse | 6.12 (2.58) | 5 (0) | 7.9 (5.59) | -3.28 | .01* |
    | Emotional Abuse | 8.35 (3.51) | 7.71 (3.51) | 8.80 (3.51) | -1.08 | .31 |
    | Physical Abuse | 7.33 (2.69) | 6.13 (3.60) | 6.26 (1.79) | .19 | .85 |
    | Adolescent Risk Behavior | | | | | |
    | Substance Use | | | | | |
    | Ever Use Marijuana (yes) | 88% (53) | 87% (36) | 93% (17) | .13 | .75 |
    | Used Marijuana Past 30 Days (yes) | 51% (31) | 54% (22) | 53% (9) | .01 | .87 |
    | Ever Smoked Cigarettes (yes) | 70% (42) | 67% (28) | 86% (14) | 2.16 | .05* |
    | Cigarettes Past 30 Days | 59% (35) | 54% (22) | 80% (13) | 1.64 | .12 |
    | Ever Used Alcohol (yes) | 75% (46) | 75% (31) | 86% (15) | .95 | .27 |
    | Used Alcohol Past 30 Days (yes) | 31% (19) | 26% (10) | 53% (9) | .19 | .61 |
    | Ever Used Club Drugs | 12% (7) | 10% (4) | 20% (3) | 1.34 | .08 |
    | Ever Used Cocaine | 3% (2) | 3% (1) | 6% (1) | .59 | .46 |
    | Ever Used Heroin | 0% (0) | 0% (0) | 0% (0) | - | - |
    | Ever Used OTC/Rx Medications | 20% (12) | 16% (6) | 33% (6) | 2.61 | .04* |
    | Sexual Behavior | | | | | |
    | Lifetime Sexual Behavior (yes) | 71% (43) | 69% (27) | 90% (16) | 5.85 | .01** |
    | Condom Use Last Sex (yes) | 50% (22) | 54% (14) | 46% (8) | 2.50 | .05* |
    | Age 1st Sexual Intercourse | 14.40 (.95) | 14.60 (.88) | 13.30 (1.03) | 2.97 | .01** |
    | # of Partners Past 90 Days | 1.79 (1.35) | 2.35 (2.11) | 1 (0.60) | 2.14 | .04* |
    | Been/Gotten Pregnant (yes) | 7% (4) | 5% (2) | 13% (2) | .64 | .47 |
    | Substance Use and Sexual Behavior | | | | | |
    | Any Drug Use Last Sex | 36% (15) | 34% (9) | 40% (6) | .01 | .68 |
    | Partner Used Any Drug Last Sex | 36% (16) | 31% (8) | 50% (8) | 1.80 | .08 |
    | Alcohol Used Last Sex | 5% (3) | 5% (2) | 6% (1) | .01 | .70 |
    | Partner Used Alcohol Last Sex | 7% (4) | 5% (2) | 13% (2) | .62 | .39 |
    | Marijuana Used Last Sex | 14% (6) | 15% (4) | 13% (2) | .25 | .47 |
    | Partner Used Marijuana Last Sex | 16% (8) | 12% (3) | 28% (5) | 1.75 | |

    * p < .05. ** p < .01.

    To read a statistics table, find your starting point. As with most tables, start in the upper left-hand corner. Identify your independent and dependent variables. The independent variable is gender, and its two levels, male and female, appear in the column headers. The first dependent variable is age. Look for the means. The mean age is 15.62 for males and 15.40 for females. Then find the t-value to the right, which is .53, and the accompanying p value, which is .60. Because the p value is above .05, the t-test is not significant, indicating that the males and females are similar in age. This finding is expected. We are examining youth, and we are unlikely to find a difference in age. Does this finding matter? It is not relevant to the research question, but it helps rule out age as a possible confound. If age is similar in both groups, we can be more confident that differences between the males and females reflect gender rather than age. For demographic variables like this one, we want non-significant results because they suggest the two groups, boys and girls, are comparable.

    Let us examine the t-test for the mental health variables. The YSR Total Symptoms variable has a mean of 53.95 for boys and 60.29 for girls. The accompanying t-value to the right is -2.06, and its p value is .04. The negative sign simply reflects the order of subtraction: the boys' mean comes first, and it is lower than the girls' mean. The t-test is significant, and the pattern is that girls have higher total mental health symptoms than boys. Technically, the negative t-value says that boys score lower than girls, but it is intuitively easier to say that girls score higher than boys. This finding is consistent with the researchers' expectation that girls might have more mental health symptoms than boys.
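
    The sign of a t-value can be checked from summary statistics alone. The following is a minimal sketch, assuming a pooled-variance (equal variances) independent-samples t-test and the group sizes in the column headers (42 and 18); the function name `pooled_t` is introduced here for illustration. The recomputed value will not necessarily equal the published -2.06, because the table values are rounded and the authors may have used different per-variable sample sizes or a different variance assumption.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic from summary statistics.

    Assumes equal population variances (pooled variance).
    The sign depends on which group is entered first.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# YSR Total Symptoms as printed in Table 1 (Ns taken from the column headers).
t_boys_first = pooled_t(53.95, 10.39, 42, 60.29, 7.93, 18)
t_girls_first = pooled_t(60.29, 7.93, 18, 53.95, 10.39, 42)

print(f"boys first:  t = {t_boys_first:.2f}")
print(f"girls first: t = {t_girls_first:.2f}")
```

    Swapping the order of the groups flips the sign but not the size of the t-value, which is why the direction of a difference should be read from the means rather than from the sign alone.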

    To interpret this finding, we need to know what these scores mean. Although the method section does not state the scoring protocol, the table does indicate that the scores are t-scores. Recall that t-scores have a mean of 50 and an SD of 10. The mean for boys is 53.95, which is less than half a standard deviation above the mean; the mean for girls is 60.29, which is about one standard deviation above the mean. By itself, it seems boys have average amounts of symptoms, while girls have a level of symptoms that might border on the sub-clinical range. A good idea would be to review material about the YSR to understand these t-scores, but for now, this rough reading will do.
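
    The position of each group mean can be expressed in standard deviation units. This sketch assumes the conventional t-score scaling (mean 50, SD 10); the helper name `t_score_to_z` is hypothetical. Note that a t-score is a standardized test score and is unrelated to the t-value of a t-test.

```python
def t_score_to_z(t_score, mean=50.0, sd=10.0):
    """Convert a standardized t-score to a z-score (SD units from the mean).

    The defaults assume the conventional scaling of mean 50 and SD 10.
    """
    return (t_score - mean) / sd


# Group means for YSR Total Symptoms as printed in Table 1.
z_boys = t_score_to_z(53.95)
z_girls = t_score_to_z(60.29)

print(f"boys:  {z_boys:.2f} SD above the mean")
print(f"girls: {z_girls:.2f} SD above the mean")
```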

    Let us estimate the effect size from the standard deviations for boys and girls. Recall that effect sizes are based on standard deviations. We make this rough estimate ourselves because no effect size is reported in the table. The mean difference between boys and girls is about 6 (60.29 - 53.95 = 6.34). The standard deviation is 10.39 for boys and 7.93 for girls. A rough combined standard deviation, taken as the simple average of the two, is about nine ((10.39 + 7.93) / 2 = 9.16). A mean difference of 6 divided by 9 is roughly .67, or 2/3 of a standard deviation. This value is close to a high effect size according to our rough guidelines (.3 = small, .5 = medium, .7 = high). The difference between boys and girls is therefore large enough to be noticeable.
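
    The rough estimate can be reproduced in a few lines, using the means and standard deviations as printed in Table 1 (SDs of 10.39 and 7.93). As a sketch only, the code also computes the sample-size-weighted pooled standard deviation used in Cohen's d, assuming the group sizes from the column headers (42 and 18); the variable names are introduced here for illustration.

```python
import math

# YSR Total Symptoms as printed in Table 1.
mean_boys, sd_boys, n_boys = 53.95, 10.39, 42
mean_girls, sd_girls, n_girls = 60.29, 7.93, 18

mean_diff = mean_girls - mean_boys

# Rough version: simple average of the two standard deviations.
rough_sd = (sd_boys + sd_girls) / 2
d_rough = mean_diff / rough_sd

# Weighted version: pooled SD, which gives the larger group more weight.
pooled_sd = math.sqrt(
    ((n_boys - 1) * sd_boys ** 2 + (n_girls - 1) * sd_girls ** 2)
    / (n_boys + n_girls - 2)
)
d_pooled = mean_diff / pooled_sd

print(f"rough d  = {d_rough:.2f}")
print(f"pooled d = {d_pooled:.2f}")
```

    Both versions land between the medium and high guidelines, so the choice of averaging method does not change the reading here.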

    Putting it all together, it does seem that girls have more mental health symptoms than boys in this court-involved, non-incarcerated population. This finding is conceptually expected because mental health issues among girls are (unfortunately) an ongoing trend. It makes sense that girls with mental health problems are at greater risk of coming into contact with the juvenile system.

    Remember, we want to find patterns among the t-test results. We examine the accompanying t-tests for the other two YSR Mental Health Scales. In brief, the t-test for the Internalizing Symptoms scale is -2.14, p = .03, and significant; the pattern is that girls have higher Internalizing Symptoms than boys. The t-test for the Externalizing Symptoms scale is -1.14, p = .26, and not significant, and the pattern is that girls and boys have similar levels of Externalizing Symptoms.
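
    Scanning several rows for a pattern can be sketched as a simple loop over the three YSR rows of Table 1. This is an illustration only: the `Row` class, the `describe` helper, and the `ALPHA` constant are names introduced here and are not part of the study.

```python
from dataclasses import dataclass

ALPHA = 0.05  # conventional significance threshold


@dataclass
class Row:
    label: str
    male_mean: float
    female_mean: float
    t: float
    p: float


# Values copied from the Mental Health rows of Table 1.
rows = [
    Row("YSR Total Symptoms", 53.95, 60.29, -2.06, 0.04),
    Row("YSR Internalizing Symptoms", 48.70, 54.50, -2.14, 0.03),
    Row("YSR Externalizing Symptoms", 59.62, 63.50, -1.14, 0.26),
]


def describe(row, alpha=ALPHA):
    """Summarize one row: significance first, then direction from the means."""
    if row.p >= alpha:
        return "no significant difference"
    return "girls higher" if row.female_mean > row.male_mean else "boys higher"


summary = {row.label: describe(row) for row in rows}
for label, pattern in summary.items():
    print(f"{label}: {pattern}")
```

    A strict `p < .05` rule applied to rounded values would miss rows that the table prints as `.05*`, so for those rows the asterisk in the table, not the rounded number, is the guide.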

    The t-test result for the Internalizing Symptoms scale is expected because girls tend to have internalizing symptoms, such as depression or anxiety. The overall high YSR score for the girls could be the result of these girls having a higher level of depression or anxiety than both the boys in the sample and girls in general. This result makes sense. Interestingly, the Externalizing Symptoms scale did not show a significant difference between girls and boys. The girls have almost the same high level of externalizing symptoms as the boys: the mean scores are 59.62 for boys and 63.50 for girls. The t-test is not significant, but this is still an interesting finding. Generally, girls have lower levels of externalizing behaviors, such as aggressiveness, compared to boys. But if girls have levels of externalizing behaviors similar to boys, then perhaps high internalizing symptoms combined with boy-level externalizing behaviors is the kind of combination that leads girls into the juvenile system. In concert with each other, the t-test results for the internalizing and externalizing symptoms say something about the profile of girls who are at risk of entering the juvenile delinquency system. Reviewing these t-test results together is the process of using statistics to discover something new about an issue.

    Are there any concerns about these statistics? Well, the sample size should be noted. The overall sample size is 60, with 42 males, but the female sample size looks low at 18. The concern is the robustness of the results. Is it possible that the results could change with more females? It is always possible. Let us use a conceptual approach. Finding and recruiting girls with potentially juvenile delinquent behavior is not easy. The method section states that the population is court-involved, non-incarcerated boys and girls. Generally, more boys than girls are in this population. Remember that youth need parental consent to participate in studies, and obtaining that consent is a difficult process. Then, the youth must agree to participate and complete all the measures. Those requests are a lot for the youth in this context. In some ways, the researchers should be commended for recruiting 18 girls from this context. Should they recruit more? Of course, the answer is yes. Can they? Well, recruitment within a court system, which is not easy to get permission to do, takes a lot of resources and time. You have to ask yourself, could I have recruited that many participants with my own resources and time? Quite likely, the answer is no, so you have to give the researchers some latitude and not impose a “you should have….” statement on them.

    The sample size is small. A small sample size means that a study may be underpowered, and underpowered studies are prone to Type II errors. Recall that power is the ability to detect an effect if, in fact, the effect does exist. One cause of an underpowered study is a low sample size. Low sample sizes do not provide enough opportunity to detect the effect you want to see. A Type II error is concluding that a result is not significant when, in fact, a true effect exists. Is this scenario occurring in this study?
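
    To get a feel for how much power group sizes of 42 and 18 provide, the sketch below uses a normal approximation to the two-sided, two-sample t-test. The approximation, the function name `approx_power`, and the choice of true effect sizes (.5 and .7, taken from the rough guidelines above) are assumptions made for illustration; the study itself reports no power analysis, and the approximation slightly overstates power in small samples.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Uses the standard normal distribution in place of the t distribution,
    so the result is a rough guide rather than an exact value.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic if the true effect is effect_size.
    shift = effect_size * math.sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power_medium = approx_power(0.5, 42, 18)
power_high = approx_power(0.7, 42, 18)

print(f"true effect .5: power is about {power_medium:.2f}")
print(f"true effect .7: power is about {power_high:.2f}")
```

    Under these assumptions, a medium-sized true difference would be missed more often than it is detected, which is why non-significant rows in a sample this small deserve a second look.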

    Well, the t-test result for the YSR Externalizing Symptoms is -1.14, p = .26. Reporting the p value here is helpful. The result is clearly above the .05 threshold, so it is doubtful that the p value would shift below .05. The difference between the two groups is about 4 (boys = 59.62, girls = 63.50). This difference seems small, and both means hover around a t-score of 60, which means both groups are at about the same level of externalizing symptoms. The t-test result says that boys and girls are not different in Externalizing Symptoms. Although the sample size is low, suggesting an underpowered study and a possible Type II error, several factors suggest otherwise. The actual p value, the small difference between the groups, and the similarity in their t-scores point in the same direction. Conceptually, the non-significant result also makes sense. Therefore, it is plausible that a Type II error did not occur and that the sample size has minimal impact on this result.

    How about the YSR Total Symptoms score? The t-value is -2.06, and the p value is .04, which is just below the .05 threshold. This situation is a possible Type I error. A Type I error is concluding that a result is significant when, in fact, no true effect exists. Given the low sample size for the girls, is it possible that with a larger sample size, the girls' mean score might shift lower, resulting in a smaller mean difference and a t-test result that is not significant?

    That is quite the scenario, and unlikely. Recall the population. These are girls in court but not incarcerated. They likely experience mental health symptoms, including some level of depression, anxiety, or other internalizing symptoms. While the statistical situation might be ripe for a Type I error, there is likely no Type I error. Remember, one statistical test does not make or break a study. Recall, too, that our approach is not to criticize or look for anomalies, exceptions, or problems. In concert with the rest of the t-test results, it does appear that the difference between girls and boys on the YSR Total Symptoms variable is a valid result. Yes, another study replicating these results is desirable. Until then, we keep our original conclusions and explanations.

    And that is how we read statistics, in this case, t-test results, in a journal article.


    This page titled 11.8: Reading T-tests from Journal Articles is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Peter Ji.