## Two sample tests, effect sizes

Answer to the sign test question. It is enough to write:

Code $$\PageIndex{1}$$ (Python):

aa <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
bb <- c(5, 5, 5, 5, 5, 5, 5, 5, 5)
dif <- aa - bb
pos.dif <- dif[dif > 0]
prop.test(length(pos.dif), length(dif))


Here the sign test failed to find obvious differences because (like t-test and Wilcoxon test) it considers only central values.

Answer to the ozone question. To know if our data are normally distributed, we can apply the Normality() function:

Code $$\PageIndex{2}$$ (Python):

Normality <- function(a) {ifelse(shapiro.test(a)$p.value < 0.05, "NOT NORMAL", "NORMAL")} ozone.month <- airquality[, c("Ozone","Month")] ozone.month.list <- unstack(ozone.month) sapply(ozone.month.list, Normality) # asmisc.r  (Here we applied unstack() function which segregated our data by months.) Answer to the argon question. First, we need to check assumptions: Code $$\PageIndex{3}$$ (Python): ar <- read.table("data/argon.txt") Normality <- function(a) {ifelse(shapiro.test(a)$p.value < 0.05, "NOT NORMAL", "NORMAL")}
sapply(unstack(ar, form=V2 ~ V1), Normality)


It is clear that in this case, nonparametric test will work better:

Code $$\PageIndex{4}$$ (Python):

ar <- read.table("data/argon.txt")
wilcox.test(jitter(V2) ~ V1, data=ar)


(We used jitter() to break ties. However, be careful and try to check if this random noise does not influence the p-value. Here, it does not.)

And yes, boxplots (Figure 5.2.4) told the truth: there is a statistical difference between two set of numbers.

Answer to the cashiers question. Check normality first:

Code $$\PageIndex{5}$$ (Python):

Normality <- function(a) {ifelse(shapiro.test(a)$p.value < 0.05, "NOT NORMAL", "NORMAL")} cashiers <- read.table("data/cashiers.txt", h=TRUE) head(cashiers) sapply(cashiers, Normality) # asmisc.r  Now, we can compare means: Code $$\PageIndex{6}$$ (Python): cashiers <- read.table("data/cashiers.txt", h=TRUE) (cashiers.m <- sapply(cashiers, mean))  It is likely that first cashier has generally bigger lines: Code $$\PageIndex{7}$$ (Python): cashiers <- read.table("data/cashiers.txt", h=TRUE) with(cashiers, t.test(CASHIER.1, CASHIER.2, alt="greater"))  The difference is not significant. Answer to the grades question. First, check the normality: Code $$\PageIndex{8}$$ (Python): grades <- read.table("data/grades.txt") classes <- split(grades$V1, grades$V2) Normality <- function(a) {ifelse(shapiro.test(a)$p.value < 0.05, "NOT NORMAL", "NORMAL")}
sapply(classes, Normality) # asmisc.r


(Function split() created three new variables in accordance with the grouping factor; it is similar to unstack() from previous answer but can accept groups of unequal size.)

Check data (it is also possible to plot boxplots):

Code $$\PageIndex{9}$$ (Python):

grades <- read.table("data/grades.txt")
classes <- split(grades$V1, grades$V2)
sapply(classes, median, na.rm=TRUE)


It is likely that the first class has results similar between exams but in the first exam, the second group might have better grades. Since data is not normal, we will use nonparametric methods:

Code $$\PageIndex{10}$$ (Python):

grades <- read.table("data/grades.txt")
classes <- split(grades$V1, grades$V2)
wilcox.test(classes$A1, classes$A2, paired=TRUE, conf.int=TRUE)
wilcox.test(classes$B1, classes$A1, alt="greater", conf.int=TRUE)


For the first class, we applied the paired test since grades in first and second exams belong to the same people. To see if differences between different classes exist, we used one-sided alternative hypothesis because we needed to understand not if the second class is different, but if it is better.

As a result, grades of the first class are not significantly different between exams, but the second class performed significantly better than first. First confidence interval includes zero (as it should be in the case of no difference), and second is not of much use.

Now effect sizes with suitable nonparametric Cliff’s Delta:

Code $$\PageIndex{11}$$ (Python):

grades <- read.table("data/grades.txt")
classes <- split(grades$V1, grades$V2)
cliff.delta(classes$A1, classes$A2)
cliff.delta(classes$B1, classes$A1)


Therefore, results of the second class are only slightly better which could even be negligible since confidence interval includes 0.

Answer to the question about ground elder leaves (Figure $$\PageIndex{1}$$).

First, check data, load it and check the object:

Code $$\PageIndex{12}$$ (Python):

aa <- read.table("http://ashipunov.info/shipunov/open/aegopodium.txt", h=TRUE)
aa$SUN <- factor(aa$SUN, labels=c("shade","sun"))
str(aa)


(We also converted SUN variable into factor and supplied the proper labels.)

Let us check the data for the normality and for the most different character (Figure $$\PageIndex{2}$$):

Code $$\PageIndex{13}$$ (Python):

aa <- read.table("http://ashipunov.info/shipunov/open/aegopodium.txt", h=TRUE)
Normality <- function(a) {ifelse(shapiro.test(a)$p.value < 0.05, "NOT NORMAL", "NORMAL")} aggregate(aa[,-5], list(light=aa[,5]), Normality) # asmisc.r Linechart(aa[, 1:4], aa[, 5], xmarks=FALSE, lcolor=1, se.lwd=2, mad=TRUE) # asmisc.r  TERM.L (length of the terminal leaflet, it is the rightmost one on Figure $$\PageIndex{1}$$), is likely most different between sun and shade. Since this character is normal, we will run more precise parametric test: Code $$\PageIndex{14}$$ (Python): aa <- read.table("http://ashipunov.info/shipunov/open/aegopodium.txt", h=TRUE) t.test(LEAF.L ~ SUN, data=aa)  To report t-test result, one needs to provide degrees of freedom, statistic and p-value, e.g., like “in a Welch test, t statistic is 14.85 on 63.69 degrees of freedom, p-value is close to zero, thus we rejected the null hypothesis”. Effect sizes are usually concerted with p-values but provide additional useful information about the magnitude of differences: Code $$\PageIndex{15}$$ (Python): aa <- read.table("http://ashipunov.info/shipunov/open/aegopodium.txt", h=TRUE) library(effsize) cohen.d(LEAF.L ~ SUN, data=aa) summary(K(LEAF.L ~ SUN, data=aa))  Both Cohen’s d and Lyubishchev’s K (coefficient of divergence) are large. ## ANOVA Answer to the height and color questions. Yes on both questions: Code $$\PageIndex{16}$$ (Python): hwc <- read.table("data/hwc.txt", h=TRUE) summary(aov(HEIGHT ~ COLOR, data=hwc)) pairwise.t.test(hwc$HEIGHT, hwc$COLOR)  There are significant differences between all three groups. Answer to the question about differences between cow-wheats (Figure $$\PageIndex{3}$$) from seven locations. Load the data and check its structure: Code $$\PageIndex{17}$$ (Python): mm <- read.table("http://ashipunov.info/shipunov/open/melampyrum.txt", h=TRUE) str(mm)  Plot it first (Figure $$\PageIndex{4}$$): Code $$\PageIndex{18}$$ (Python): mm <- read.table("http://ashipunov.info/shipunov/open/melampyrum.txt", h=TRUE) old.par <- par(mfrow=c(2, 1), mai=c(0.5, 0.5, 0.1, 0.1)) boxplot(P.HEIGHT ~ LOC, data=mm, col=grey(0.8)) boxplot(LEAF.L ~ LOC, data=mm, col=rgb(173, 204, 90, max=255)) par(old.par)  Check assumptions: Code $$\PageIndex{19}$$ (Python): mm <- read.table("http://ashipunov.info/shipunov/open/melampyrum.txt", h=TRUE) Normality <- function(a) {ifelse(shapiro.test(a)$p.value < 0.05, "NOT NORMAL", "NORMAL")}
sapply(mm[, c(2, 5)], Normality)
bartlett.test(P.HEIGHT ~ LOC, data=mm)


Consequently, leaf length must be analyzed with non-parametric procedure, and plant height—with parametric which does not assume homogeneity of variance (one-way test):

Code $$\PageIndex{20}$$ (Python):

mm <- read.table("http://ashipunov.info/shipunov/open/melampyrum.txt", h=TRUE)
oneway.test(P.HEIGHT ~ LOC, data=mm)
pairwise.t.test(mm$P.HEIGHT, mm$LOC)


Now the leaf length:

Code $$\PageIndex{21}$$ (Python):

mm <- read.table("http://ashipunov.info/shipunov/open/melampyrum.txt", h=TRUE)
kruskal.test(LEAF.L ~ LOC, data=mm)
pairwise.wilcox.test(mm$LEAF.L, mm$LOC)


All in all, location pairs 2–4 and 4–6 are divergent statistically in both cases. This is visible also on boxplots (Figure $$\PageIndex{5}$$). There are more significant differences in plant heights, location #6, in particular, is quite outstanding.

## Contingency tables

Code $$\PageIndex{22}$$ (Python):

pr <- read.table("data/seedlings.txt", h=TRUE)
str(pr)


Now, what we need is to examine the table because both variables only look like numbers; in fact, they are categorical. Dotchart (Figure $$\PageIndex{5}$$) is a good way to explore 2-dimensional table:

Code $$\PageIndex{23}$$ (Python):

pr <- read.table("data/seedlings.txt", h=TRUE)
(pr.t <- table(pr))
dotchart(t(pr.t), pch=19, gcolor=2)


To explore possible associations visually, we employ vcd package:

Code $$\PageIndex{24}$$ (Python):

pr <- read.table("data/seedlings.txt", h=TRUE)
pr.t <- table(pr)
library(vcd)


Both table output and vcd association plot (Figure $$\PageIndex{6}$$) suggest some asymmetry (especially for CID80) which is a sign of possible association. Let us check it numerically, with the chi-squared test:

Code $$\PageIndex{25}$$ (Python):

pr <- read.table("data/seedlings.txt", h=TRUE)
pr.t <- table(pr)
chisq.test(pr.t, simulate=TRUE)


Yes, there is an association between fungus (or their absence) and germination. How to know differences between particular samples? Here we need a post hoc test:

Code $$\PageIndex{26}$$ (Python):

pr <- read.table("data/seedlings.txt", h=TRUE)
pairwise.Table2.test(table(pr), exact=TRUE)


(Exact Fisher test was used because some counts were really small.)

It is now clear that germination patterns form two fungal infections, CID80 and CID105, are significantly different from germination in the control (CID0). Also, significant association was found in the every comparison between three infections; this means that all three germination patterns are statistically different. Finally, one fungus, CID63 produces germination pattern which is not statistically different from the control.

Answer to the question about multiple comparisons of toxicity. Here we will go the slightly different way. Instead of using array, we will extract p-values right from the original data, and will avoid warnings with the exact test:

Code $$\PageIndex{27}$$ (Python):

tox <- read.table("data/poisoning.txt", h=TRUE)
tox.p.values <- apply(tox[,-1], 2, function(.x) fisher.test(table(tox[, 1], .x))$p.value)  (We cannot use pairwise.Table2.test() from the previous answer since our comparisons have different structure. But we used exact test to avoid warnings related with small numbers of counts.) Now we can adjust p-values: Code $$\PageIndex{28}$$ (Python): tox <- read.table("data/poisoning.txt", h=TRUE) tox.p.values <- apply(tox[,-1], 2, function(.x) fisher.test(table(tox[, 1], .x))$p.value)


Well, now we can say that Caesar salad and tomatoes are statistically supported as culprits. But why table tests always show us two factors? This could be due to the interaction: in simple words, it means that people who took the salad, frequently took tomatoes with it.

Answer to the scurvy-grass question. Check the data file, load and check result:

Code $$\PageIndex{29}$$ (Python):

cc <-read.table("http://ashipunov.info/shipunov/open/cochlearia.txt", h=TRUE)
cc$LOC <- factor(cc$LOC, labels=paste0("loc", levels(cc$LOC))) cc$IS.CREEPING <- factor(cc$IS.CREEPING, labels=c("upright", "creeping")) str(cc)  (In addition, we converted LOC and IS.CREEPING to factors and provided new level labels.) Next step is the visual analysis (Figure $$\PageIndex{7}$$): Code $$\PageIndex{30}$$ (Python): cc <-read.table("http://ashipunov.info/shipunov/open/cochlearia.txt", h=TRUE) s.cols <- colorRampPalette(c("white", "forestgreen"))(5)[3:4] spineplot(IS.CREEPING ~ LOC, data=cc, col=s.cols)  Some locations look different. To analyze, we need contingency table: Code $$\PageIndex{31}$$ (Python): cc <-read.table("http://ashipunov.info/shipunov/open/cochlearia.txt", h=TRUE) (cc.lc <- xtabs(~ LOC + IS.CREEPING, data=cc))  Now the test and effect size: Code $$\PageIndex{32}$$ (Python): cc <-read.table("http://ashipunov.info/shipunov/open/cochlearia.txt", h=TRUE) cc.lc <- xtabs(~ LOC + IS.CREEPING, data=cc) chisq.test(cc.lc, simulate.p.value=TRUE) VTcoeffs(cc.lc)[2, ] # asmisc.r  (Run pairwise.Table2.test(cc.lc) yourself to understand differences in details.) Yes, there is a large, statistically significant association between locality and life form of scurvy-grass. Answer to the question about equality of proportions of LOBES character in two birch localities. First, we need to select these two localities (1 and 2) and count proportions there. The shortest way is to use the table() function: Code $$\PageIndex{33}$$ (Python): (betula.ll <- table(betula[betula$LOC < 3, c("LOC","LOBES")]))


Spine plot (Figure $$\PageIndex{8}$$) helps to make differences in the table even more apparent:

Code $$\PageIndex{34}$$ (Python):

betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE)
betula.ll <- table(betula[betula$LOC < 3, c("LOC","LOBES")]) birch.cols <- colorRampPalette(c("black", "forestgreen"))(5)[3:4] spineplot(betula.ll, col=birch.cols)  (Please also note how to create two colors intermediate between black and dark green.) The most natural choice is prop.test() which is applicable directly to the table() output: Code $$\PageIndex{35}$$ (Python): betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE) betula.ll <- table(betula[betula$LOC < 3, c("LOC","LOBES")])
prop.test(betula.ll)


Instead of proportion test, we can use Fisher exact:

Code $$\PageIndex{36}$$ (Python):

betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE)
betula.ll <- table(betula[betula$LOC < 3, c("LOC","LOBES")]) fisher.test(betula.ll)  ... or chi-squared with simulation (note that one cell has only 4 cases), or with default Yates’ correction: Code $$\PageIndex{37}$$ (Python): betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE) betula.ll <- table(betula[betula$LOC < 3, c("LOC","LOBES")])
chisq.test(betula.ll)


All in all, yes, proportions of plants with different position of lobes are different between location 1 and 2.

And what about effect size of this association?

Code $$\PageIndex{38}$$ (Python):

betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE)
betula.ll <- table(betula[betula\$LOC < 3, c("LOC","LOBES")])
VTcoeffs(betula.ll)[2, ] # asmisc.r


Answer to the question about proportion equality in the whole betula dataset. First, make table:

Code $$\PageIndex{39}$$ (Python):

betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE)
(betula.lw <- table(betula[, c("LOBES","WINGS")]))


There is no apparent asymmetry. Since betula.lw is $$2\times2$$ table, we can apply fourfold plot. It shows differences not only as different sizes of sectors, but also allows to check 95% confidence interval with marginal rings (Figure $$\PageIndex{9}$$):

Code $$\PageIndex{40}$$ (Python):

betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE)
betula.lw <- table(betula[, c("LOBES","WINGS")])
fourfoldplot(betula.lw, col=birch.cols)


Also not suggestive... Finally, we need to test the association, if any. Noe that samples are related. This is because LOBES and WINGS were measured on the same plants. Therefore, instead of the chi-squared or proportion test we should run McNemar’s test:

Code $$\PageIndex{41}$$ (Python):

betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE)
betula.lw <- table(betula[, c("LOBES","WINGS")])
mcnemar.test(betula.lw)


We conclude that proportions of two character states in each of characters are not statistically different.