Skip to main content
Statistics LibreTexts

C.8: Answers to exercises

  • Page ID
    4327
  • Answer to the sundew wetness question. Let us apply the approach we used for the leaf shape:

    Code \(\PageIndex{1}\) (Python):

    leaf <- read.table("data/sundew.txt", h=TRUE, sep=";")
    wet <- ts(leaf$WET, frequency=36)
    str(wet)
    acf(wet)
    plot(stl(wet, s.window="periodic")$time.series)
    

    (Plots are not shown, please make them yourself.)

    There is some periodicity with 0.2 (5 hours) period. However, trend is likely absent.

    Answer to the dodder infestation question. Inspect the data, load and check:

    Code \(\PageIndex{2}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    str(cu)
    
    Screen Shot 2019-02-01 at 10.29.29 PM.png
    Figure \(\PageIndex{1}\) The example of Sweave() report.

    (Note that two last columns are ranked. Consequently, only nonparametric methods are applicable here.)

    Then we need to select two hosts of question and drop unused levels:

    Code \(\PageIndex{3}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    cu2 <- cu[cu$HOST %in% c("Alchemilla","Knautia"), ]
    cu2 <- droplevels(cu2)
    

    It is better to convert this to the short form:

    Code \(\PageIndex{4}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    cu2 <- cu[cu$HOST %in% c("Alchemilla","Knautia"), ]
    cu2 <- droplevels(cu2)
    cu2.s <- split(cu2$DEGREE, cu2$HOST)
    

    No look on these samples graphically:

    Code \(\PageIndex{5}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    cu2 <- cu[cu$HOST %in% c("Alchemilla","Knautia"), ]
    cu2 <- droplevels(cu2)
    cu2.s <- split(cu2$DEGREE, cu2$HOST)
    boxplot(cu2.s)
    

    There is a prominent difference. Now to numbers:

    Code \(\PageIndex{6}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    cu2 <- cu[cu$HOST %in% c("Alchemilla","Knautia"), ]
    cu2 <- droplevels(cu2)
    cu2.s <- split(cu2$DEGREE, cu2$HOST)
    sapply(cu2.s, median)
    cliff.delta(cu2.s$Alchemilla, cu2.s$Knautia)
    wilcox.test(cu2.s$Alchemilla, cu2.s$Knautia)
    

    Interesting! Despite on the difference between medians and large effect size, Wilcoxon test failed to support it statistically. Why? Were shapes of distributions similar?

    Code \(\PageIndex{7}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    cu2 <- cu[cu$HOST %in% c("Alchemilla","Knautia"), ]
    cu2 <- droplevels(cu2)
    cu2.s <- split(cu2$DEGREE, cu2$HOST)
    ansari.test(cu2.s$Alchemilla, cu2.s$Knautia)
    library(beeswarm)
    la <- layout(matrix(c(1, 3, 2, 3), ncol=2, byrow=TRUE))
    for (i in 1:2) hist(cu2.s[[i]], main=names(cu2.s)[i], xlab="", xlim=range(cu2.s))
    bxplot(cu2.s) ; beeswarm(cu2.s, cex=1.2, add=TRUE)
    

    (Please note how to make complex layout with layout() command. This commands takes matrix as argument, and then simply place plot number something to the position where this number occurs in the matrix. After layout was created, you can check it with command layout.show(la).)

    Screen Shot 2019-02-01 at 10.28.10 PM.png
    Figure \(\PageIndex{2}\) Histograms of two sample distributions, plus beswarm plot with boxplot lines.

    As both Ansari-Bradley test and plots suggest, distributions are really different (Figure \(\PageIndex{2}\)). One workaround is to use robust rank order test which is not so sensitive to the differences in variation:

    Code \(\PageIndex{8}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    cu2 <- cu[cu$HOST %in% c("Alchemilla","Knautia"), ]
    cu2 <- droplevels(cu2)
    cu2.s <- split(cu2$DEGREE, cu2$HOST)
    Rro.test(cu2.s$Alchemilla, cu2.s$Knautia) # asmisc.r
    

    This test found the significance.

    Now we will try to bootstrap the difference between medians:

    Code \(\PageIndex{9}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    cu2 <- cu[cu$HOST %in% c("Alchemilla","Knautia"), ]
    cu2 <- droplevels(cu2)
    cu2.s <- split(cu2$DEGREE, cu2$HOST)
    library(boot)
    meddif.b <- function (data, ind) { d <- data[ind]; median(d[cu2$HOST == "Alchemilla"]) - median(d[cu2$HOST == "Knautia"]) }
    meddif.boot <- boot(data=cu2$DEGREE, statistic=meddif.b, strata=cu2$HOST, R=999)
    boot.ci(meddif.boot, type="bca")
    

    (Please note how strata was applied to avoid mixing of two different hosts.)

    This is not dissimilar to what we saw above in the effect size output: large difference but 0 included. This could be described as “prominent but unstable” difference.

    That was not asked in assignment but how to analyze whole data in case of so different shapes of distributions. One possibility is the Kruskal test with Monte-Carlo replications. By default, it makes 1000 tries:

    Code \(\PageIndex{10}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    library(coin)
    kruskal_test(DEGREE ~ HOST, data=cu, distribution=approximate())
    

    There is no overall significance. It is not a surprise, ANOVA-like tests could sometimes contradict with individual or pairwise.

    Another possibility is a post hoc robust rank order test:

    Code \(\PageIndex{11}\) (Python):

    cu <- read.table("http://ashipunov.info/shipunov/open/cuscuta.txt",h=TRUE, sep="	")
    pairwise.Rro.test(cu$DEGREE, cu$HOST) # asmisc.r
    

    Now it found some significant differences but did not reveal it for our marginal, unstable case of Alchemilla and Knautia.