Skip to main content
Statistics LibreTexts

4.5: How to create your own functions

  • Page ID
  • Shapiro-Wilk test is probably the fastest way to check normality but its output is not immediately understandable. It is also not easy to apply for whole data frames. Let us create the function which overcomes these problems:

    Code \(\PageIndex{1}\) (R):

    Normality <- function(a) {ifelse(shapiro.test(a)$p.value < 0.05, "NOT NORMAL", "NORMAL")}

    (We used here the fact that in R, test output is usually a list and each component is possible to extract using $-name approach described in previous chapter. How to know what to extract? Save test output into object and run str(obj)!)

    Collection asmisc.r contains slightly more advanced version of the Normality() which takes into account that Shapiro-Wilks test is not so reliable for small size (\(< 25\)) samples.

    To make this Normality() function work, you need to copy the above text into R console, or into the separate file (preferably with *.r extension), and then load it with source() command. Next step is to call the function:

    Code \(\PageIndex{2}\) (R):

    salary <- c(21, 19, 27, 11, 102, 25, 21)
    Normality <- function(a) {ifelse(shapiro.test(a)$p.value < 0.05, "NOT NORMAL", "NORMAL")}
    Normality(salary) # asmisc.r
    sapply(trees, Normality)
    sapply(log(trees+0.01), Normality)

    (Note that logarithmic conversion could change the normality. Check yourself if square root does the same.)

    This function not only runs Shapiro-Wilks test several times but also outputs an easily readable result. Most important is the third row which uses p-value extracted from the test results. Extraction procedure is based on the knowledge of the internal structure of shapiro.test() output.

    output object without going into help?

    In many cases, “stationary”, named function is not necessary as user need some piece of code which runs only once (but runs in relatively complicated way). Here helps the anonymous function. It is especially useful within functions of apply() family. This is how to calculate mode simultaneously in multiple columns:

    Code \(\PageIndex{3}\) (R):

    x <- c(174, 162, 188, 192, 165, 168, 172.5)
    sapply(chickwts,function(.x) names(table(.x))[which.max(table(.x))])

    (Here we followed the agreement that in the anonymous functions, argument names must start with a dot.)

    Even more useful—simultaneous confidence intervals:

    Code \(\PageIndex{4}\) (R):

    old.options <- options(warn=-1)
    x <- c(174, 162, 188, 192, 165, 168, 172.5)
    sapply(trees, function(.x) wilcox.test(.x,$

    (Here we suppressed multiple “ties” warnings. Do not do it yourself without a strong reason!)

    in the open data repository contains measurements of several birch morphological characters. Are there any characters which could be analyzed with parametric methods?

    contains explanation of variables.)

    description characteristics as possible, calculate the appropriate confidence interval and plot this data.

    ) distinguish these species most. Provide the answer in the form “if the character is ..., then species is ..”..