
7.1: How to draw the multivariate data

    The simplest operation with multidimensional data is to draw it.

    Pictographs

    A pictograph is a plot where each element represents one object, and every feature of the element corresponds to one character of that object. If every row of data is unique, pictographs might be useful. Here is a star plot example (Figure \(\PageIndex{1}\)):

    Code \(\PageIndex{1}\) (R):

    eq8 <- read.table("data/eq8.txt", h=TRUE)
    Str(eq8) # asmisc.r
    eq8m <- aggregate(eq8[, 2:9], list(eq8[, 1]), median, na.rm=TRUE)
    row.names(eq8m) <- eq8m[, 1]
    eq8m$Group.1 <- NULL
    stars(eq8m, cex=1.2, lwd=1.2, col.stars=rep("darkseagreen", 8))
    
    Figure \(\PageIndex{1}\) Stars show different horsetail species.

    (Here each element represents a horsetail species, and the length of each ray corresponds to one morphological character. It is easy to see, for example, the similarity between Equisetum \(\times\)litorale and E. fluviatile.)
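
    If the horsetail data file is not at hand, the same steps can be tried on the built-in iris data. The following is only a sketch under that assumption; the object name iris.m is hypothetical and not part of the original example:

```r
# Median of each measurement per species (same steps as for eq8, applied to iris)
iris.m <- aggregate(iris[, 1:4], list(iris[, 5]), median)
row.names(iris.m) <- iris.m[, 1]
iris.m$Group.1 <- NULL
# One star per species; each ray shows one character rescaled by stars()
stars(iris.m, cex=1.2, lwd=1.2, col.stars=rep("darkseagreen", 3))
```

    With only three rows the plot is small, but it shows how each ray grows with the median of one character.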

    A slightly more exotic pictograph is Chernoff’s faces, where features of elements are shown as human face characters (Figure \(\PageIndex{2}\)):

    Code \(\PageIndex{2}\) (R):

    eq8 <- read.table("data/eq8.txt", h=TRUE)
    library(TeachingDemos)
    faces(eq8m)
    

    (The original Chernoff’s faces are implemented in the faces2() function; there is also another variant in the symbols package.)

    Figure \(\PageIndex{2}\) Chernoff’s faces show different horsetail species.

    Related to pictographs are ways to overview a whole numeric dataset, matrix or data frame. First, the image() command produces plots like the one in Figure \(\PageIndex{3}\):

    Code \(\PageIndex{3}\) (R):

    image(scale(iris[,-5]), axes=FALSE)
    axis(2, at=seq(0, 1, length.out=4), labels=abbreviate(colnames(iris[,-5])), las=2)
    

    (This is a “portrait” of the iris matrix, not extremely informative but useful in many ways. For example, it is clearly visible that the highest, most red, values of Pt.L (abbreviated from Petal.Length) correspond with the lowest values of Sp.W (Sepal.Width). It is even possible to spot the three-species structure of this data.)
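
    The visual impression can be cross-checked numerically. This is a minimal sketch, assuming that a plain correlation coefficient is an acceptable summary here; the name iris.s is hypothetical:

```r
# Scale characters the same way as for image()
iris.s <- scale(iris[, -5])
# A negative correlation supports the impression that high Pt.L goes with low Sp.W
cor(iris.s[, "Petal.Length"], iris.s[, "Sepal.Width"])
# Rows of iris are ordered by species, which is why blocks are visible on the image
table(iris$Species)
```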

    Figure \(\PageIndex{3}\) Results of plotting iris data with the image() command. Redder colors correspond with higher values of scaled characters.

    More advanced is the parallel coordinates plot (Figure \(\PageIndex{4}\)):

    Code \(\PageIndex{4}\) (R):

    library(MASS)
    parcoord(iris[,-5], col=as.numeric(iris[, 5]), lwd=2)
    legend("top", bty="n", lty=1, lwd=2, col=1:3, legend=names(table(iris[, 5])))
    
    Figure \(\PageIndex{4}\) Parallel coordinates plot.

    This is somewhat like a multidimensional stripchart. Every character is represented by one axis which holds its values from all plants. Then, for every plant, these values are connected with lines. Many interesting things can be spotted on this plot. For example, it is clear that petal characters are more distinguishing than sepal ones. It is also visible that Iris setosa is more distinct from the two other species, and so on.
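
    To see what happens behind such a plot, the construction can be repeated by hand in base R. This is a sketch of the general idea (rescale each axis, then connect the values of each plant), not necessarily how parcoord() is implemented; the names rng and iris.r are hypothetical:

```r
# Rescale every character to the [0, 1] range, so that all axes are comparable
rng <- function(x) (x - min(x)) / (max(x) - min(x))
iris.r <- apply(iris[, -5], 2, rng)
# After transposing, every column is one plant, so matplot() draws one line per plant
matplot(t(iris.r), type="l", lty=1, col=as.numeric(iris[, 5]),
 xaxt="n", xlab="", ylab="scaled value")
axis(1, at=1:4, labels=abbreviate(colnames(iris.r)))
```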

    Grouped plots

    Even boxplots and dotcharts could represent multiple characters of multiple groups, but you will need to scale them first and then manually control positions of plotted elements, or use Boxplots() and Linechart() described in the previous chapter:

    Code \(\PageIndex{5}\) (R):

    Boxplots(iris[, 1:4], iris[, 5], srt=0, adj=c(.5, 1), legpos="topright") # asmisc.r
    Linechart(iris[, 1:4], iris[, 5], mad=TRUE) # asmisc.r
    

    (Please try these plots yourself.)
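
    The manual route mentioned above (scale first, then control positions) could look roughly like this in base R. It is a sketch only; the names iris.l and pos and the chosen gap between groups are assumptions made for illustration:

```r
# Scale characters, then stack them into long form: values + ind (character name)
iris.l <- stack(as.data.frame(scale(iris[, 1:4])))
iris.l$Species <- rep(iris[, 5], 4)
# Positions: three boxes (species) per character, with a gap between characters
pos <- c(1:3, 5:7, 9:11, 13:15)
boxplot(values ~ Species + ind, data=iris.l, at=pos, col=2:4,
 xaxt="n", xlab="", ylab="scaled value")
axis(1, at=c(2, 6, 10, 14), labels=levels(iris.l$ind))
legend("topright", fill=2:4, legend=levels(iris.l$Species), bty="n")
```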

    Function matplot() places multiple scatterplots in one frame, symbols() places multiple smaller plots in desired locations, and function pairs() shows multiple scatterplots as a matrix (Figure \(\PageIndex{5}\)):

    Code \(\PageIndex{6}\) (R):

    pairs(iris[, 1:4], pch=21, bg=as.numeric(iris[, 5]), oma=c(2, 2, 3, 2))
    oldpar <- par(xpd=TRUE)
    legend(0, 1.09, horiz=TRUE, legend=levels(iris[, 5]), pch=21, pt.bg=1:3, bty="n")
    par(oldpar)
    

    (This matrix plot shows dependencies between each possible pair of the four measurement variables simultaneously; the fifth variable, species, is shown with color.)
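
    For comparison, here is a small matplot() sketch which overlays three scatterplots in one frame. The choice of Sepal.Length as the common x-axis and the name y.mat are assumptions made only for this illustration:

```r
# Three characters plotted against Sepal.Length in the same frame
y.mat <- as.matrix(iris[, 2:4])
matplot(iris[, 1], y.mat, pch=1:3, col=1:3,
 xlab="Sepal.Length", ylab="other characters")
legend("topleft", pch=1:3, col=1:3, legend=colnames(y.mat), bty="n")
```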

    The matrix plot is just one of a large variety of R trellis plots. Many of them are in the lattice package (Figure \(\PageIndex{6}\)):

    Code \(\PageIndex{7}\) (R):

    betula <- read.table("http://ashipunov.info/shipunov/open/betula.txt", h=TRUE)
    library(lattice)
    d.tmp <- do.call(make.groups, betula[, c(2:4, 7:8)])
    d.tmp$LOC <- betula$LOC
    bwplot(data ~ factor(LOC) | which, data=d.tmp, ylab="")
    

    (Note how to use make.groups() and do.call() to stack all columns into the long variable (it is also possible to use stack(), see above). When LOC was added to temporary dataset, it was recycled five times—exactly what we need.)
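
    The stacking and recycling can be inspected on the built-in iris data, which needs no download. This is a sketch under that assumption; the name i.tmp is hypothetical:

```r
library(lattice)
# Stack four measurement columns into one long variable
i.tmp <- do.call(make.groups, iris[, 1:4])
str(i.tmp) # columns: data (values) and which (name of the source column)
# 150 species labels assigned to 600 rows are recycled four times
i.tmp$Species <- iris$Species
bwplot(data ~ Species | which, data=i.tmp, ylab="")
```

    Recycling works here because the number of rows in the long data frame is an exact multiple of the length of the added column.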

    The lattice library offers multiple trellis variants of common R plots. For example, one could make a trellis dotchart which shows differences between horsetail species (Figure \(\PageIndex{7}\)):

    Code \(\PageIndex{8}\) (R):

    eq8 <- read.table("data/eq8.txt", h=TRUE)
    eq.s <- stack(as.data.frame(scale(eq8m)))
    eq.s$SPECIES <- row.names(eq8m)
    dotplot(SPECIES ~ values | ind, eq.s, xlab="")
    
    Figure \(\PageIndex{5}\) Matrix plot.

    (Here we stacked all numerical columns into one with stack().)

    A few trellis plots are available in core R. This is our election data from the previous chapter (Figure \(\PageIndex{8}\)):

    Code \(\PageIndex{9}\) (R):

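    elections <- read.table("data/elections.txt", h=TRUE)
    # Candidate shares and attendance, computed from columns of the data frame
    PROP <- with(elections, cbind(CAND.1, CAND.2, CAND.3) / VOTER)
    ATTEN <- with(elections, (VALID + INVALID) / VOTER)
    elections2 <- cbind(ATTEN, stack(data.frame(PROP)))
    # Rename columns so that they match the coplot() formula
    names(elections2) <- c("atten", "percn", "cand")
    coplot(percn ~ atten | cand, data=elections2, col="red", bg="pink", pch=21, bar.bg=c(fac="lightblue"))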
    
    Figure \(\PageIndex{6}\) The example of trellis plot: for each measurement character, boxplots represent differences between locations.

    3D plots

    If there are just three numerical variables, we can try to plot all of them with 3-axis plots. Ternary plots are frequently seen in geology, metallurgy and some other fields. They are implemented, for example, in the vcd package. They use a triangular coordinate system which makes it possible to show three measurement variables simultaneously, plus some categorical characters (via colors, point types etc.), as in Figure \(\PageIndex{9}\):

    Code \(\PageIndex{10}\) (R):

    library(vcd)
    ternaryplot(scale(iris[, 2:4], center=FALSE), cex=.3, col=iris[, 5], main="")
    grid_legend(0.8, 0.7, pch=19, size=.5, col=1:3, levels(iris[, 5]))
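
    The triangular coordinate system itself is easy to reproduce in base R, which may help in reading such plots. The sketch below uses the generic construction (proportions mapped onto an equilateral triangle); the corner layout chosen here is an assumption and may differ from the one used by ternaryplot(), and the names abc, prop, x and y are hypothetical:

```r
# Three positive variables; every row is converted to proportions summing to 1
abc <- scale(iris[, 2:4], center=FALSE)
prop <- abc / rowSums(abc)
# Corners: first variable at (0, 0), second at (1, 0), third at (0.5, sqrt(3)/2)
x <- prop[, 2] + prop[, 3] / 2
y <- prop[, 3] * sqrt(3) / 2
plot(x, y, asp=1, xlim=c(0, 1), ylim=c(0, sqrt(3)/2), axes=FALSE,
 xlab="", ylab="", pch=19, cex=.5, col=as.numeric(iris[, 5]))
polygon(c(0, 1, 0.5), c(0, 0, sqrt(3)/2))
text(c(0, 1, 0.5), c(0, 0, sqrt(3)/2), labels=colnames(abc), pos=c(1, 1, 3), xpd=TRUE)
```

    The closer a point lies to a corner, the larger the share of the corresponding variable in that plant.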
    
    Figure \(\PageIndex{7}\) Trellis dotchart of the horsetail species (character values are scaled). These plots are typically read from the bottom.

    The “brick” 3D plot could be done, for example, with the package scatterplot3d (Figure \(\PageIndex{10}\)):

    Code \(\PageIndex{11}\) (R):

    library(scatterplot3d)
    i3d <- scatterplot3d(iris[, 2:4], color=as.numeric(iris[, 5]), type="h", pch=14 + as.numeric(iris[, 5]), xlab="Sepal.Width", ylab="", zlab="Petal.Width")
    dims <- par("usr")
    x <- dims[1]+ 0.82*diff(dims[1:2])
    y <- dims[3]+ 0.1*diff(dims[3:4])
    text(x, y, "Petal.Length", srt=40)
    legend(i3d$xyz.convert(3.8, 6.5, 1.5), col=1:3, pch=(14 + 1:3), legend=levels(iris[, 5]), bg="white")
    
    Figure \(\PageIndex{8}\) Voting data from previous chapter represented with coplot() function.

    (Here some additional effort was needed to make the y-axis label slanted.)

    These 3D scatterplots look attractive, but what if some points are hidden from view? How to rotate the plot and find the best projection? The RGL library helps to create a dynamic 3D plot:

    Code \(\PageIndex{12}\) (R):

    library(rgl)
    plot3d(iris[, 1:3], col=as.numeric(iris[, 5]))
    

    Please run these commands yourself. The window size and projection of RGL plots are controlled with the mouse, which helps to better understand the position of every point. In the case of iris data, it is clearly visible that one of the species (Iris setosa) is more distinct than the two others, and that the most “splitting” character is the length of petals (Petal.Length). There are four characters on the plot, because color is used to distinguish species. To save the current RGL plot, run the rgl.snapshot() or rgl.postscript() function. Please also note that the RGL package depends on the external OpenGL library, and therefore on some systems additional installations might be required.

    Figure \(\PageIndex{9}\) Ternary plot for iris data.

    Another 3D possibility is cloud() from the lattice package. It is a static plot with relatively heavy code, but the important point is that the user can apply different rotations (Figure 7.2.1):

    Code \(\PageIndex{13}\) (R):

    library(lattice)
    p <- cloud(Sepal.Length ~ Petal.Length * Petal.Width, data=iris, groups=Species, par.settings=list(clip=list(panel="off")), auto.key=list(space="top", columns=3, points=TRUE))
    update(p[rep(1, 4)], layout=c(2, 2), function(..., screen) panel.cloud(..., screen=list(z=c(-70, 110)[current.column()], x=-70, y=c(140, 0)[current.row()])))
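
    A lighter way to explore rotations is to set the screen argument directly and compare two views side by side. This is a simplified sketch, not a replacement for the code above; the object names p1 and p2 and the chosen angles are assumptions:

```r
library(lattice)
# Same cloud, two different rotations given through the screen argument
p1 <- cloud(Sepal.Length ~ Petal.Length * Petal.Width, data=iris,
 groups=Species, screen=list(z=20, x=-70))
p2 <- update(p1, screen=list(z=110, x=-70))
# Place both views on one page
print(p1, split=c(1, 1, 2, 1), more=TRUE)
print(p2, split=c(2, 1, 2, 1))
```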
    
    Figure \(\PageIndex{10}\) Static 3D scatterplot of iris data.