Statistics LibreTexts

7.1: The Grammar of Graphics

or, the “gg” in ggplot

    Each language has a grammar consisting of types of words and the rules with which to string them together into sentences. If a sentence is grammatically correct, we’re able to parse it, although that doesn’t guarantee that it’s interesting, beautiful, or even meaningful.

    Similarly, plots can be divided up into their core components, which come together via a set of rules.

    Some of the major components are:

    • data
    • aesthetics
    • geometries
    • themes

    The data are the actual variables we’re plotting, which we pass to ggplot through the data argument. As you’ve learned, ggplot takes a data frame in which each column is a variable.
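As a minimal sketch, assuming the ggplot2 package is installed, the snippet below uses mtcars, a data frame bundled with base R (it is not part of the original text), to show the column-per-variable shape ggplot expects and how that data frame is passed through the data argument.

```r
library(ggplot2)

# mtcars ships with base R: one column per variable (mpg, cyl, disp, ...)
# and one row per car model.
str(mtcars[, c("mpg", "cyl", "wt")])

# Pass the data frame through the data argument. With no aesthetics or
# geometries added yet, this only sets up an empty plot.
p <- ggplot(data = mtcars)
```

At this stage ggplot knows what data to use but not how to draw it; the aesthetics and geometries described next supply that.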

    Now we need to tell ggplot how to plot those variables by mapping each variable to an aesthetic of the plot, such as an axis. You’ve seen that when we plot histograms, our variable goes on the x axis. Hence, we set x = <variable> in a call to aes() within ggplot(). This sets the aesthetics, which are mappings of data to scales such as the axes, color, or shape. The plot still had two axes (x and y), but we didn’t need to specify what went on the y axis because ggplot knew by default that it should compute a count variable for it.
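A short sketch of this idea, again assuming ggplot2 is installed and using the built-in mtcars data as a stand-in: only x is mapped, and layer_data() (a ggplot2 helper that returns the computed data for a layer) shows the y values ggplot filled in on its own. The choice of 10 bins is arbitrary. A second plot maps a variable to fill to illustrate an aesthetic that is not an axis.

```r
library(ggplot2)

# Map the mpg column (miles per gallon) to the x aesthetic; no y is given.
p <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)  # 10 bins is an arbitrary choice for this sketch

# Aesthetics are not limited to axes: cyl is also mapped to the fill colour.
# factor() makes ggplot treat the cylinder counts as discrete groups.
p_fill <- ggplot(data = mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(bins = 10)

# layer_data() returns the data ggplot computed for the first layer; its
# y column holds the per-bin counts that ggplot supplied by default.
layer_data(p)[, c("xmin", "xmax", "count", "y")]
```

Printing p or p_fill at the console draws the plots; the layer_data() call is only there to make the default count on the y axis visible.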

    How was ggplot able to figure that out? Because of geometries, which are the shapes we use to represent our data. You’ve seen geom_histogram, which draws our data as bars, much like a bar plot, except that it also sets the default y axis variable to be the count. Other geometries include points (geom_point) and lines (geom_line), among many others.
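To illustrate swapping geometries, here is a minimal sketch (assuming ggplot2 and the built-in mtcars data, with wt and mpg chosen only for illustration) in which the same data and aesthetics are drawn as points and as lines, alongside a histogram that needs only x.

```r
library(ggplot2)

# The same data and aesthetic mappings, reused with different geometries.
base <- ggplot(data = mtcars, aes(x = wt, y = mpg))

p_points <- base + geom_point()  # one point per row of the data frame
p_lines  <- base + geom_line()   # rows connected in order of the x variable

# geom_histogram needs only x: it computes a count for each bin and uses it
# as y, whereas geom_point plots the y values exactly as given.
p_hist <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)
```

Only the geometry layer changes between p_points and p_lines; the data and aesthetics are shared through base.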

    We’ll go over other aspects of the grammar of graphics (such as facets, statistics, and coordinates) as they come up. Let’s start visualizing some data by first choosing a theme, which describes all of the non-data ink in our plot, such as grid lines and text.
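As a rough sketch of how themes fit in (assuming ggplot2 and the built-in mtcars data; the particular themes and element settings are illustrative choices, not ones from the original text), a complete theme such as theme_minimal() or theme_bw() restyles all the non-data ink at once, and theme() then adjusts individual elements.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# A complete theme replaces all non-data ink (background, grid, text) at once.
p_minimal <- p + theme_minimal()

# theme() tweaks individual non-data elements on top of a complete theme:
# here the minor grid lines are removed and the axis titles are enlarged.
p_custom <- p + theme_bw() +
  theme(panel.grid.minor = element_blank(),
        axis.title = element_text(size = 14))
```

Because a theme touches only non-data ink, the bars and their counts are the same in all three plots; only the surrounding styling differs.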