How do you choose which geometry to use? ggplot allows you to choose from a number of geometries. This choice will determine what sort of plot you create. We will use the built-in mpg dataset, which contains fuel efficiency data for a number of different cars.
The histogram shows the overall distribution of the data. Here we use the nclass.FD function to compute the number of bins, following the Freedman-Diaconis rule.
ggplot(mpg, aes(hwy)) + geom_histogram(bins = nclass.FD(mpg$hwy)) + xlab('Highway mileage')
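To make the bin calculation less of a black box, the sketch below recomputes the Freedman-Diaconis bin count by hand. The bin width is 2 * IQR / n^(1/3), and the number of bins is the range of the data divided by that width, rounded up. The variable names (fd_width, fd_bins) are illustrative, and the sketch assumes ggplot2 is installed so that the mpg data are available.

```r
library(ggplot2)  # provides the mpg dataset

x <- mpg$hwy
n <- length(x)

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(x) / n^(1/3)

# Number of bins: range of the data divided by the bin width, rounded up
fd_bins <- ceiling(diff(range(x)) / fd_width)

# Compare the hand calculation with the helper used for the histogram
fd_bins == nclass.FD(x)
```

Passing the result as bins (a count) rather than binwidth (a width) is what the histogram call above does.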
Instead of creating discrete bins, we can look at relative density continuously.
7.4.2 Density plot
ggplot(mpg, aes(hwy)) + geom_density() + xlab('Highway mileage')
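A note on defaults: The default statistic (or “stat”) underlying
geom_density is called “density”, which is not surprising. The default stat for
geom_histogram is “bin”, which counts the observations falling into each bin. The related “count” stat tallies the observations at each distinct value. What do you think would happen if you overrode the default for geom_density and set stat = "count"?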
ggplot(mpg, aes(hwy)) + geom_density(stat = "count")
What we discover is that the geometric difference between geom_histogram and
geom_density can actually be generalized.
geom_histogram is a shortcut for working with geom_bar, and
geom_density is a shortcut for working with geom_line.
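One way to see which stat each of these functions uses by default is to inspect the layer object it returns. The sketch below assumes that a layer keeps its stat as a ggproto object in a field named stat. That is an internal detail of ggplot2 rather than a documented interface, so treat it as a quick check, not something to rely on in production code. stat_name is a hypothetical helper written for this example.

```r
library(ggplot2)

# Hypothetical helper: report the class name of the stat attached to a layer
# (assumes the layer object exposes its stat in a `stat` field)
stat_name <- function(layer) class(layer$stat)[1]

# Print the default stat behind each shortcut and each general geometry
stat_name(geom_histogram())
stat_name(geom_density())
stat_name(geom_bar())
stat_name(geom_line())
```

Comparing the four results shows which summary each function pairs with its geometry when no stat argument is given.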
7.4.3 Bar vs. line plots
ggplot(mpg, aes(hwy)) + geom_bar(stat = "count")
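To see what the “count” stat actually computes, we can pull the data that ggplot2 prepared for the bar layer and compare it with a plain tally of the values. This is a minimal sketch: it uses layer_data() from ggplot2 to extract the computed data, and the object names (p, bar_data, manual) are illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(hwy)) + geom_bar(stat = "count")

# layer_data() returns the data frame computed for the first layer of the plot
bar_data <- layer_data(p)

# Tally the observations at each distinct highway mileage value
manual <- table(mpg$hwy)

# One bar per distinct value, with heights equal to the tallies
all(bar_data$x == as.numeric(names(manual)))
all(bar_data$count == as.vector(manual))
```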
Note that the geometry tells ggplot what kind of plot to use, and the statistic (stat) tells it what kind of summary to present.
ggplot(mpg, aes(hwy)) + geom_line(stat = "density")
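The separation between geometry and stat can be checked numerically as well as visually. The sketch below compares the values computed by each shortcut with those from the more general geometry when the stat is spelled out. It assumes that geom_bar forwards the bins argument to the “bin” stat, and it uses layer_data() to extract the computed values. The object names are illustrative.

```r
library(ggplot2)

bins <- nclass.FD(mpg$hwy)
base <- ggplot(mpg, aes(hwy))

# Histogram shortcut vs. bar geometry with the "bin" stat spelled out
hist_counts <- layer_data(base + geom_histogram(bins = bins))$count
bar_counts  <- layer_data(base + geom_bar(stat = "bin", bins = bins))$count
all.equal(hist_counts, bar_counts)

# Density shortcut vs. line geometry with the "density" stat spelled out
dens_y <- layer_data(base + geom_density())$y
line_y <- layer_data(base + geom_line(stat = "density"))$y
all.equal(dens_y, line_y)
```

If both comparisons agree, the only remaining difference between each pair of plots is how the computed values are drawn.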