Visualizations and the Grammar of Graphics
MACS 30500
University of Chicago
September 28, 2016
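Summary statistics by dataset (column labels inferred from the values):
dataset | n | mean(x) | mean(y) | cor(x, y)
1 | 11 | 9 | 7.500909 | 0.8164205
2 | 11 | 9 | 7.500909 | 0.8162365
3 | 11 | 9 | 7.500000 | 0.8162867
4 | 11 | 9 | 7.500909 | 0.8165214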
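Intercept of the linear fit, by dataset (column labels inferred from the values):
dataset | term | estimate | std.error | statistic | p.value
1 | (Intercept) | 3.0000909 | 1.1247468 | 2.667348 | 0.0257341
2 | (Intercept) | 3.0009091 | 1.1253024 | 2.666758 | 0.0257589
3 | (Intercept) | 3.0024545 | 1.1244812 | 2.670080 | 0.0256191
4 | (Intercept) | 3.0017273 | 1.1239211 | 2.670763 | 0.0255904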
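Slope on x of the linear fit, by dataset (column labels inferred from the values):
dataset | term | estimate | std.error | statistic | p.value
1 | x | 0.5000909 | 0.1179055 | 4.241455 | 0.0021696
2 | x | 0.5000000 | 0.1179637 | 4.238590 | 0.0021788
3 | x | 0.4997273 | 0.1178777 | 4.239372 | 0.0021763
4 | x | 0.4999091 | 0.1178189 | 4.243028 | 0.0021646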
Model fit statistics by dataset (column labels inferred from the values):
dataset | r.squared | adj.r.squared | sigma | logLik | AIC | BIC
1 | 0.6665425 | 0.6294916 | 1.236603 | -16.84069 | 39.68137 | 40.87506
2 | 0.6662420 | 0.6291578 | 1.237214 | -16.84612 | 39.69224 | 40.88593
3 | 0.6663240 | 0.6292489 | 1.236311 | -16.83809 | 39.67618 | 40.86986
4 | 0.6667073 | 0.6296747 | 1.235696 | -16.83261 | 39.66522 | 40.85890
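The four sets of numbers above are nearly identical, which is the usual motivation for plotting data instead of relying on summaries. The values appear to match R's built-in `anscombe` data set; that is an assumption, since the slides do not name their source. Under that assumption, a minimal base R sketch that recomputes the main quantities could look like this (the column names in `summaries` are illustrative choices):

```r
# Assumption: the summary tables come from R's built-in `anscombe` data,
# which stores four x/y pairs as columns x1..x4 and y1..y4.
summaries <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)  # simple linear regression of y on x
  data.frame(
    dataset   = i,
    n         = length(x),
    mean_x    = mean(x),
    mean_y    = mean(y),
    cor_xy    = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared
  )
}))

print(summaries)
```

Plotting each pair (for example with `plot(anscombe$x1, anscombe$y1)`) shows how different the four data sets look despite the matching statistics.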
Grammar
The whole system and structure of a language or of languages in general, usually taken as consisting of syntax and morphology (including inflections) and sometimes also phonology and semantics.
Grammar of graphics
- “The fundamental principles or rules of an art or science”
- Grammar of graphics - a grammar used to describe and create a wide range of statistical graphics
- Layered grammar of graphics
Layered grammar of graphics
- Layer
  - Data
  - Mapping
  - Statistical transformation (stat)
  - Geometric object (geom)
  - Position adjustment (position)
- Scale
- Coordinate system (coord)
- Faceting (facet)
- Defaults
Layer
- Responsible for creating the objects that we perceive on the plot
- Defined by its subcomponents
Data and mapping
- Data defines the source of the information to be visualized
- Mapping defines how the variables are applied to the graphic
Data: mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29
## 3 audi a4 2.0 2008 4 manual(m6) f 20 31
## 4 audi a4 2.0 2008 4 auto(av) f 21 30
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26
## 7 audi a4 3.1 2008 6 auto(av) f 18 27
## 8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26
## 9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25
## 10 audi a4 quattro 2.0 2008 4 manual(m6) 4 20 28
## # ... with 224 more rows, and 2 more variables: fl <chr>, class <chr>
Data: mpg
## # A tibble: 234 × 2
## displ hwy
## <dbl> <int>
## 1 1.8 29
## 2 1.8 29
## 3 2.0 31
## 4 2.0 30
## 5 2.8 26
## 6 2.8 26
## 7 3.1 27
## 8 1.8 26
## 9 1.8 25
## 10 2.0 28
## # ... with 224 more rows
Mapping: mpg
## # A tibble: 234 × 2
## x y
## <dbl> <int>
## 1 1.8 29
## 2 1.8 29
## 3 2.0 31
## 4 2.0 30
## 5 2.8 26
## 6 2.8 26
## 7 3.1 27
## 8 1.8 26
## 9 1.8 25
## 10 2.0 28
## # ... with 224 more rows
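The three printouts above can be read as two steps: choose the variables (data), then assign them to aesthetic roles (mapping). A rough sketch of those steps in plain R, assuming `ggplot2` is installed so that `mpg` is available (`plot_data` and `mapped` are illustrative names):

```r
library(ggplot2)  # provides the mpg data set

# Data: keep only the two variables that will be visualized
plot_data <- mpg[, c("displ", "hwy")]

# Mapping: displ takes the role of x, hwy the role of y
mapped <- data.frame(x = plot_data$displ, y = plot_data$hwy)

head(mapped, 10)
```

In ggplot2 itself the same assignment is written as `aes(x = displ, y = hwy)`; the renaming above only mimics what the mapping does conceptually.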
Geometric objects (geoms)
- Control the type of plot you create
  - 0 dimensions - point, text
  - 1 dimension - path, line
  - 2 dimensions - polygon, interval
- Geoms have specific aesthetics
  - Point geom - position, color, shape, and size
  - Bar geom - position, height, width, and fill
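As an illustration of geom-specific aesthetics, the sketch below builds one point plot and one bar plot from `mpg`. The particular variables and the fill color are arbitrary choices for the example, not taken from the slides.

```r
library(ggplot2)

# Point geom: position (x, y) plus a color aesthetic mapped to vehicle class
p_point <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Bar geom: position (x) is mapped; height is the row count per cyl value,
# while width and fill are set to constants here
p_bar <- ggplot(mpg, aes(x = cyl)) +
  geom_bar(width = 0.8, fill = "steelblue")

p_point
p_bar
```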
Point geom
Bar geom
Raw data
## # A tibble: 234 × 1
## cyl
## <int>
## 1 4
## 2 4
## 3 4
## 4 4
## 5 6
## 6 6
## 7 6
## 8 4
## 9 4
## 10 4
## # ... with 224 more rows
## # A tibble: 4 × 2
## cyl n
## <int> <int>
## 1 4 81
## 2 5 4
## 3 6 79
## 4 8 70
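The second table is what a counting statistical transformation produces from the raw `cyl` column. A small sketch of the same transformation, first in base R and then as computed by `geom_bar()`, whose default stat counts rows per x value (`cyl_counts` and `p_bar` are illustrative names):

```r
library(ggplot2)  # provides mpg and geom_bar()

# Count rows per cylinder value in base R
cyl_counts <- as.data.frame(table(cyl = mpg$cyl), responseName = "n")
cyl_counts

# geom_bar() applies the same counting before drawing the bars
p_bar <- ggplot(mpg, aes(x = cyl)) +
  geom_bar()

# layer_data() exposes the transformed data that the geom actually draws
layer_data(p_bar, 1)[, c("x", "count")]
```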
Position adjustment
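The figures for this slide are not reproduced here, so the following is only a sketch of common position adjustments in ggplot2, using variables from `mpg` chosen for illustration.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = class, fill = drv))

# Stack bars on top of each other (the default for bars)
p_stack <- base + geom_bar(position = "stack")

# Place bars side by side
p_dodge <- base + geom_bar(position = "dodge")

# Stack and rescale each bar to a total height of 1 (proportions)
p_fill <- base + geom_bar(position = "fill")

# Add a small amount of random noise to reduce overplotting of points
p_jitter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = "jitter")
```

Printing any of these objects (for example `p_dodge`) draws the corresponding plot.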
Scale
- Controls the mapping from data to aesthetic attributes
Scale: color
Scale: size
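A hedged sketch of how color and size scales might be set explicitly. The palette name and size range are arbitrary choices for the example; the slides' own plots may have used different settings.

```r
library(ggplot2)

# Color scale: map the discrete variable class to a ColorBrewer palette
p_color <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  scale_color_brewer(palette = "Set1")

# Size scale: map cyl to point size, limiting the range of sizes drawn
p_size <- ggplot(mpg, aes(x = displ, y = hwy, size = cyl)) +
  geom_point() +
  scale_size(range = c(1, 4))
```

In both cases the data and mapping are unchanged; only the translation from data values to visual values differs.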
Coordinate system (coord)
- Maps the position of objects onto the plane of the plot
Cartesian coordinate system
Semi-log
Polar
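The three systems named above could be sketched as follows. For the semi-log version this example uses `scale_y_log10()`, which is one of several ways to get a logarithmic axis in ggplot2 and is an assumption about how the slide's figure was made.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Cartesian: the default coordinate system
p_cartesian <- base + coord_cartesian()

# Semi-log: y axis on a log10 scale, x axis linear
p_semilog <- base + scale_y_log10()

# Polar: x is interpreted as angle, y as radius
p_polar <- base + coord_polar()
```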
Faceting
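Faceting splits the data into subsets and draws the same plot for each one. A minimal sketch, assuming the plot is split by `class` (the slide does not say which variable was used):

```r
library(ggplot2)

# One panel per vehicle class, each showing the same displ/hwy scatterplot
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)

p_facet
```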
Defaults
# Fully specified: every component of the grammar is written out
ggplot() +
  layer(
    data = mpg, mapping = aes(x = displ, y = hwy),
    geom = "point", stat = "identity", position = "identity"
  ) +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian()
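# Conceptually, the stat, position, scales, and coordinate system can all
# fall back to defaults. Note that in recent ggplot2 versions layer() itself
# still requires explicit stat and position arguments; the geom_*() helpers
# fill them in for you.
ggplot() +
  layer(
    data = mpg, mapping = aes(x = displ, y = hwy),
    geom = "point"
  )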
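# geom_point() is a shortcut that supplies stat = "identity" and
# position = "identity" for a point layer
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()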
# Argument names can be dropped: data comes first, then x and y inside aes()
ggplot(mpg, aes(displ, hwy)) +
  geom_point()
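One way to confirm that the short form carries the same defaults is to inspect the plot object. This sketch assumes the plot's components can be reached with `$`, as in ggplot2 3.x; treat the field names as version-dependent.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

first_layer <- p$layers[[1]]

class(first_layer$geom)[1]      # "GeomPoint"
class(first_layer$stat)[1]      # "StatIdentity"
class(first_layer$position)[1]  # "PositionIdentity"
class(p$coordinates)[1]         # "CoordCartesian"
```

The continuous x and y scales are added when the plot is built, so they do not appear as explicit components of `p` before printing.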
Minard’s grammar
- Troops
  - Latitude
  - Longitude
  - Survivors
  - Advance/retreat
- Cities
  - Latitude
  - Longitude
  - City name
- Layer
  - Data
  - Mapping
  - Statistical transformation (stat)
  - Geometric object (geom)
  - Position adjustment (position)
- Scale
- Coordinate system
- Faceting
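The outline above suggests two layers: a path for the troops and text labels for the cities. The sketch below shows how such a plot might be assembled. The data frames contain made-up placeholder values, not Minard's actual figures, and the column names are assumptions. It also assumes ggplot2 3.4 or later, where line thickness is mapped with `linewidth`.

```r
library(ggplot2)

# Placeholder values for illustration only (NOT Minard's real data)
troops <- data.frame(
  long      = c(24, 28, 32, 37, 37, 32, 28, 24),
  lat       = c(55.0, 54.5, 54.8, 55.8, 55.6, 54.3, 54.2, 54.6),
  survivors = c(400000, 300000, 150000, 100000, 100000, 50000, 20000, 10000),
  direction = rep(c("advance", "retreat"), each = 4)
)
cities <- data.frame(
  long = c(24, 37),
  lat  = c(55.3, 56.1),
  city = c("Start city", "End city")  # hypothetical labels
)

p_minard <- ggplot() +
  # Troops layer: path geom, thickness = survivors, color = advance/retreat
  geom_path(
    data = troops,
    mapping = aes(x = long, y = lat, linewidth = survivors, color = direction),
    lineend = "round"
  ) +
  # Cities layer: text geom positioned by longitude and latitude
  geom_text(
    data = cities,
    mapping = aes(x = long, y = lat, label = city)
  ) +
  # Coordinate system: approximate map projection for small regions
  coord_quickmap()

p_minard
```

Each layer has its own data and mapping, while the scales and coordinate system are shared across the whole plot.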
Gapminder
library(ggplot2)
library(tibble)
# install.packages("gapminder")
library(gapminder)
data("gapminder")
gapminder
## # A tibble: 1,704 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134
## 7 Afghanistan Asia 1982 39.854 12881816 978.0114
## 8 Afghanistan Asia 1987 40.822 13867957 852.3959
## 9 Afghanistan Asia 1992 41.674 16317921 649.3414
## 10 Afghanistan Asia 1997 41.763 22227415 635.3414
## # ... with 1,694 more rows
Gapminder
- What is the average life expectancy, per continent?
- What is the relationship between GDP and life expectancy?
- Bonus: what is causing the outlier in gdpPercap?
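A possible starting point for these questions, assuming the `gapminder` and `ggplot2` packages are installed. This is one approach among many, and `avg_life`, `p_gdp`, and `outlier_row` are illustrative names.

```r
library(ggplot2)
library(gapminder)

# Average life expectancy per continent (pooled over all countries and years)
avg_life <- aggregate(lifeExp ~ continent, data = gapminder, FUN = mean)
avg_life

# Relationship between GDP per capita and life expectancy
p_gdp <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3)
p_gdp

# Bonus: look up the row with the largest gdpPercap to investigate the outlier
outlier_row <- gapminder[which.max(gapminder$gdpPercap), ]
outlier_row
```

Adding `geom_smooth()` to `p_gdp`, or putting the x axis on a log scale with `scale_x_log10()`, may make the relationship easier to read.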