Day 5

geom_bar from summarized data

  • Last time we plotted survey response data using geom_bar
    • each row represented one response
    • ggplot automatically summarized them for us
  • What if our data is already summarized?
    • Below are a school district's percentages of students at each achievement level, by year:
# A tibble: 12 x 3
   percent meets           year 
     <dbl> <fct>           <fct>
 1    24.5 Exceeds         2015 
 2    33.7 Meets           2015 
 3    19.4 Partially Meets 2015 
 4    22.4 Does not Meet   2015 
 5    27.6 Exceeds         2016 
 6    32.1 Meets           2016 
 7    20.2 Partially Meets 2016 
 8    20.2 Does not Meet   2016 
 9    26   Exceeds         2017 
10    33.7 Meets           2017 
11    20.1 Partially Meets 2017 
12    20.1 Does not Meet   2017 

geom_bar from summarized data

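  • The percent variable gives the bar heights (y)
  • geom_bar() must be paired with stat = "identity" so it uses the y values as given instead of counting rows
    • alternatively, use geom_col(), which uses the y values as given by default
  • position = "fill" stacks the bars and rescales each year to a total of 1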
> ggplot(mca, aes(x=year, y=percent, fill=meets)) +
+   geom_bar(stat = "identity", position = "fill")
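A self-contained sketch that rebuilds the table above by hand (the level order of meets is an assumption, since the printout doesn't show it) and checks that geom_bar(stat = "identity") and geom_col() compute the same bars. Because position = "fill" rescales each year to 1, small rounding in the table (2016 sums to 100.1, 2017 to 99.9) disappears.

```r
library(ggplot2)

# Rebuild the summarized table above (level order of `meets` is an assumption)
mca <- data.frame(
  percent = c(24.5, 33.7, 19.4, 22.4, 27.6, 32.1, 20.2, 20.2, 26, 33.7, 20.1, 20.1),
  meets   = factor(rep(c("Exceeds", "Meets", "Partially Meets", "Does not Meet"), times = 3),
                   levels = c("Exceeds", "Meets", "Partially Meets", "Does not Meet")),
  year    = factor(rep(c("2015", "2016", "2017"), each = 4))
)

# Two equivalent ways to use percent as the bar height
p_bar <- ggplot(mca, aes(x = year, y = percent, fill = meets)) +
  geom_bar(stat = "identity", position = "fill")
p_col <- ggplot(mca, aes(x = year, y = percent, fill = meets)) +
  geom_col(position = "fill")

# layer_data() returns the data ggplot2 computed for drawing the first layer
bar_data <- layer_data(p_bar)
col_data <- layer_data(p_col)

# Per-year totals before rescaling
tapply(mca$percent, mca$year, sum)
```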

ggplot2 maps

The ggplot2 function map_data() turns boundary data from the maps package into a data frame of latitude and longitude points that outline geographic regions. Available regions include state, usa, world, and county; see the maps package for more.

> MainStates <- map_data("state")
> str(MainStates)
'data.frame':   15537 obs. of  6 variables:
 $ long     : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
 $ lat      : num  30.4 30.4 30.4 30.3 30.3 ...
 $ group    : num  1 1 1 1 1 1 1 1 1 1 ...
 $ order    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ region   : chr  "alabama" "alabama" "alabama" "alabama" ...
 $ subregion: chr  NA NA NA NA ...
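The group column matters because one region can be drawn from several separate outlines (for example a mainland plus islands). A quick check of how many outlines each state uses, assuming the maps package is installed:

```r
library(ggplot2)   # map_data() also requires the maps package to be installed

MainStates <- map_data("state")

# Count the distinct outlines (groups) used to draw each state
groups_per_state <- tapply(MainStates$group, MainStates$region,
                           function(g) length(unique(g)))

# States drawn from more than one outline
groups_per_state[groups_per_state > 1]

# Within each group, `order` gives the sequence in which points are connected
head(MainStates[MainStates$region == "michigan",
                c("long", "lat", "group", "order", "subregion")])
```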

Maps using geom_path

geom_path connects the long (x) and lat (y) points within each group, in the order they appear in the data

> ggplot(MainStates) + 
+     geom_path(aes(x=long, y=lat, group=group))
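Why group = group and not group = region? A toy sketch with a hypothetical "state" made of two separate squares: grouping by region strings both outlines into one path, with a stray line joining them.

```r
library(ggplot2)

# Hypothetical region made of two separate square outlines (e.g. two islands)
toy <- data.frame(
  long   = c(0, 1, 1, 0, 0,   2, 3, 3, 2, 2),
  lat    = c(0, 0, 1, 1, 0,   0, 0, 1, 1, 0),
  group  = rep(c(1, 2), each = 5),
  region = "toyland"
)

# group = group: each outline is drawn as its own path
p_ok <- ggplot(toy) + geom_path(aes(x = long, y = lat, group = group))
# group = region: both outlines become one path
p_bad <- ggplot(toy) + geom_path(aes(x = long, y = lat, group = region))

length(unique(layer_data(p_ok)$group))   # number of separate paths
length(unique(layer_data(p_bad)$group))
```

Print p_ok and p_bad to see the extra connecting line in the second plot.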

Maps using geom_polygon

geom_polygon is like geom_path, but it also connects the last point back to the first, which closes the shape so it can be filled

> ggplot(MainStates) + 
+     geom_polygon(aes(x=long, y=lat, group=group), color="black", fill="lightgreen")
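A minimal triangle makes the difference visible: the same three points give an open two-segment path with geom_path but a closed, filled shape with geom_polygon (the toy data is hypothetical).

```r
library(ggplot2)

# Three corners of a triangle; the first point is NOT repeated at the end
tri <- data.frame(long = c(0, 1, 0.5), lat = c(0, 0, 1), group = 1)

# geom_path draws two segments and leaves the triangle open
p_path <- ggplot(tri) + geom_path(aes(x = long, y = lat, group = group))

# geom_polygon joins the last point back to the first and fills the inside
p_poly <- ggplot(tri) +
  geom_polygon(aes(x = long, y = lat, group = group),
               color = "black", fill = "lightgreen")
```

Print p_path and p_poly side by side to compare.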

Spatial data

  • Earth is a 3-D object but a map is 2-D
    • a spatial projection transforms (projects) lat/long locations on a sphere onto a Cartesian plane
    • the distance between lines of latitude is roughly constant (~69 miles per degree of latitude)
    • the distance between lines of longitude shrinks toward the poles (~69 miles per degree at the equator to 0 miles at the poles)
  • many projections are possible; each preserves some feature at the expense of others
    • area, angle, distance, …
  • the Mercator projection preserves angles but distorts area (increasingly toward the poles)
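The shrinking longitude spacing follows, on a spherical Earth, the cosine of the latitude. A small worked computation assuming the ~69 miles per degree figure above:

```r
# Approximate miles per degree of latitude (assumes a spherical Earth)
miles_per_deg_lat <- 69

# One degree of longitude covers cos(latitude) as much ground
miles_per_deg_long <- function(lat_deg) {
  miles_per_deg_lat * cos(lat_deg * pi / 180)
}

lats <- c(0, 30, 45, 60, 90)
round(miles_per_deg_long(lats), 1)
# 69.0 59.8 48.8 34.5  0.0
```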

Maps using coord_quickmap()

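coord_quickmap() is a quick approximation of the Mercator projection: it fixes the ratio of latitude to longitude units (chosen at the center of the map), which preserves straight lines and works best for small regions close to the equator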

> ggplot(MainStates) + 
+     geom_polygon(aes(x=long, y=lat, group=group), color="black", fill="lightgreen") + 
+   coord_quickmap()
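The fixed ratio can be estimated by hand: at the center latitude, one degree of longitude covers about cos(latitude) as much ground as one degree of latitude. A rough sketch of that idea (not ggplot2's internal code), assuming the maps package is installed:

```r
library(ggplot2)   # map_data() also requires the maps package

MainStates <- map_data("state")

# Latitude at the vertical center of the map
mid_lat <- mean(range(MainStates$lat))

# Draw one unit of y about 1 / cos(mid_lat) times as tall as one unit of x
y_per_x <- 1 / cos(mid_lat * pi / 180)
round(c(mid_lat = mid_lat, y_per_x = y_per_x), 2)
```

Passing this value to coord_fixed(ratio = y_per_x) instead of coord_quickmap() should give a very similar-looking map.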

Maps using coord_map()

coord_map() projects the data with the Mercator projection by default; it requires the mapproj package, which also provides other projection options

> ggplot(MainStates) + 
+     geom_polygon(aes(x=long, y=lat, group=group), color="black", fill="lightgreen") + 
+   coord_map()
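coord_map() passes a projection name and its parameters on to mapproj. A sketch using the Albers equal-area projection, which preserves area instead of angle; the standard parallels 29.5 and 45.5 are a common choice for the contiguous US and are an assumption here.

```r
library(ggplot2)   # also requires the maps and mapproj packages

MainStates <- map_data("state")

# Albers equal-area conic projection; lat0 and lat1 are the standard parallels
p_albers <- ggplot(MainStates) +
  geom_polygon(aes(x = long, y = lat, group = group),
               color = "black", fill = "lightgreen") +
  coord_map("albers", lat0 = 29.5, lat1 = 45.5)
```

Print p_albers and compare it with the Mercator version above.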

World with coord_quickmap

> world <- map_data("world")
> ggplot(world) + 
+     geom_polygon(aes(x=long, y=lat, group=group), color="black", fill="white") + 
+   coord_quickmap()

World with coord_map

> ggplot(world) + 
+     geom_polygon(aes(x=long, y=lat, group=group), color="black", fill="white") + 
+   coord_map() + scale_x_continuous(limits=c(-180,180))
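One likely reason for the x limits: some world outlines cross the 180th meridian, so their longitudes run past 180 and can produce stray horizontal lines under coord_map(). A quick check, assuming the maps package is installed:

```r
library(ggplot2)   # map_data() also requires the maps package

world <- map_data("world")

# Longitudes beyond 180 come from outlines that wrap past the date line
range(world$long)
sum(world$long > 180)
```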

Choropleth maps using geom_map

  • A choropleth map uses the color or shading of subregions to visualize data
  • State level data from the 2016 American Community Survey
> ACS <- read.csv("https://raw.githubusercontent.com/mgelman/data/master/ACS2016.csv")
> 
> ACS <- ACS[ACS$region != "Alaska" & ACS$region != "Hawaii",] # only 48+D.C.
> ACS$region <- tolower(ACS$region)  # lower case (match MainStates regions)
> str(ACS)
'data.frame':   49 obs. of  7 variables:
 $ region        : chr  "alabama" "arizona" "arkansas" "california" ...
 $ PopSize       : int  4841164 6728577 2968472 38654206 5359295 3588570 934695 659009 19934451 10099320 ...
 $ MedianAge     : num  38.6 37.1 37.7 36 36.4 40.6 39.6 33.8 41.6 36.2 ...
 $ PercentFemale : num  51.5 50.3 50.9 50.3 49.8 51.2 51.6 52.6 51.1 51.3 ...
 $ BornInState   : int  3387845 2623391 1823628 21194542 2294446 1981427 425160 242020 7151459 5562769 ...
 $ MedianIncome  : int  23527 26565 22787 27772 31325 34124 30648 41160 25166 26132 ...
 $ PercentInState: num  70 39 61.4 54.8 42.8 ...
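The printed values suggest PercentInState is 100 × BornInState / PopSize. A quick check on the first five rows of str(ACS); this is an inference from the numbers, not a documented definition.

```r
# First five states as printed by str(ACS) above
region         <- c("alabama", "arizona", "arkansas", "california", "colorado")
PopSize        <- c(4841164, 6728577, 2968472, 38654206, 5359295)
BornInState    <- c(3387845, 2623391, 1823628, 21194542, 2294446)
PercentInState <- c(70, 39, 61.4, 54.8, 42.8)   # as printed (rounded by str)

# Candidate definition: percent of residents born in their state of residence
computed <- round(100 * BornInState / PopSize, 1)
data.frame(region, computed, PercentInState)
```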

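Choropleth maps using geom_map

  • Want to visualize “% born in state”
  • No need to merge data: keep two separate data sets, the state data ACS and the mapping data MainStates
    • geom_map() matches map_id = region in ACS against the region column of MainStates, so the names must match exactly (hence tolower() above)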
> ggplot() + coord_map() + 
+   geom_map(data=ACS, aes(map_id = region, fill = PercentInState), map = MainStates) +
+   expand_limits(x=MainStates$long, y=MainStates$lat) + ggtitle("% Born in State")
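A toy sketch of the matching step, with a hypothetical two-region map. A region whose name doesn't match exactly (here "East" vs "east") has no outline, so it is simply not drawn.

```r
library(ggplot2)

# Hypothetical two-region map: two unit squares named "west" and "east"
toy_map <- data.frame(
  long   = c(0, 1, 1, 0,   1, 2, 2, 1),
  lat    = c(0, 0, 1, 1,   0, 0, 1, 1),
  region = rep(c("west", "east"), each = 4)
)

# Separate data set, one row per region; no merge needed
toy_data <- data.frame(region = c("west", "East"), value = c(10, 20))

# Names in the data with no matching outline in the map
setdiff(toy_data$region, toy_map$region)

p <- ggplot() +
  geom_map(data = toy_data, aes(map_id = region, fill = value), map = toy_map) +
  expand_limits(x = toy_map$long, y = toy_map$lat)
```

Printing p shows only the west square filled; fixing the case (as tolower() does for ACS) makes both appear.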

Choropleth maps using geom_map

  • Add a point layer for state capitals from a third data set
> capitals <- read.csv("https://raw.githubusercontent.com/mgelman/data/master/capitals.csv")
> capitals <- capitals[capitals$state != "Alaska" & capitals$state != "Hawaii",] # only 48
> ggplot(data=ACS) + coord_map() +
+   geom_map(data=ACS, map=MainStates, aes(map_id = region, fill = PercentInState)) + 
+    geom_point(data=capitals, aes(x=long,y=lat), size=.7, color="red") + 
+   expand_limits(x=MainStates$long, y=MainStates$lat) + ggtitle("% Born in State")

Adjusting color

  • When color is mapped to a categorical variable, use scale_color_brewer() or scale_color_manual() to change the colors
    • same for fill colors (replace color with fill, e.g. scale_fill_brewer())
    • See textbook page 21 for Brewer color palette names
    • for manual colors, can use base-R color names from colors(), or color vectors from rainbow() or palette()
  • When color is mapped to a numeric variable, use scale_color_distiller() for Brewer colors and the scale_color_gradient() family (gradient, gradient2, gradientn) for manual colors
    • sequential colors: for low to high data (one direction)
    • diverging colors: for data with a meaningful “middle” (values above or below it)
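The Brewer palette names can also be listed from R. A sketch assuming the RColorBrewer package is available (it is normally installed along with ggplot2, through the scales package):

```r
# Table of Brewer palettes with their type and maximum number of colors
info <- RColorBrewer::brewer.pal.info

# Palette names grouped by type
rownames(info)[info$category == "seq"]   # sequential: low to high
rownames(info)[info$category == "div"]   # diverging: values around a middle
rownames(info)[info$category == "qual"]  # qualitative: unordered categories
```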

Color for numeric variables

> mymap <- ggplot() + coord_map() + 
+   geom_map(data=ACS, aes(map_id = region, fill = PercentInState), map = MainStates) +
+   expand_limits(x=MainStates$long, y=MainStates$lat) + ggtitle("% Born in State") 
> mymap + scale_fill_distiller()

Color for numeric variables

  • See ?scale_fill_distiller for palette option names
> mymap + scale_fill_distiller(palette = "Oranges")

Color for numeric variables

  • Manual choice of the low and high colors; see colors() for color names
> mymap + scale_fill_gradient(low="lightpink", high="purple4")

Color for numeric variables

  • Sometimes we want to look at how numbers deviate from 0
  • Let’s look at the difference between state median income and national median income
    • national median income is estimated to be about $27,000 in 2016
> mymap2 <- ggplot() + coord_map() + 
+   geom_map(data=ACS, aes(map_id = region, fill = MedianIncome - 27000), map = MainStates) +
+   expand_limits(x=MainStates$long, y=MainStates$lat) + ggtitle("Difference from US median income") 

Color for numeric variables

  • The sequential (default) palette doesn't distinguish states above the national median income from those below it
> mymap2 + scale_fill_distiller(palette = "Oranges")

Color for numeric variables

  • Set type = "div" and choose a diverging palette
> mymap2 + scale_fill_distiller(type="div", palette = "RdYlBu")

Color for numeric variables

  • direction = 1 reverses the color order (the distiller default is direction = -1)
> mymap2 + scale_fill_distiller(type="div", palette = "RdYlBu", direction = 1)

Color for numeric variables

  • scale_fill_gradient2() builds a diverging scale from manually chosen colors and lets you fix the midpoint (here 0), which gets the mid color (white by default)
> mymap2 + scale_fill_gradient2(midpoint = 0, low="red", high="blue", name="Difference ($)")
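Why midpoint = 0 matters: scale_fill_distiller() rescales the data range to [0, 1] with plain rescaling, so its middle color lands at the middle of the range, not at 0. scale_fill_gradient2() uses midpoint rescaling instead. A sketch with the scales package, using only the ten incomes printed by str(ACS):

```r
library(scales)   # installed with ggplot2

# First ten MedianIncome values from str(ACS), minus the ~$27,000 national median
income <- c(23527, 26565, 22787, 27772, 31325, 34124, 30648, 41160, 25166, 26132)
diff_from_us <- income - 27000
lims <- range(diff_from_us)            # -4213 to 14160

# Plain rescaling: a difference of 0 falls at about 0.23, not at the middle
rescale(0, from = lims)

# Midpoint rescaling: a difference of 0 falls exactly at 0.5 (the mid color)
rescale_mid(0, from = lims, mid = 0)
```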

Color for categorical variables

  • scale_fill_brewer() swaps the default fill colors for a Brewer palette (a sequential blue palette by default)
> mybar <- ggplot(mca, aes(x=year, y=percent, fill=meets)) +
+   geom_bar(stat = "identity", position = "fill")
> mybar + scale_fill_brewer()

Color for categorical variables

  • Could use a diverging palette; direction = -1 reverses its order
> mybar + scale_fill_brewer(palette = "Spectral", direction=-1)

Color for categorical variables

  • Can also choose fill colors manually; unnamed values are assigned to the factor levels in level order
> mybar + scale_fill_manual(values=c("blue", "green","yellow","red"))
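Naming the values ties each color to a level explicitly, so the result doesn't depend on the level order. A sketch that rebuilds the summarized table from the first slide (the level order is an assumption) and checks the mapping:

```r
library(ggplot2)

# Rebuild the summarized table (level order of `meets` is an assumption)
mca <- data.frame(
  percent = c(24.5, 33.7, 19.4, 22.4, 27.6, 32.1, 20.2, 20.2, 26, 33.7, 20.1, 20.1),
  meets   = factor(rep(c("Exceeds", "Meets", "Partially Meets", "Does not Meet"), times = 3),
                   levels = c("Exceeds", "Meets", "Partially Meets", "Does not Meet")),
  year    = factor(rep(c("2015", "2016", "2017"), each = 4))
)

# Each color is matched to a level by name, not by position
level_colors <- c("Exceeds" = "blue", "Meets" = "green",
                  "Partially Meets" = "yellow", "Does not Meet" = "red")

mybar <- ggplot(mca, aes(x = year, y = percent, fill = meets)) +
  geom_bar(stat = "identity", position = "fill")
p_named <- mybar + scale_fill_manual(values = level_colors)
```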