This part of the workshop was organized by Felipe Dalla Lana

Packages

These are the packages used in this module. They should have been installed before the workshop.
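The chunk that loads the packages is not shown on this page. The sketch below is an assumed package list, inferred from the functions and output that appear later in the module (readr's column specification, tibbles, ggsci palettes); adjust it to match the workshop's installation instructions.

```r
# Assumed package list, not the workshop's confirmed one.
library(ggplot2)  # plotting
library(readr)    # read_csv(), which prints the column specification
library(dplyr)    # data manipulation (mutate(), select(), summarise())
library(tidyr)    # reshaping (pivot_longer())
library(ggsci)    # journal-inspired color palettes
```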

ggplot2 basics


Data organization

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   farm = col_double(),
##   region = col_character(),
##   zone = col_character(),
##   district = col_character(),
##   lon = col_double(),
##   lat = col_double(),
##   altitude = col_double(),
##   cultivar = col_character(),
##   shade = col_character(),
##   cropping_system = col_character(),
##   farm_management = col_character(),
##   inc = col_double(),
##   sev2 = col_double()
## )
## # A tibble: 6 x 7
##   altitude cultivar shade     cropping_system farm_management   inc  sev2
##      <dbl> <chr>    <chr>     <chr>           <chr>           <dbl> <dbl>
## 1     1100 Local    Sun       Plantation      Unmanaged        86.7 55.6 
## 2     1342 Mixture  Mid shade Plantation      Minimal          51.3 17.9 
## 3     1434 Mixture  Mid shade Plantation      Minimal          43.2  8.25
## 4     1100 Local    Sun       Plantation      Unmanaged        76.7 46.1 
## 5     1400 Local    Sun       Plantation      Unmanaged        47.2 12.3 
## 6     1342 Mixture  Mid shade Plantation      Minimal          51.3 19.9

Let’s get started with ggplot. What happens if we only run ggplot() without any additional information?

Not much! We need to be sure to specify the data (survey_data) and aesthetic elements (y and x axes).
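As a minimal sketch of this step, the code below passes the data and the aesthetics to ggplot() and then checks how farm_management is stored. The small data frame is a hand-typed stand-in holding only the six rows printed above, and the choice of farm_management for x and inc for y is an assumption.

```r
library(ggplot2)

# Stand-in for survey_data: only the six rows printed above, typed in by hand.
survey_data <- data.frame(
  farm_management = c("Unmanaged", "Minimal", "Minimal",
                      "Unmanaged", "Unmanaged", "Minimal"),
  inc = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3),
  stringsAsFactors = FALSE
)

# Data and aesthetics only: this draws labelled axes but no data layer.
p <- ggplot(survey_data, aes(x = farm_management, y = inc))
p

# farm_management is still plain text at this point.
class(survey_data$farm_management)
levels(survey_data$farm_management)
```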

## [1] "character"
## NULL

But we still haven’t said what type of plot we want.

Below we make sure that our data are in the correct form. For the graphs we would like to make, the farm_management column needs to be stored as a factor.
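One way to do the conversion is with base R's factor(), sketched here on a hand-typed stand-in for the farm_management column. The level order is taken from the levels() output printed in this section; whether the workshop used factor() directly or a dplyr/forcats helper is not shown, so treat the call as an assumption.

```r
# Stand-in for survey_data: the farm_management values of the six rows printed above.
survey_data <- data.frame(
  farm_management = c("Unmanaged", "Minimal", "Minimal",
                      "Unmanaged", "Unmanaged", "Minimal"),
  stringsAsFactors = FALSE
)

# Set the levels explicitly so they follow management intensity,
# not alphabetical order.
survey_data$farm_management <- factor(
  survey_data$farm_management,
  levels = c("Unmanaged", "Minimal", "Moderate", "Intensive")
)

class(survey_data$farm_management)
levels(survey_data$farm_management)
```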

## # A tibble: 6 x 7
##   altitude cultivar shade     cropping_system farm_management   inc  sev2
##      <dbl> <chr>    <chr>     <chr>           <fct>           <dbl> <dbl>
## 1     1100 Local    Sun       Plantation      Unmanaged        86.7 55.6 
## 2     1342 Mixture  Mid shade Plantation      Minimal          51.3 17.9 
## 3     1434 Mixture  Mid shade Plantation      Minimal          43.2  8.25
## 4     1100 Local    Sun       Plantation      Unmanaged        76.7 46.1 
## 5     1400 Local    Sun       Plantation      Unmanaged        47.2 12.3 
## 6     1342 Mixture  Mid shade Plantation      Minimal          51.3 19.9

## [1] "factor"
## [1] "Unmanaged" "Minimal"   "Moderate"  "Intensive"

Now we can start adding the “meat and potatoes” to the graph.

Here, we add the simple box plot geometry:
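A minimal sketch of adding the box plot layer, again on a hand-typed stand-in with the six rows printed earlier (so only two management levels appear). Plotting inc against farm_management is an assumption about which variables the workshop used.

```r
library(ggplot2)

# Stand-in for survey_data: the six rows printed earlier, typed in by hand.
survey_data <- data.frame(
  farm_management = factor(
    c("Unmanaged", "Minimal", "Minimal", "Unmanaged", "Unmanaged", "Minimal"),
    levels = c("Unmanaged", "Minimal")
  ),
  inc = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3)
)

# The geometry is added to the base plot with `+`.
p_box <- ggplot(survey_data, aes(x = farm_management, y = inc)) +
  geom_boxplot()
p_box
```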

Alternatively, we can use the geom_point() geometry:

Well, it’s hard to see the individual points in the plot above, so let’s “jitter” them from side to side with the jitter geometry:

It would be even better if we could see both the data points AND the summary information that is available in a boxplot. Here we layer two geometries: geom_jitter and geom_boxplot.
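Layering can be sketched as below, using the same hand-typed six-row stand-in. The jitter width and the semi-transparent box are illustrative choices, not values taken from the workshop.

```r
library(ggplot2)

# Stand-in for survey_data: the six rows printed earlier, typed in by hand.
survey_data <- data.frame(
  farm_management = factor(
    c("Unmanaged", "Minimal", "Minimal", "Unmanaged", "Unmanaged", "Minimal"),
    levels = c("Unmanaged", "Minimal")
  ),
  inc = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3)
)

# Points first, then the box plot on top of them.
p_layers <- ggplot(survey_data, aes(x = farm_management, y = inc)) +
  geom_jitter(width = 0.2) +    # spread points horizontally
  geom_boxplot(alpha = 0.5)     # semi-transparent so points stay visible
p_layers
```

Layers are drawn in the order they are added, so the last geometry sits on top.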

See what happens when you reverse the order of those geometries:

Some other useful geometries include geom_violin, geom_histogram, and geom_density. These each give information about the distribution of our data:

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Here we look at geom_dotplot() and geom_point(). Adding geom_smooth() and geom_rug() to geom_point() allows you to present additional information: a smoothed mean and the distribution of the data points.

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Aesthetics

Choosing aesthetics is critical for communicating the message of a plot. Color, for example, can be used strategically to convey information. Here we color by farm management type:
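A short sketch of mapping color to a variable, using the hand-typed six-row stand-in. Using a box plot here is an assumption; the same `color = farm_management` mapping works with other geometries.

```r
library(ggplot2)

# Stand-in for survey_data: the six rows printed earlier, typed in by hand.
survey_data <- data.frame(
  farm_management = factor(
    c("Unmanaged", "Minimal", "Minimal", "Unmanaged", "Unmanaged", "Minimal"),
    levels = c("Unmanaged", "Minimal")
  ),
  inc = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3)
)

# Mapping color inside aes() ties it to the data and adds a legend.
p_color <- ggplot(survey_data,
                  aes(x = farm_management, y = inc, color = farm_management)) +
  geom_boxplot()
p_color
```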

We can change the size of points as well as the transparency.

‘fill’ colors the interior of a geometry according to the variable you indicate:

The size and transparency of points can also be mapped to data by setting ‘size’ or ‘alpha’ equal to the continuous variable that you would like to use.
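As an illustration, the sketch below maps both size and alpha to sev2 on the hand-typed six-row stand-in. The choice of altitude and inc for the axes and of sev2 as the continuous variable is an assumption.

```r
library(ggplot2)

# Stand-in for survey_data: the six rows printed earlier, typed in by hand.
survey_data <- data.frame(
  altitude = c(1100, 1342, 1434, 1100, 1400, 1342),
  inc  = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3),
  sev2 = c(55.6, 17.9, 8.25, 46.1, 12.3, 19.9)
)

# Larger, more opaque points correspond to higher sev2 values.
p_size <- ggplot(survey_data, aes(x = altitude, y = inc)) +
  geom_point(aes(size = sev2, alpha = sev2))
p_size
```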

You can also give each level its own point shape (a good option if a figure will be printed in black and white).
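A minimal sketch of a shape mapping on the hand-typed six-row stand-in; the axis variables and the point size are illustrative assumptions.

```r
library(ggplot2)

# Stand-in for survey_data: the six rows printed earlier, typed in by hand.
survey_data <- data.frame(
  farm_management = factor(
    c("Unmanaged", "Minimal", "Minimal", "Unmanaged", "Unmanaged", "Minimal"),
    levels = c("Unmanaged", "Minimal")
  ),
  altitude = c(1100, 1342, 1434, 1100, 1400, 1342),
  inc = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3)
)

# Each factor level gets its own point shape; no color is needed.
p_shape <- ggplot(survey_data,
                  aes(x = altitude, y = inc, shape = farm_management)) +
  geom_point(size = 3)
p_shape
```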

You can also adjust the border color of points:

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Scales

You can manually adjust the x and y axis scales. This can be very useful when creating publication-ready plots.
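For example, scale_y_continuous() can fix the range and tick marks of the y axis. The sketch uses the hand-typed six-row stand-in and assumes inc is a percentage, so a 0 to 100 axis with ticks every 25 is a reasonable illustration rather than the workshop's actual setting.

```r
library(ggplot2)

# Stand-in for survey_data: the six rows printed earlier, typed in by hand.
survey_data <- data.frame(
  altitude = c(1100, 1342, 1434, 1100, 1400, 1342),
  inc = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3)
)

p_scale <- ggplot(survey_data, aes(x = altitude, y = inc)) +
  geom_point() +
  # Assumed percentage scale: full 0-100 range, ticks every 25 units.
  scale_y_continuous(limits = c(0, 100), breaks = seq(0, 100, by = 25))
p_scale
```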

You can also manually change the colors of factor levels.
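scale_color_manual() takes a named vector that assigns one color per level. The sketch below uses the hand-typed six-row stand-in; the two color names are arbitrary illustrative picks.

```r
library(ggplot2)

# Stand-in for survey_data: the six rows printed earlier, typed in by hand.
survey_data <- data.frame(
  farm_management = factor(
    c("Unmanaged", "Minimal", "Minimal", "Unmanaged", "Unmanaged", "Minimal"),
    levels = c("Unmanaged", "Minimal")
  ),
  inc = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3)
)

# Hypothetical color choices, keyed by factor level.
management_colors <- c(Unmanaged = "firebrick", Minimal = "steelblue")

p_manual <- ggplot(survey_data,
                   aes(x = farm_management, y = inc, color = farm_management)) +
  geom_point(size = 3) +
  scale_color_manual(values = management_colors)
p_manual
```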

There are pre-defined color palettes in scale_color_brewer:

ggsci is a package that offers a collection of color palettes inspired by colors used in scientific journals, data visualization libraries, science fiction movies, and TV shows.

Line types can also be changed in ggplot. In this example we manually set the line type for each level; the graph shows incidence vs. severity.

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Facets

Faceting is an easy way to break data into multiple plots with the same themes.

## # A tibble: 6 x 7
##   altitude cultivar shade     cropping_system farm_management   inc  sev2
##      <dbl> <chr>    <chr>     <chr>           <fct>           <dbl> <dbl>
## 1     1100 Local    Sun       Plantation      Unmanaged        86.7 55.6 
## 2     1342 Mixture  Mid shade Plantation      Minimal          51.3 17.9 
## 3     1434 Mixture  Mid shade Plantation      Minimal          43.2  8.25
## 4     1100 Local    Sun       Plantation      Unmanaged        76.7 46.1 
## 5     1400 Local    Sun       Plantation      Unmanaged        47.2 12.3 
## 6     1342 Mixture  Mid shade Plantation      Minimal          51.3 19.9

Here we need to make sure our data is correctly organized. This will help us visualize incidence and severity side-by-side. We also rename inc to “Incidence” and sev2 to “Severity” because these will be the labels of our facet plots.
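One way to do this reshaping is with dplyr::rename() and tidyr::pivot_longer(), sketched here on a hand-typed stand-in containing the six rows printed above. The object name survey_long is hypothetical, and the workshop may have used a different reshaping function.

```r
library(dplyr)
library(tidyr)

# Stand-in for survey_data: the six rows printed above, typed in by hand.
survey_data <- data.frame(
  farm_management = c("Unmanaged", "Minimal", "Minimal",
                      "Unmanaged", "Unmanaged", "Minimal"),
  inc  = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3),
  sev2 = c(55.6, 17.9, 8.25, 46.1, 12.3, 19.9)
)

survey_long <- survey_data %>%
  rename(Incidence = inc, Severity = sev2) %>%       # names become facet labels
  pivot_longer(cols = c(Incidence, Severity),
               names_to = "metric", values_to = "rate") %>%
  mutate(metric = factor(metric))                    # one row per farm and metric

head(survey_long)
```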

## # A tibble: 6 x 7
##   altitude cultivar shade     cropping_system farm_management metric     rate
##      <dbl> <chr>    <chr>     <chr>           <fct>           <fct>     <dbl>
## 1     1100 Local    Sun       Plantation      Unmanaged       Incidence 86.7 
## 2     1100 Local    Sun       Plantation      Unmanaged       Severity  55.6 
## 3     1342 Mixture  Mid shade Plantation      Minimal         Incidence 51.3 
## 4     1342 Mixture  Mid shade Plantation      Minimal         Severity  17.9 
## 5     1434 Mixture  Mid shade Plantation      Minimal         Incidence 43.2 
## 6     1434 Mixture  Mid shade Plantation      Minimal         Severity   8.25

Save the plot as ‘p2’, then print ‘p2’.

Here we facet_wrap by metric (incidence and severity are our metrics):

If we specify the number of columns (ncol) as 1, the plots will be stacked into a single column.
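A sketch of both facet_wrap() layouts, built on the six long-format rows printed earlier in this section (typed in by hand). The name survey_long and the use of geom_point() in p2 are assumptions.

```r
library(ggplot2)

# Stand-in for the long-format data: the six rows printed earlier.
survey_long <- data.frame(
  farm_management = rep(c("Unmanaged", "Minimal", "Minimal"), each = 2),
  metric = rep(c("Incidence", "Severity"), times = 3),
  rate = c(86.7, 55.6, 51.3, 17.9, 43.2, 8.25)
)

# Store the base plot, then add facets to it.
p2 <- ggplot(survey_long, aes(x = farm_management, y = rate)) +
  geom_point()

p_wrap  <- p2 + facet_wrap(~ metric)            # one panel per metric
p_stack <- p2 + facet_wrap(~ metric, ncol = 1)  # panels stacked in one column
p_wrap
p_stack
```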

facet_grid() allows us to create a grid of plots.
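In facet_grid(), the formula `rows ~ columns` names the variables that define the grid. The sketch below uses the six long-format rows printed earlier (typed in by hand) and assumes metric for the rows and cultivar for the columns.

```r
library(ggplot2)

# Stand-in for the long-format data: the six rows printed earlier.
survey_long <- data.frame(
  cultivar = rep(c("Local", "Mixture", "Mixture"), each = 2),
  altitude = rep(c(1100, 1342, 1434), each = 2),
  metric = rep(c("Incidence", "Severity"), times = 3),
  rate = c(86.7, 55.6, 51.3, 17.9, 43.2, 8.25)
)

# Rows split by metric, columns split by cultivar.
p_grid <- ggplot(survey_long, aes(x = altitude, y = rate)) +
  geom_point() +
  facet_grid(metric ~ cultivar)
p_grid
```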

Save

Saving a plot at the size you want can be tricky; we recommend using ggsave().

Extra plots

Here we explore using geom_errorbar().
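A minimal ggsave() sketch follows. The file name, dimensions, and resolution are hypothetical values chosen for illustration, and the file is written to a temporary directory so the example does not touch your project folder.

```r
library(ggplot2)

# Stand-in for survey_data: the six rows printed earlier, typed in by hand.
survey_data <- data.frame(
  farm_management = c("Unmanaged", "Minimal", "Minimal",
                      "Unmanaged", "Unmanaged", "Minimal"),
  inc = c(86.7, 51.3, 43.2, 76.7, 47.2, 51.3)
)

p <- ggplot(survey_data, aes(x = farm_management, y = inc)) +
  geom_boxplot()

# Hypothetical output path; width and height are in inches.
out_file <- file.path(tempdir(), "inc_by_management.png")
ggsave(filename = out_file, plot = p,
       width = 6, height = 4, units = "in", dpi = 300)
```

Passing `plot = p` explicitly avoids relying on whichever plot happened to be displayed last.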

## # A tibble: 4 x 5
##   farm_management inc_m inc_sd lower upper
## * <fct>           <dbl>  <dbl> <dbl> <dbl>
## 1 Unmanaged        55.4  12.9   42.6  68.3
## 2 Minimal          38.0  11.1   26.9  49.2
## 3 Moderate         37.8   8.31  29.5  46.1
## 4 Intensive        19.7   7.56  12.1  27.2
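The summary table above can be plotted directly. In the sketch below the values are typed in from that table; lower and upper appear to be the mean minus and plus one standard deviation (for example, 55.4 + 12.9 = 68.3), which is an inference from the numbers rather than something the page states. The object name inc_summary and the bar width are illustrative.

```r
library(ggplot2)

# Values copied from the summary table printed above.
inc_summary <- data.frame(
  farm_management = factor(
    c("Unmanaged", "Minimal", "Moderate", "Intensive"),
    levels = c("Unmanaged", "Minimal", "Moderate", "Intensive")
  ),
  inc_m  = c(55.4, 38.0, 37.8, 19.7),
  inc_sd = c(12.9, 11.1, 8.31, 7.56),
  lower  = c(42.6, 26.9, 29.5, 12.1),
  upper  = c(68.3, 49.2, 46.1, 27.2)
)

# Mean as a point, with an error bar spanning lower to upper.
p_err <- ggplot(inc_summary, aes(x = farm_management, y = inc_m)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2)
p_err
```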