Workflow Basics and ggplot2 in R

Robin Choudhury
2018-07-04

Workflow Basics in R

For this section we will be using RStudio.

Coding Basics 1

R can work like a calculator…

1 / 200 * 30
[1] 0.15
pi * 3
[1] 9.424778

Base R has a built-in constant, pi, with the value 3.14159...

Coding Basics 2

R can also handle text…

print("Hello, World!")
[1] "Hello, World!"

And can mix text and calculations…

paste("The value of pi is", pi)
[1] "The value of pi is 3.14159265358979"
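
The full printed value is rarely what you want in a sentence. A minimal sketch, using base R's round() and paste0() (neither appears in the original slides; msg and msg0 are illustrative names), of controlling the digits and the separator:

```r
# paste() puts a space between its arguments by default
msg <- paste("The value of pi is", round(pi, 2))
msg
# [1] "The value of pi is 3.14"

# paste0() adds no separator, so the spacing is up to you
msg0 <- paste0("pi is roughly ", round(pi, 4), ".")
msg0
# [1] "pi is roughly 3.1416."
```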

Coding Basics 3

You can also create new objects with <-

a <- 1 / 200 * 30; a
[1] 0.15

Then use those objects in subsequent calculations

a * 3
[1] 0.45

Coding Basics 4

Object names must start with a letter and can contain only letters, numbers, _, and .

You can make assignments with either <- or =, but most people prefer <-. R inherited <- from S, which took it from APL, whose keyboards had a single key for the arrow (see http://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html for more info).

R is case sensitive, so A will not tell you the value of a.
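
A short sketch of the naming and case rules, assuming a fresh session. The object names sepal_ratio and sepal.ratio2 are made up for illustration:

```r
# Valid names start with a letter and use letters, numbers, _ and .
sepal_ratio <- 5.1 / 3.5
sepal.ratio2 <- sepal_ratio * 2

# R is case sensitive: only the lower-case name was created
exists("sepal_ratio")   # TRUE
exists("Sepal_Ratio")   # FALSE, assuming nothing else defined it
```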

ggplot2 and the Aesthetic of Graphics

ggplot2 is part of a suite of packages in R known colloquially as the 'Tidyverse'. ggplot2 was developed to build graphs by mapping aesthetics, allowing users to flexibly create beautiful publication-quality figures.

library(tidyverse)
library(datasets)

The Dataset

I will be using the iris dataset:

  • Common example dataset
  • Fairly compact (150 obs. of 5 variables)
  • Flower measurements for 3 iris species

data(iris)

ggplot2 Basics: Making the First Call

As mentioned earlier, ggplot2 works by layering on different aesthetics from a dataset. ggplot2 first needs to know what dataset you plan to use, and what will be plotted.

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))

…pretty boring. Why?
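
One way to see why: the first call sets up the data and the axes but adds no layers, so there is nothing to draw. A rough sketch, assuming the plot object exposes its layers as a list under $layers (an implementation detail that may differ between ggplot2 versions). The names p and p_points are only for illustration:

```r
library(ggplot2)

# The first call only declares the data and the x/y mappings
p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))

# Number of layers so far (assumption: layers are stored in p$layers)
length(p$layers)

# Adding a geom is what creates a layer
p_points <- p + geom_point()
length(p_points$layers)
```

If the assumption holds, the count should go from zero to one once geom_point() is added.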

ggplot2 Basics: Adding Aesthetics

Now we add a geom, geom_point(), to draw these two variables as a scatterplot.

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

ggplot2 Basics: Adding Smoothing Functions

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth()
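
With no arguments, geom_smooth() chooses a smoother for you and prints a message saying which one. For a dataset this small (fewer than 1,000 rows) that is a loess fit. A sketch that spells the defaults out so the choice is visible (p_smooth is an illustrative name):

```r
library(ggplot2)

p_smooth <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  # Explicit versions of the defaults used for small datasets;
  # se = TRUE draws the shaded confidence band
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE)

p_smooth
```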

ggplot2 Basics: Color Points By Species

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species)) +
  geom_smooth()

ggplot2 Basics: Size by Petal Length

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species, size = Petal.Length)) +
  geom_smooth()

ggplot2 Basics: Color Lines and Points By Species

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(size = Petal.Length)) +
  geom_smooth()
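
The only change from the previous slide is where color = Species lives. Aesthetics given in ggplot() are inherited by every layer, while aesthetics given inside a geom apply to that layer alone. A side-by-side sketch (p_layer and p_global are illustrative names):

```r
library(ggplot2)

# color mapped inside geom_point(): only the points are split by species,
# so geom_smooth() fits one curve through all the data
p_layer <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "loess", formula = y ~ x)

# color mapped inside ggplot(): both layers inherit it,
# so geom_smooth() fits one curve per species
p_global <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                             color = Species)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)
```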

ggplot2 Basics: Use Linear Smoothing

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(size = Petal.Length)) +
  geom_smooth(method = "lm")

ggplot2 Basics: Different Line Types

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(size = Petal.Length)) +
  geom_smooth(aes(linetype = Species), method = "lm")

ggplot2 Basics: Rug Really Ties the Room Together

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(size = Petal.Length)) +
  geom_smooth(aes(linetype = Species), method = "lm") +
  geom_rug()

ggplot2 Basics: Density

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(size = Petal.Length)) +
  geom_density2d()

ggplot2 Basics: Hex

# geom_hex() needs the hexbin package to be installed
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_hex()

ggplot2 Basics: Histograms

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram()

ggplot2 Basics: Histograms with Dodge

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(position = "dodge")

ggplot2 Basics: Fill vs. Color

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(color = "black")

ggplot2 Basics: Inside vs. Outside of Aes

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(color = "black"))
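
The difference comes down to setting versus mapping. Outside aes(), "black" is a literal colour. Inside aes(), it is treated as a data value, so ggplot2 assigns it a colour from the default scale and adds a legend entry for it. A sketch of the two side by side (p_set and p_mapped are illustrative names; bins = 30 is the default, written out only to silence the message):

```r
library(ggplot2)

# Outside aes(): every bar outline is drawn in black
p_set <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(color = "black", bins = 30)

# Inside aes(): "black" becomes a one-level category, so the outline
# colour comes from the colour scale rather than the word itself
p_mapped <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(color = "black"), bins = 30)
```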

ggplot2 Basics: Alpha

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(color = "black"), alpha = 0.5)

ggplot2 Basics: Theme

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(color = "black") +
  theme_bw()

ggplot2 Basics: Theme

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(color = "black", alpha = 0.5) +
  theme_bw() +
  theme(legend.position = "bottom")

ggplot2 Basics: Theme

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(color = "black", alpha = 0.5) +
  theme_bw() +
  theme(legend.position = "bottom",
        axis.title = element_text(size = 20, face = "bold"))

ggplot2 Basics: Axis Labels

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(color = "black", alpha = 0.5) +
  theme_bw() +
  ylab("Count") +
  xlab("Sepal Length") +
  theme(legend.position = "bottom",
        axis.title = element_text(size = 20, face = "bold"))
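
ggplot2 also provides labs(), which can set both axis titles and the legend title in one call. A sketch of the same plot using it instead of xlab() and ylab(). The legend title "Iris species" and the name p_labs are illustrative, and bins = 30 is just the default written out:

```r
library(ggplot2)

p_labs <- ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(color = "black", alpha = 0.5, bins = 30) +
  theme_bw() +
  # x and y replace xlab()/ylab(); fill renames the legend
  labs(x = "Sepal Length", y = "Count", fill = "Iris species") +
  theme(legend.position = "bottom",
        axis.title = element_text(size = 20, face = "bold"))

p_labs
```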

ggplot2 Basics: stat_ vs geom_

ggplot(data = iris, aes(Sepal.Length, fill = Species)) +
  stat_bin()
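
This plot matches the earlier histogram because each geom has a default stat and each stat has a default geom: geom_histogram() bins the data with stat = "bin", and stat_bin() draws its bins with geom = "bar". A sketch that writes both pairings out and compares the computed counts with layer_data() (p_geom and p_stat are illustrative names):

```r
library(ggplot2)

# Start from the geom; the binning stat is named explicitly
p_geom <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(stat = "bin", bins = 30)

# Start from the stat; the bar geom is named explicitly
p_stat <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  stat_bin(geom = "bar", bins = 30)

# Both layers compute the same bin counts
all.equal(layer_data(p_geom)$count, layer_data(p_stat)$count)
```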

ggplot2 Basics: Multiple Datasets

ggplot() +
  geom_point(data = iris, aes(Sepal.Length, Sepal.Width)) +
  geom_smooth(data = iris, aes(Sepal.Length, Sepal.Width))

ggplot2 Basics: Summarizing Data with Boxplots

ggplot(data = iris, aes(Species, Sepal.Width)) +
  geom_jitter(alpha = 0.2) +
  geom_boxplot()

ORDER MATTERS!
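
Layers are drawn in the order they are added, so here the boxes are painted over the jittered points. A sketch with the two layers swapped, which puts the points on top instead (p_points_on_top is an illustrative name):

```r
library(ggplot2)

p_points_on_top <- ggplot(data = iris, aes(Species, Sepal.Width)) +
  # Drawn first, so it ends up underneath
  geom_boxplot() +
  # Drawn second, so the points stay visible over the boxes
  geom_jitter(alpha = 0.2)

p_points_on_top
```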

ggplot2 Basics: Faceting Data

ggplot(data = iris, aes(Sepal.Length, Sepal.Width)) +
  geom_jitter(alpha = 0.2) +
  geom_smooth() +
  facet_grid(Species ~ .)
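
In facet_grid(), the left side of the formula gives the rows and the right side gives the columns, so Species ~ . stacks one panel per species vertically. A sketch of two alternatives that place the panels side by side (p_cols and p_wrap are illustrative names; the loess settings are the defaults written out):

```r
library(ggplot2)

base <- ggplot(data = iris, aes(Sepal.Length, Sepal.Width)) +
  geom_jitter(alpha = 0.2) +
  geom_smooth(method = "loess", formula = y ~ x)

# Species on the right of the formula: one column per species
p_cols <- base + facet_grid(. ~ Species)

# facet_wrap() takes a single variable and wraps the panels into rows
p_wrap <- base + facet_wrap(~ Species, nrow = 1)
```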

ggplot2 Basics: Maps

nz <- map_data("nz")  # needs the maps package to be installed
ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black") +
  coord_cartesian()

ggplot2 Basics: Maps

data("world.cities", package = "maps")  # world.cities ships with the maps package
nz.city <- world.cities %>% filter(country.etc == "New Zealand")
ggplot(data = nz, aes(long, lat)) +
  geom_polygon(aes(group = group), fill = "white", colour = "black") +
  geom_point(data = nz.city, aes(long, lat, size = pop)) +
  coord_cartesian()
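
coord_cartesian() lets the map stretch to fill whatever shape the plotting area has. ggplot2 also offers coord_quickmap(), which fixes the aspect ratio to approximate a map projection. A self-contained sketch with that one substitution, which is a swap for comparison and not the choice made in these slides. It assumes the maps package is installed; nz_city and p_map are illustrative names:

```r
library(ggplot2)

# Map outline and city table both come from the maps package
nz <- map_data("nz")
data("world.cities", package = "maps")
nz_city <- subset(world.cities, country.etc == "New Zealand")

p_map <- ggplot(data = nz, aes(long, lat)) +
  geom_polygon(aes(group = group), fill = "white", colour = "black") +
  geom_point(data = nz_city, aes(long, lat, size = pop)) +
  # Keeps longitude and latitude in roughly correct proportion
  coord_quickmap()

p_map
```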
