A very brief...

Intro to R

mladencucak@gmail.com

Topics

About R/RStudio
Basics of programming with R
Data analysis with tidyverse

These materials are based on the APS's “R for Plant Pathologists”
Some inspiration from J. Bryan's Stat545 and B. Boehmke's Intro to R
All highly recommended

Why R

Performance: stable, light and fast
Support network: documentation, community, developers
Reproducibility: anyone anywhere can reproduce results
Versatility: a unified solution to almost any numerical problem, with strong graphical capabilities
Ethics: accessible to anyone as it is free and open source

Be strong!

Transition from “point and click” is tough but rewarding

Baby steps

Help:

Google: just add “with R” at the end of any search
Stack Overflow: programming questions
Cross Validated: scientific questions

Learning:

“R for Data Science” → https://r4ds.had.co.nz
R4DS Learning Community → https://rfordatasci.com

There are (too) many resources! So…

Stay focused!

Don't get overwhelmed!

Your new best friends

Cheatsheets → https://rstudio.com/resources/cheatsheets/

R – Statistical programming language

http://www.r-project.org/

RStudio – Integrated Development Environment (IDE) makes our life much easier

https://rstudio.com/

It may be described as...

R – Engine

RStudio – Dashboard

R interface

…is not the friendliest one…

RStudio (IDE)

Move onto some coding

Move the cursor onto a line with R code and press:

(Win) Ctrl + Enter or
(Mac) Cmd + Return.

Challenge: Do it with the hand you are not using to hold the mouse!

Tips for later:
See the many other keyboard shortcuts in RStudio with (Win) Alt + Shift + K or (Mac) Option + Shift + K
For example, to run an entire script: (Win) Ctrl + Shift + Enter or (Mac) Cmd + Shift + Return

R basics: In R, we have...

Objects, where the data is stored.

Assign with <-

x <- 1
y <- 2
x + y

[1] 3

We get the same result with:

1 + 2

[1] 3

Functions, which are applied to objects or to other functions (i.e. to analyze the data): round brackets!

# I am a comment!!! Just here to help jog the memory later on...
# Let us make a function!
addition <- function(argument_one,
                     argument_two){ 
  argument_one + argument_two # operations
} # curly brackets define operations

ls() # check content of the environment

[1] "addition" "x"        "y"

addition(argument_one = x,
         argument_two = y)

[1] 3

addition(x, y) # Notice the difference?! Arguments are now matched by position

[1] 3

addition(x, y) == x+y #notice double "="

[1] TRUE

all.equal(addition(x, y), x + y) # Similar check with a built-in function (tolerates tiny numeric differences)

[1] TRUE
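
Why would we ever prefer all.equal() over ==? A minimal sketch, using made-up numbers that are not part of the workshop data: decimal fractions are stored approximately, so a strict comparison can fail where a tolerant one succeeds.

```r
# Decimal fractions are stored approximately in the computer's memory
a <- 0.1 + 0.2
b <- 0.3

exact_match <- a == b                   # strict comparison
close_match <- isTRUE(all.equal(a, b))  # comparison with a small tolerance

print(exact_match)
print(close_match)
```

Wrapping all.equal() in isTRUE() is the usual pattern, because all.equal() returns a description of the difference rather than FALSE when the values do not match.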

Objects: Vectors

Vectors store data of the same type
(a column of an Excel table)

Types of data:

num <- c(50, 60, 65) 

char <- c("mouse", "rat", "dog") 

fct <- factor(c("low", "med", "high")) # values go inside c()

dates <- as.Date(c("02/27/92", "02/27/92", "01/14/92"), "%m/%d/%y")

logical <-  c(FALSE, FALSE, TRUE) # only TRUE or FALSE
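
To check what type a vector holds, class() can be used. The sketch below repeats the example vectors (the name is_dog is just an illustrative choice) and also shows what happens when types are mixed: everything is silently converted to a single type.

```r
num    <- c(50, 60, 65)
char   <- c("mouse", "rat", "dog")
fct    <- factor(c("low", "med", "high"))
is_dog <- c(FALSE, FALSE, TRUE)

class(num)    # "numeric"
class(char)   # "character"
class(fct)    # "factor"
class(is_dog) # "logical"

# Mixing types in one vector: all elements become character
mixed <- c(1, "rat", TRUE)
class(mixed)  # "character"
print(mixed)
```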

Subsetting - square brackets

num[1] # 1st element

[1] 50

num[num >= 60] # More than or equal

[1] 60 65

char == "dog" # returns a logical vector, same as `logical` defined earlier

[1] FALSE FALSE  TRUE

char[logical]

[1] "dog"

char[char == "dog"]

[1] "dog"
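
A few more subsetting patterns that tend to come up early on, sketched with the same example vectors. These are base R features; the particular values are only for illustration.

```r
num  <- c(50, 60, 65)
char <- c("mouse", "rat", "dog")

num[-1]                          # drop the 1st element: 60 65
num[c(1, 3)]                     # 1st and 3rd element: 50 65
char[num > 55]                   # use one vector to subset another: "rat" "dog"
char[char %in% c("dog", "cat")]  # keep elements found in a set: "dog"
which(num >= 60)                 # positions where the condition is TRUE: 2 3
```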

Objects: Dataframes

A data frame is a set of vectors of the same length (an entire Excel table)

Creating and viewing data frames

df <- data.frame(col_one = num,
                 col_two = char)
print(df)

  col_one col_two
1      50   mouse
2      60     rat
3      65     dog

head(df,1)

  col_one col_two
1      50   mouse

Same logic for indexing, just in 2 dimensions

df[1, 1] # [rows, columns]

[1] 50

df[, 1] # 1st column in the data frame

[1] 50 60 65

df[, -2] # Exclude 2nd column

[1] 50 60 65

df[2:3, "col_two"]

[1] "rat" "dog"

df$col_two

[1] "mouse" "rat"   "dog"
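
The logical subsetting we used on vectors works on rows of a data frame too. A small sketch, assuming the same example data frame; the names big and col_three are made up for illustration.

```r
df <- data.frame(col_one = c(50, 60, 65),
                 col_two = c("mouse", "rat", "dog"))

# Keep rows where the condition is TRUE, and all columns
big <- df[df$col_one >= 60, ]
print(big)

# Add a new column with $
df$col_three <- df$col_one * 2

dim(df) # number of rows and columns
str(df) # compact overview of the structure
```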

R packages

Pre-made set of functions for common (and not so common) tasks

A package of R packages: tidyverse

Think of something like Microsoft Office suite
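
How packages are used in practice, as a rough sketch: install once, then attach in every new session. To keep the example runnable without an internet connection, it uses the stats and utils packages that ship with R; the install line is commented out.

```r
# install.packages("tidyverse") # run once per computer (needs internet)

library(stats)                  # attach a package for the current session
med <- median(c(50, 60, 65))    # functions of an attached package are called directly
print(med)

# Call a single function without attaching its package: package::function
first_letters <- utils::head(letters, 3)
print(first_letters)
```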

tidyverse and data analysis cycle

Data import

Several functions within readr and readxl for different types of files.
For this workshop, we will use data on coffee leaf rust from Ethiopia

library(tidyverse) # loads readr, dplyr, tidyr and friends
dt <- read_csv(here::here("data", "survey_clean.csv"))
tibble::glimpse(dt, 70)

Rows: 405
Columns: 13
$ farm            <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...
$ region          <chr> "SNNPR", "SNNPR", "SNNPR", "SNNPR", "SNNP...
$ zone            <chr> "Bench Maji", "Bench Maji", "Bench Maji",...
$ district        <chr> "Debub Bench", "Debub Bench", "Debub Benc...
$ lon             <dbl> 35.44250, 35.44250, 35.42861, 35.42861, 3...
$ lat             <dbl> 6.904722, 6.904722, 6.904444, 6.904444, 6...
$ altitude        <dbl> 1100, 1342, 1434, 1100, 1400, 1342, 1432,...
$ cultivar        <chr> "Local", "Mixture", "Mixture", "Local", "...
$ shade           <chr> "Sun", "Mid shade", "Mid shade", "Sun", "...
$ cropping_system <chr> "Plantation", "Plantation", "Plantation",...
$ farm_management <chr> "Unmanaged", "Minimal", "Minimal", "Unman...
$ inc             <dbl> 86.70805, 51.34354, 43.20000, 76.70805, 4...
$ sev2            <dbl> 55.57986, 17.90349, 8.25120, 46.10154, 12...
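
If the workshop file is not at hand, the import step can still be tried out. The sketch below writes a tiny made-up CSV (hypothetical values, not the survey data) to a temporary file and reads it back with readr.

```r
library(readr)

# Write a small, made-up CSV file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("farm,zone,inc",
             "1,Sheka,33.2",
             "2,Sidama,16.5"),
           tmp)

# col_types = cols() lets readr guess the column types quietly
toy <- read_csv(tmp, col_types = cols())
print(toy)
```

Numbers are read as numeric columns and text as character columns, which is the same guessing that produced the <dbl> and <chr> labels in the output above.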

Data transformation

dplyr Functions

Six key dplyr functions that allow you to solve the vast majority of your data transformation challenges:

Function	Description
`filter`	pick observations based on values
`select`	pick variables
`summarize`	compute statistical summaries
`group_by`	perform operations at different levels of your data
`arrange`	reorder data
`mutate`	create new variables

Piping

From magrittr package.
Traditional approach:

function(argument_one, argument_two,...)

Pipe %>% approach:

argument_one %>% 
  function(., argument_two,...)
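
The two approaches side by side, as a minimal sketch with made-up numbers. The nested version is read inside-out, the piped version top to bottom.

```r
library(magrittr)

values <- c(4, 5, 7)

# Traditional approach: nested function calls
nested <- round(sqrt(sum(values)), 1)

# Pipe approach: each result is passed on as the first argument
piped <- values %>%
  sum() %>%
  sqrt() %>%
  round(1)

identical(nested, piped) # TRUE
```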

Let's test these

Make a small subset of the data

(dt_small <- 
dt %>%
  select(cultivar, zone, inc) %>% 
  group_by(cultivar, zone) %>%
  slice(head(row_number(), 1)) %>% 
  filter(
    zone =="Sheka" |zone ==  "Sidama") %>% 
  ungroup())

# A tibble: 6 x 3
  cultivar zone     inc
  <chr>    <chr>  <dbl>
1 Improved Sheka   33.2
2 Improved Sidama  16.5
3 Local    Sheka   81.8
4 Local    Sidama  35.2
5 Mixture  Sheka   29.5
6 Mixture  Sidama  18.6

dt_small %>% 
  select(cultivar, inc) %>% 
  filter(inc <= 17)

# A tibble: 1 x 2
  cultivar   inc
  <chr>    <dbl>
1 Improved  16.5

dt_small %>%
  group_by(cultivar) %>%
  summarize(mean_inc = mean(inc),
            min_inc = min(inc)) %>%
  arrange(desc(mean_inc))

# A tibble: 3 x 3
  cultivar mean_inc min_inc
  <chr>       <dbl>   <dbl>
1 Local        58.5    35.2
2 Improved     24.8    16.5
3 Mixture      24.1    18.6
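
mutate() is the one key function we have not tried yet. A small sketch: dt_small is rebuilt here by hand from the rounded values printed above, and the new column names inc_prop and high_inc, as well as the cut-off of 30, are just illustrative choices.

```r
library(dplyr)

# dt_small rebuilt from the printed (rounded) values
dt_small <- tibble::tibble(
  cultivar = c("Improved", "Improved", "Local", "Local", "Mixture", "Mixture"),
  zone     = rep(c("Sheka", "Sidama"), times = 3),
  inc      = c(33.2, 16.5, 81.8, 35.2, 29.5, 18.6)
)

dt_mut <- dt_small %>%
  mutate(inc_prop = inc / 100, # incidence as a proportion
         high_inc = inc > 30) %>% # logical flag, TRUE if above 30
  arrange(desc(inc))

print(dt_mut)
```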

Reshaping data: wide

Important for data visualization

Our data subset is in long format

dt_small

# A tibble: 6 x 3
  cultivar zone     inc
  <chr>    <chr>  <dbl>
1 Improved Sheka   33.2
2 Improved Sidama  16.5
3 Local    Sheka   81.8
4 Local    Sidama  35.2
5 Mixture  Sheka   29.5
6 Mixture  Sidama  18.6

Change it to wide format with tidyr

names_from: the column whose values become the new column names
values_from: the column whose values fill the new columns

(dt_small_wide <- 
dt_small %>%
  pivot_wider(names_from = "zone", 
              values_from = "inc"))

# A tibble: 3 x 3
  cultivar Sheka Sidama
  <chr>    <dbl>  <dbl>
1 Improved  33.2   16.5
2 Local     81.8   35.2
3 Mixture   29.5   18.6
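
One reason the wide format can be handy: values that sat in different rows now sit side by side, so they can be compared directly. A sketch, with dt_small rebuilt by hand from the rounded values printed above; the column name diff is an illustrative choice.

```r
library(dplyr)
library(tidyr)

# dt_small rebuilt from the printed (rounded) values
dt_small <- tibble::tibble(
  cultivar = c("Improved", "Improved", "Local", "Local", "Mixture", "Mixture"),
  zone     = rep(c("Sheka", "Sidama"), times = 3),
  inc      = c(33.2, 16.5, 81.8, 35.2, 29.5, 18.6)
)

dt_diff <- dt_small %>%
  pivot_wider(names_from = zone,
              values_from = inc) %>%
  mutate(diff = Sheka - Sidama) # difference in incidence between zones

print(dt_diff)
```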

Reshaping data: long

Can we do it the other way around?

dt_small_wide

# A tibble: 3 x 3
  cultivar Sheka Sidama
  <chr>    <dbl>  <dbl>
1 Improved  33.2   16.5
2 Local     81.8   35.2
3 Mixture   29.5   18.6

Change it to long format with pivot_longer()

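cols: the columns to gather into rows
names_to: name of the new column that will hold the old column names
values_to: name of the new column that will hold the values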

dt_small_wide %>% 
  pivot_longer(cols = 
                 c("Sheka", "Sidama"), 
               names_to = "zone",
               values_to = "inc")

# A tibble: 6 x 3
  cultivar zone     inc
  <chr>    <chr>  <dbl>
1 Improved Sheka   33.2
2 Improved Sidama  16.5
3 Local    Sheka   81.8
4 Local    Sidama  35.2
5 Mixture  Sheka   29.5
6 Mixture  Sidama  18.6
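
With many columns, listing them all in cols gets tedious. A sketch of an alternative, assuming the same wide table (rebuilt by hand from the printed values): select everything except cultivar with a minus sign, then pivot back to check that nothing was lost on the round trip.

```r
library(dplyr)
library(tidyr)

# dt_small_wide rebuilt from the printed (rounded) values
dt_small_wide <- tibble::tibble(
  cultivar = c("Improved", "Local", "Mixture"),
  Sheka    = c(33.2, 81.8, 29.5),
  Sidama   = c(16.5, 35.2, 18.6)
)

# All columns except cultivar are gathered
dt_long <- dt_small_wide %>%
  pivot_longer(cols = -cultivar,
               names_to = "zone",
               values_to = "inc")

# And back to wide again
dt_back <- dt_long %>%
  pivot_wider(names_from = zone,
              values_from = inc)

print(dt_long)
print(dt_back)
```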

Congratulations!!

So, the painful part is done, enjoy the rest!