Hezi Buba & Irene Steves
18/12/2018
Presentation materials: https://github.com/ecodatasci-tlv/functions
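Sometimes we tend to repeat ourselves when coding: repeating similar analyses, getting data ready for plots, etc. Functions have multiple advantages over copying and pasting chunks of code. For one, copy-pasted code invites small mistakes that are hard to spot, like the one in the example below.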
A good rule of thumb is to not copy and paste code more than twice.
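# Rescale each column of df to the range [0, 1]
# (df is assumed to be a data frame with numeric columns a, b, c and d)
df$a <- (df$a - min(df$a, na.rm = TRUE)) /
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
# Copy-paste mistake: the denominator below should use min(df$b, ...), not min(df$a, ...)
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))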
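The repeated rescaling can be collapsed into one function, so the formula is written (and checked) only once. A minimal sketch: the name rescale01 and the small toy data frame are our own illustrative choices, since the slide's df is not shown.

```r
# Hypothetical helper: rescale a numeric vector to [0, 1], ignoring NAs
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)  # c(min, max), computed once
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data frame (assumed for illustration)
df <- data.frame(a = c(1, 5, 10), b = c(2, NA, 4), c = 1:3, d = c(0, 50, 100))

df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
df$c <- rescale01(df$c)
df$d <- rescale01(df$d)
df
```

The min/max bug from the copy-pasted version cannot creep in here, because each column is passed to the same function body.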
“It is faster to make a four-inch mirror then a six-inch mirror than to make a six-inch mirror.”
name <- function(variables) {
  # body: the code that uses the arguments
}
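To make the skeleton concrete, here is one filled-in sketch; standardize is a hypothetical example function, not one from the slides. It shows the three parts: a name, arguments (one with a default value), and a body whose last evaluated value is returned.

```r
# name: standardize
# arguments: x (required) and na.rm (optional, defaults to TRUE)
# body: everything between the braces; the last value is returned
standardize <- function(x, na.rm = TRUE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

standardize(c(2, 4, 6))      # mean 4, sd 2
standardize(c(2, NA, 4, 6))  # the NA stays in place
```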
How long should a function be? Surprisingly, not very long. Focus on making each function do just one thing.
Most of R's functions are fewer than 12 lines long!
Function names should be short yet informative. Remember: functions are meant to make code reusable and readable.
What are good names for these two functions?
f1 <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}

f2 <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}
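One possible answer: f1 checks whether each string starts with a given prefix, and f2 drops the last element of a vector. The names has_prefix and drop_last below are suggestions, not the only good choices.

```r
# f1, renamed: does each string start with prefix?
has_prefix <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}

# f2, renamed: remove the last element (NULL if nothing would be left)
drop_last <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}

has_prefix(c("ecology", "economy", "biology"), "eco")  # TRUE TRUE FALSE
drop_last(1:5)                                         # 1 2 3 4
```

Base R's startsWith() does the same job as has_prefix, which is a good sign the name reads naturally.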
Sometimes you won't save a function under a name at all, but write it directly where it is used (an anonymous function), as in the apply() call below:
matrix_of_numbers <- matrix(1:100, 10, 10)
# Square every element, column by column (MARGIN = 2)
apply(matrix_of_numbers, 2, function(x) x^2)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1  121  441  961 1681 2601 3721 5041 6561  8281
 [2,]    4  144  484 1024 1764 2704 3844 5184 6724  8464
 [3,]    9  169  529 1089 1849 2809 3969 5329 6889  8649
 [4,]   16  196  576 1156 1936 2916 4096 5476 7056  8836
 [5,]   25  225  625 1225 2025 3025 4225 5625 7225  9025
 [6,]   36  256  676 1296 2116 3136 4356 5776 7396  9216
 [7,]   49  289  729 1369 2209 3249 4489 5929 7569  9409
 [8,]   64  324  784 1444 2304 3364 4624 6084 7744  9604
 [9,]   81  361  841 1521 2401 3481 4761 6241 7921  9801
[10,]  100  400  900 1600 2500 3600 4900 6400 8100 10000
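For comparison, the same result can come from a named function; a short sketch where square is a hypothetical name. The anonymous version is handy for one-off operations, while the named one can be reused.

```r
matrix_of_numbers <- matrix(1:100, 10, 10)

# Named function: defined once, reusable elsewhere
square <- function(x) x^2
squared_named <- apply(matrix_of_numbers, 2, square)

# Anonymous function: written inline, used only here
squared_anon <- apply(matrix_of_numbers, 2, function(x) x^2)

identical(squared_named, squared_anon)  # TRUE
```

Since R 4.1.0, function(x) x^2 can also be written with the shorthand \(x) x^2.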
To further streamline your code, use iteration to repeat chunks of code.
The most basic tools are for and while loops, but there are other ways to iterate as well.
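A minimal sketch of both loop types; the variable names total and count are illustrative. A for loop runs once per element of a sequence, while a while loop keeps going as long as its condition is TRUE.

```r
# for loop: the body runs once for each element of 1:3
for (i in 1:3) {
  print(i^2)
}

# while loop: keep adding 1, 2, 3, ... until the total reaches 10
total <- 0
count <- 0
while (total < 10) {
  count <- count + 1
  total <- total + count
}
count  # number of iterations needed (4)
```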
Iterations have three main components: an output, a sequence to iterate over, and the body of code.
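The three components can be labeled in a small loop that doubles each element of a vector; x and doubled are illustrative names.

```r
x <- c(3, 1, 4, 1, 5)

doubled <- vector("double", length(x))  # 1. output: allocated before the loop
for (i in seq_along(x)) {               # 2. sequence: the indices 1, 2, ..., 5
  doubled[[i]] <- 2 * x[[i]]            # 3. body: the work done for each element
}
doubled  # 6 2 8 2 10
```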
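library(tictoc)  # tic() and toc() time the code between them

n_times <- 50000

# Slow: grow the output by one element on every iteration
tic()
a <- NULL
for (i in seq_len(n_times)) {
  a <- c(a, i^2)
}
toc()

# Fast: allocate the output once, then fill it in
tic()
a <- vector("double", n_times)
for (i in seq_len(n_times)) {
  a[i] <- i^2
}
toc()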
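As an aside: for simple arithmetic like this, R's operators already work on whole vectors, so the loop can be skipped entirely. A small sketch checking that both approaches agree; a_loop and a_vec are illustrative names.

```r
n_times <- 50000

# Loop with a preallocated output
a_loop <- vector("double", n_times)
for (i in seq_len(n_times)) {
  a_loop[i] <- i^2
}

# Vectorized: ^ is applied to every element at once
a_vec <- seq_len(n_times)^2

identical(a_loop, a_vec)  # TRUE
```

Loops remain the right tool when each step depends on the previous one or the body is more than a single vectorized expression.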
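In for (i in 1:10), the sequence is 1:10. In a while loop the condition plays that role: while (TRUE) keeps repeating the body until something inside it calls break.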
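A minimal sketch of while (TRUE) with an explicit break: halve a number until it drops below 1, counting the steps. The variable names are illustrative.

```r
x <- 100
halvings <- 0
while (TRUE) {
  if (x < 1) break  # exit condition checked inside the body
  x <- x / 2
  halvings <- halvings + 1
}
halvings  # 7 (100 / 2^7 = 0.78125)
```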
for (i in seq_along(vector)) is a safer way to build the sequence than for (i in 1:length(vector)) when the vector might have length 0:
y <- vector("double", 0)  # a vector of length 0

seq_along(y)  # empty sequence: a loop over it does nothing
integer(0)

1:length(y)   # 1:0 counts down, so a loop over it would run twice
[1] 1 0
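The difference matters once the sequence drives a loop. A short sketch counting how many times each loop body runs on an empty vector; n_safe and n_unsafe are illustrative names.

```r
y <- vector("double", 0)

# seq_along(y) is empty, so the body never runs
n_safe <- 0
for (i in seq_along(y)) n_safe <- n_safe + 1

# 1:length(y) is c(1, 0), so the body runs twice on a vector with no elements
n_unsafe <- 0
for (i in 1:length(y)) n_unsafe <- n_unsafe + 1

c(n_safe, n_unsafe)  # 0 2
```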
Looping over a vector, doing something to each element and saving the results is a very common pattern when working with data. It is so common that a tidyverse package, purrr, does it for you.
We will discuss it shortly…
Loops can also be wrapped inside a function, so you only need to call that function when necessary. Remember: limit copying and pasting as much as possible!
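library(tidyverse)

# Example data: two columns of random normal values
data <- tibble(a = rnorm(10),
               b = rnorm(10))

# Return the mean of every column of a data frame
col_means <- function(dataframe) {
  output <- vector("double", ncol(dataframe))  # output, allocated up front
  for (i in seq_along(dataframe)) {            # sequence: one index per column
    output[[i]] <- mean(dataframe[[i]])        # body
  }
  return(output)
}

col_means(data)  # values differ on every run, since the data are random
[1] -0.2508394 -0.3507583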
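The purrr package (part of the tidyverse) provides map functions for exactly this loop-and-collect pattern. A minimal sketch comparing purrr's map_dbl() with the col_means() loop; the small fixed data frame replaces the random data so the results are predictable.

```r
library(purrr)

# Fixed example data instead of rnorm(), so the means are known
data <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# Loop version, as above
col_means <- function(dataframe) {
  output <- vector("double", ncol(dataframe))
  for (i in seq_along(dataframe)) {
    output[[i]] <- mean(dataframe[[i]])
  }
  output
}

# map_dbl() applies mean() to each column and returns a double vector
map_dbl(data, mean)  # named: a = 2, b = 20

all.equal(unname(map_dbl(data, mean)), col_means(data))  # TRUE
```

Unlike col_means(), map_dbl() keeps the column names, which makes the result easier to read.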
Create a function that returns the full song for any number of any vessel (bottles, cans, even boxes…) of any drink (but no Jägermeister, please):
99 bottles of beer on the wall, 99 bottles of beer. Take one down, pass it around - 98 bottles of beer on the wall
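One possible solution sketch, combining a function with a preallocated loop. The name sing_song and its arguments are hypothetical, and it deliberately ignores details like the singular "1 bottle" or a final "no more" verse.

```r
# Hypothetical solution: one verse per count, from n down to 1
sing_song <- function(n = 99, vessel = "bottles", drink = "beer") {
  verses <- vector("character", n)  # output: one verse per count
  for (i in seq_len(n)) {           # sequence: 1, 2, ..., n
    current <- n - i + 1            # count at the start of this verse
    verses[[i]] <- paste0(
      current, " ", vessel, " of ", drink, " on the wall, ",
      current, " ", vessel, " of ", drink, ". ",
      "Take one down, pass it around - ",
      current - 1, " ", vessel, " of ", drink, " on the wall."
    )
  }
  verses
}

cat(sing_song(3, "cans", "cola"), sep = "\n")
```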