Functionals

data science programming R functions

Functionals in R.

Danielle Brantley https://gist.github.com/danielle-b
02-19-2020

In a previous post, I went over vectorized functions in R. Vectorized functions are used to operate on all the elements of a vector at once. However, not all functions in R can be applied to all of the elements of a vector at once.

As another alternative to for-loops, I could use R’s functionals. A functional is a function that takes a function as an input and returns a vector as an output. DataQuest explains that in many situations, functionals eliminate the need for for-loops by allowing me to apply any function to all elements of a list or vector.
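To make that definition concrete, here is a minimal sketch of a functional written by hand. The name apply_to_each is hypothetical, not a function from base R or purrr. The loop lives inside the functional, so the caller only supplies the data and the function to apply.

```r
# A hand-rolled functional, for illustration only: `apply_to_each` is a
# hypothetical name, not part of base R or purrr.
apply_to_each <- function(x, f) {
  out <- vector("list", length(x))   # preallocate one slot per element
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]])            # call the supplied function on element i
  }
  names(out) <- names(x)             # carry over any element names
  out
}

# Pass a function (max) as an argument, get one result per element back
apply_to_each(list(a = c(1, 5), b = c(10, 2)), max)
```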

For this post, I decided to play around with a data set I found in the TidyTuesday GitHub repository. The data set is about wine ratings, and I thought it would be fun to create a wine-themed post this week.

Functionals Using The Tidyverse Purrr Package

Base R includes a family of functionals known as the "apply" family. These functionals allow me to apply functions to the elements of an object. However, the apply functions can be cumbersome to use because their syntax and output types are inconsistent.
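As one illustration of that inconsistency (an example I added, not one from the DataQuest lesson), sapply() decides its return type from whatever the applied function returns, and an empty input falls back to a list. vapply() avoids the guessing by making me declare the output type up front.

```r
# sapply() guesses the shape of its result, so similar calls return different types
sapply(list(c(1, 2), c(3, 4)), sum)    # numeric vector: 3 7
sapply(list(c(1, 2), c(3, 4)), range)  # 2 x 2 matrix, one column per element
sapply(list(), sum)                    # empty list(), not an empty numeric vector

# vapply() requires the expected type and length of each result
vapply(list(), sum, numeric(1))        # numeric(0)
```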

Instead, DataQuest introduces me to the purrr package. The purrr package can be used for the same purposes as the apply family of functionals, and its consistent syntax and output make it easier to use and make my code more legible.

Next, I am introduced to the purrr functional map(). DataQuest explains that map() takes a vector or a list, applies a single-variable function to each of its elements, and returns a list. Let's look at an example:

I created a named list, wine_results, in which each element holds the points and the price of one wine.

I found the lower number in each pair by applying the min() function to each pair in wine_results. To use map(), I pass the list of data (wine_results) and the function (min) as arguments. I saved the result to min_results. Printing min_results gives the list below.

library(tidyverse)  # attaches purrr, which provides map()
wine_results <- list(c(87, 15), c(84, 35), c(91, 95), c(96, 54), c(100, 350))
names(wine_results) <- c("Portuguese Red", "Carignan", "Merlot", "Pinot Noir", "Muscat")
min_results <- map(wine_results, min)
min_results
$`Portuguese Red`
[1] 15

$Carignan
[1] 35

$Merlot
[1] 91

$`Pinot Noir`
[1] 54

$Muscat
[1] 100
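For comparison, here is roughly the for-loop that map(wine_results, min) replaces. This is a sketch of an equivalent loop, not how purrr implements map() internally; the last line checks that both approaches produce the same named list.

```r
library(purrr)

wine_results <- list(c(87, 15), c(84, 35), c(91, 95), c(96, 54), c(100, 350))
names(wine_results) <- c("Portuguese Red", "Carignan", "Merlot", "Pinot Noir", "Muscat")

# The loop version: preallocate a named list, then fill it one wine at a time
loop_results <- vector("list", length(wine_results))
names(loop_results) <- names(wine_results)
for (wine in names(wine_results)) {
  loop_results[[wine]] <- min(wine_results[[wine]])
}

# Same result as the one-line functional
identical(loop_results, map(wine_results, min))
```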

Using Functionals to Apply Custom Functions

I can also use map() to apply a function to multiple variables of a data frame. Let's say I want to apply my avg_wine_result() function to the points and price variables in the wine_ratings data frame. I can use select() to choose the variables I want to work with and map() to apply the function to them. I could install and load the purrr and dplyr packages separately, but I already have the tidyverse installed (installing the tidyverse includes both purrr and dplyr), so I just loaded it using the library() function.

library(tidyverse)  # attaches dplyr (select, %>%) and purrr (map)
avg_wine_result <- function(x) {
  mean(x, na.rm = TRUE)  # ignore missing values
}
per_wine <- wine_ratings %>% select(points, price) %>% map(avg_wine_result)

per_wine
$points
[1] 88.44714

$price
[1] 35.36339

The map() functional applies avg_wine_result() to each column of the selected data frame, that is, to the points and price vectors. The result is a list with one output vector per variable. Note that map() always returns a list.

Though I used two variables in this example, this approach would scale well if I decided to apply a single-variable function to a much larger list of variables.
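Since the wine_ratings data itself isn't reproduced in this post, the sketch below runs the same pipeline on a tiny made-up data frame. toy_ratings is a hypothetical stand-in with invented values, not rows from the real data set. The pipeline works because a data frame is a list of columns, so map() visits one column at a time; select() drops the non-numeric country column before map() runs.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for wine_ratings (invented values, one missing price)
toy_ratings <- data.frame(
  points  = c(87, 90, 93),
  price   = c(15, NA, 45),
  country = c("Portugal", "France", "Italy")
)

avg_wine_result <- function(x) {
  mean(x, na.rm = TRUE)  # ignore missing values
}

# One list element per selected column
toy_means <- toy_ratings %>% select(points, price) %>% map(avg_wine_result)
toy_means
```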

Note: I used na.rm = TRUE to exclude missing values when taking the mean of x. The help page for mean() (?mean) describes this argument in more detail.
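A quick illustration of why na.rm = TRUE matters, using a hypothetical vector of prices with one missing value:

```r
prices <- c(15, NA, 45)  # hypothetical prices, one missing

mean(prices)                # NA: a single missing value makes the whole mean missing
mean(prices, na.rm = TRUE)  # 30: the NA is dropped before averaging
```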

Using Functionals To Return Vectors of Specified Types

As I mentioned, map() always returns a list. But what if I want to specify an output of a different type?

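The purrr package contains variants of the map() functional that return a vector of a specified data type instead of a list. The following are the four variants I learned about in this lesson:
- map_lgl(): returns a logical vector
- map_int(): returns an integer vector
- map_dbl(): returns a double vector
- map_chr(): returns a character vector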

Note that in R, the integer and double types both count as numeric: is.numeric() returns TRUE for either.

Let me show an example. Recall the wine_results list from the earlier example. I used map_dbl() to apply sum() to each element of wine_results. The result is a double vector with the names retained:

wine_results <- list(c(87, 15), c(84, 35), c(91, 95), c(96, 54), c(100, 350))
names(wine_results) <- c("Portuguese Red", "Carignan", "Merlot", "Pinot Noir", "Muscat")
sum_dbl <- map_dbl(wine_results, sum)
sum_dbl
Portuguese Red       Carignan         Merlot     Pinot Noir 
           102            119            186            150 
        Muscat 
           450 
typeof(sum_dbl)
[1] "double"
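The other three variants follow the same pattern. The sketch below reuses wine_results with a few throwaway functions of my own (the 90-point cutoff and the " / " label format are arbitrary choices for illustration), and also shows what happens when the applied function returns more than one value per element.

```r
library(purrr)

wine_results <- list(c(87, 15), c(84, 35), c(91, 95), c(96, 54), c(100, 350))
names(wine_results) <- c("Portuguese Red", "Carignan", "Merlot", "Pinot Noir", "Muscat")

# Each variant returns one value per element, in a guaranteed type
map_lgl(wine_results, function(x) x[1] >= 90)                 # logical: 90 points or more?
map_int(wine_results, length)                                  # integer: 2 for every wine
map_chr(wine_results, function(x) paste(x, collapse = " / "))  # character: "87 / 15", ...

# Integers still count as numeric in R
is.numeric(map_int(wine_results, length))  # TRUE

# range() returns two values per element, which cannot fit in a double vector,
# so map_dbl() stops with an error (map() would simply return a list instead)
try(map_dbl(wine_results, range))
```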

Functionals for Two-Variable Functions

So far, I’ve talked about using the map() functional to apply any single-variable function to elements of a list or vector. But what about functions with more than one variable?

When applying a function with two variables, I need a different functional from the purrr package: map2(). The map2() functional takes two vectors or lists and a two-argument function, applies the function to each pair of corresponding elements, and returns a list. Take a look at the example below.

The x list holds the points of each wine and the y list holds the prices. I want to calculate what proportion of each wine's combined total (points plus price) comes from its points, so I write a function, proportion_of_total(), to calculate it. Next, I use map2() to apply proportion_of_total() to my x and y lists. The result is a list of outputs.

x <- list(87, 84, 91, 96, 100)
y <- list(15, 35, 95, 54, 350)

proportion_of_total <- function(x, y) {
  if (x + y > 0) {
    total <- x + y
    x / total
  } else {
    0
  }
}

map2(x, y, proportion_of_total)
[[1]]
[1] 0.8529412

[[2]]
[1] 0.7058824

[[3]]
[1] 0.4892473

[[4]]
[1] 0.64

[[5]]
[1] 0.2222222
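Under the hood, map2() walks the two lists in parallel: element i of x is paired with element i of y. The sketch below writes that out as an explicit loop (an illustrative equivalent, not purrr's actual implementation) and checks that it matches map2().

```r
library(purrr)

x <- list(87, 84, 91, 96, 100)
y <- list(15, 35, 95, 54, 350)

proportion_of_total <- function(x, y) {
  if (x + y > 0) {
    total <- x + y
    x / total
  } else {
    0
  }
}

# Loop version: pair up the i-th elements of x and y
loop_props <- vector("list", length(x))
for (i in seq_along(x)) {
  loop_props[[i]] <- proportion_of_total(x[[i]], y[[i]])
}

identical(loop_props, map2(x, y, proportion_of_total))  # TRUE
```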

As with the map() functional, the purrr package includes variants of map2() that return a vector of a specified type:
- map2_lgl()
- map2_int()
- map2_dbl()
- map2_chr()

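The following code demonstrates the map2_chr() functional using the example above. In purrr 1.0.0 and later, this automatic conversion of numbers to strings is deprecated, so the same call also prints a warning.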

map2_chr(x, y, proportion_of_total)
[1] "0.852941" "0.705882" "0.489247" "0.640000" "0.222222"
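If I want numbers rather than strings, map2_dbl() is the more natural variant. If I do want strings, making the conversion explicit inside the function avoids relying on automatic coercion. In the sketch below, sprintf("%.6f", ...) is my own formatting choice, picked because it reproduces the six-decimal strings shown above.

```r
library(purrr)

x <- list(87, 84, 91, 96, 100)
y <- list(15, 35, 95, 54, 350)

proportion_of_total <- function(x, y) {
  if (x + y > 0) {
    total <- x + y
    x / total
  } else {
    0
  }
}

# Numeric output: a double vector instead of strings
map2_dbl(x, y, proportion_of_total)

# Character output with an explicit conversion to six decimal places
map2_chr(x, y, function(a, b) sprintf("%.6f", proportion_of_total(a, b)))
# "0.852941" "0.705882" "0.489247" "0.640000" "0.222222"
```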

Functionals For Functions with More Than Two Variable Arguments

Functions can also take more than two variables as arguments. How would I apply such a function to my lists and get a list back? The purrr package contains a functional called pmap(), which works for functions with any number of arguments.

Take a look at the example below. This time I have three lists: x represents points, y represents prices and z represents ID numbers. I combined them into a new list, total_list, because pmap() expects its inputs to be supplied as a single list whose elements are the argument lists. Once I created total_list, I wrote the main_total() function, which adds 3 to the price when the points are at most 100 and otherwise subtracts 10 from the ID number. I could then apply the function to total_list with pmap().

As the output shows, pmap() returns a list. Because every points value here is at most 100, each element takes the y + 3 branch.

x <- list(87, 84, 91, 96, 100)
y <- list(15, 35, 95, 54, 350)
z <- list(1, 13609, 168, 15845, 345)

total_list <- list(x, y, z)
main_total <- function(x, y, z) {
  if (x <= 100) {
    y + 3
  } else {
    z - 10
  }
}
pmap(total_list, main_total)
[[1]]
[1] 18

[[2]]
[1] 38

[[3]]
[1] 98

[[4]]
[1] 57

[[5]]
[1] 353
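pmap() can also match list elements to function arguments by name instead of by position. In the sketch below, named_list is a hypothetical variant of total_list whose elements are named after main_total()'s arguments and deliberately stored in a different order; pmap() still lines them up correctly and gives the same result as before.

```r
library(purrr)

x <- list(87, 84, 91, 96, 100)
y <- list(15, 35, 95, 54, 350)
z <- list(1, 13609, 168, 15845, 345)

main_total <- function(x, y, z) {
  if (x <= 100) {
    y + 3
  } else {
    z - 10
  }
}

# Named elements are matched to arguments by name, so their order no longer matters
named_list <- list(z = z, y = y, x = x)
pmap(named_list, main_total)
```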

As with the map() and map2() functionals, the purrr package also includes variants of pmap() that return a vector of a specified type:
- pmap_lgl()
- pmap_int()
- pmap_dbl()
- pmap_chr()

The following code demonstrates the pmap_chr() functional using the example above.

pmap_chr(total_list, main_total)
[1] "18.000000"  "38.000000"  "98.000000"  "57.000000"  "353.000000"
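As with map2_chr(), a numeric variant or an explicit conversion is often the better fit. This sketch rebuilds total_list and main_total() from above, then shows pmap_dbl() and a pmap_chr() call that converts explicitly with sprintf() (again my formatting choice, matching the strings shown above).

```r
library(purrr)

x <- list(87, 84, 91, 96, 100)
y <- list(15, 35, 95, 54, 350)
z <- list(1, 13609, 168, 15845, 345)
total_list <- list(x, y, z)

main_total <- function(x, y, z) {
  if (x <= 100) {
    y + 3
  } else {
    z - 10
  }
}

# Numeric output: a double vector
pmap_dbl(total_list, main_total)

# Character output with the number-to-string conversion made explicit
pmap_chr(total_list, function(x, y, z) sprintf("%.6f", main_total(x, y, z)))
```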

Well, that's it for functionals. I must admit that I struggled with the map2() and pmap() functionals, and it took me a while to get them working in RStudio. However, I am so happy that I got them working and was able to share what I learned with you. Until next time…

Citation

For attribution, please cite this work as

Brantley (2020, Feb. 19). Data Sci Dani: Functionals. Retrieved from https://datascidani.com/posts/2020-02-19-functionals/

BibTeX citation

@misc{brantley2020functionals,
  author = {Brantley, Danielle},
  title = {Data Sci Dani: Functionals},
  url = {https://datascidani.com/posts/2020-02-19-functionals/},
  year = {2020}
}