Functionals in R.

In a previous post, I went over vectorized functions in R. Vectorized functions are used to operate on all the elements of a vector at once. However, not all functions in R can be applied to all of the elements of a vector at once.

As another alternative to for-loops, I could use R’s functionals. A functional is a function that takes a function as an input and returns a vector as an output. DataQuest explains that in many situations, functionals eliminate the need for for-loops by allowing me to apply any function to all elements of a list or vector.

For this post, I decided to play around with a data set I found on Tidy Tuesday’s Github. The data set is about wine ratings and I thought it awesome to create a wine-themed post this week.

Base R includes a family of functionals known as the “apply” family. These functionals allow me to apply functions to elements of an object. However, the apply functions can be cumbersome to use due to its inconsistency in syntax and output.

Instead, DataQuest introduces me to the **purrr** package. The purrr package can be used for the same purposes as the apply family of functionals, and its consistency in syntax and output makes it easier to use and my code more legible.

I am then introduced to the purrr functional **map()**, DataQuest explains that the map() functional takes a vector or a list, applies a single-variable function to its elements, and returns a list. Let’s look at an example of this:

I created a list of points and prices of wines. Here is what the list looks like.

I found the lowest number in each pair of numbers by applying the min() function to each pair in the list **wine_results**. To use the map () functional, I would include the vector or list **(wine_results)** of data and the function **(min)** as arguments. I saved this to **min_results**. When I call min_results, I get this list below.

```
wine_results<-list(c(87, 15), c(84, 35), c(91, 95), c(96, 54), c(100, 350))
names(wine_results)<-c("Portuguese Red", "Carignan", "Merlot", "Pinot Noir", "Muscat")
min_results<-map(wine_results, min)
min_results
```

```
$`Portuguese Red`
[1] 15
$Carignan
[1] 35
$Merlot
[1] 91
$`Pinot Noir`
[1] 54
$Muscat
[1] 100
```

I could use map() to apply a function to multiple variables of my data frame. Let’s say I want to apply my **avg_wine_result** function to the **points** and price variables in the wine_ratings data frame. I could use select() to choose the variables you wanted to work with and map() to apply the function to them. In this case, I could install and load the purrr and dplyr packages separately. However, I already have tidyverse(installing tidyverse includes both purrr and dplyr) installed so I just loaded tidyverse using the library() function.

```
avg_wine_result <- function(x) {
mean(x, na.rm = TRUE)
}
per_wine<-wine_ratings %>% select(points, price) %>% map(avg_wine_result)
per_wine
```

```
$points
[1] 88.44714
$price
[1] 35.36339
```

The map() functional applies the **avg_wine_result()** function to each element of the scores data frame — the **points** and **price** variable vectors. The result is a list of output vectors for each variable. Note that map() always returns a list.

Though I used two variables in this example, this approach would scale well if I decided to apply a single-variable function to a much larger list of variables.

Note: I used **na.rm = TRUE** to exclude missing values when taking the mean of x. You can learn more about that here.

As I mentioned, map() always returns a list. But what if I want to specify an output of a different type?

The purrr package contains variants of the map() functional, which allow me to return a vector consisting of output of the specified data type. The following are four variants that I learned about in this lesson:

**map_lgl()**returns a logical vector**map_int()**returns an integer vector**map_dbl()**returns a double vector**map_chr()**returns a character vector

Note that in R integer and double data types are subsets of the numeric data type.

Let me show an example. Recall the **wine_results** list from an earlier example. I used the **map_dbl** functional to apply the **sum()** to the wine_results list. The result is a vector of double values with the name attributes retained:

```
wine_results<-list(c(87, 15), c(84, 35), c(91, 95), c(96, 54), c(100, 350))
names(wine_results)<-c("Portuguese Red", "Carignan", "Merlot", "Pinot Noir", "Muscat")
sum_dbl<-map_dbl(wine_results, sum)
sum_dbl
```

```
Portuguese Red Carignan Merlot Pinot Noir
102 119 186 150
Muscat
450
```

```
typeof(sum_dbl)
```

`[1] "double"`

So far, I’ve talked about using the map() functional to apply any single-variable function to elements of a list or vector. But what about functions with more than one variable?

When applying a function with two variables, I’ll need to use a different functional from the purrr package: map2(). The map2() functional takes two variables and a function as arguments and returns a list. Take a look at the example in the screenshot below.

The **x** list represents the points of the wine and the **y** list represents the prices of the wines. I want to calculate the proportion between points and prices. I then write a function to calculate the proportion. Next, I use the map2() functional to apply the **proportion_of_total** function to my x and y lists. The result is a list of the output.

```
x <- list(87, 84, 91, 96, 100)
y <- list(15, 35, 95, 54, 350)
proportion_of_total <- function(x,y) {
if(x + y > 0){
total = x + y
(x/total)
}else{
0
}
}
map2(x, y, proportion_of_total)
```

```
[[1]]
[1] 0.8529412
[[2]]
[1] 0.7058824
[[3]]
[1] 0.4892473
[[4]]
[1] 0.64
[[5]]
[1] 0.2222222
```

As with the map() functional, the purrr package includes variants of the map2() functional which are:

**map2_lgl()**returns a logical vector**map2_int()**returns an integer vector**map2_dbl()**returns a double vector**map2_chr()**returns a character vector

The screenshot below demonstrates the map2_chr() functional using the above example.

```
map2_chr(x,y,proportion_of_total)
```

`[1] "0.852941" "0.705882" "0.489247" "0.640000" "0.222222"`

As I mentioned before, functions can take more than two variables as arguments. How would I apply a function with more than two variables to a list and return a list? The purrr package contains a functional called **pmap()**, which works for functions with any number of variables as arguments.

Take a look at the screenshot below. This time I have three lists: **x** represents points, **y** represents price and **z** represents id numbers. I created a new list, **total_list**, containing the variables I’m working with. This new total list is created because when working with the pmap() functional, function arguments are provided as a list. Once I created this new list, I created the main_total function. I could then apply the function to the total list I created.

As you see in the screenshot, the result of pmap() returns a list.

```
x <- list(87, 84, 91, 96, 100)
y <- list(15, 35, 95, 54, 350)
z <- list(1, 13609, 168, 15845, 345)
total_list <-list(x, y, z)
main_total <- function(x,y,z) {
if(x <=100){
y+3
}else{
z -10
}
}
pmap(total_list, main_total)
```

```
[[1]]
[1] 18
[[2]]
[1] 38
[[3]]
[1] 98
[[4]]
[1] 57
[[5]]
[1] 353
```

As with the map() and map2() functionals, the purrr package also includes variants of the pmap() functional which are:

**pmap_lgl()**returns a logical vector**pmap_int()**returns an integer vector**pmap_dbl()**returns a double vector**pmap_chr()**returns a character vector

The screenshot below demonstrates the pmap_chr() functional using the above example.

```
pmap_chr(total_list, main_total)
```

`[1] "18.000000" "38.000000" "98.000000" "57.000000" "353.000000"`

Well, that was it for functionals. I must admit that I struggled with map2() and pmap() functionals. It took a while for me to get them working in R Studio. However, I am so happy that I got them working and was able to share what I learned with you. Until next time…

For attribution, please cite this work as

Brantley (2020, Feb. 19). Data Sci Dani: Functionals. Retrieved from https://datascidani.com/posts/2020-02-19-functionals/

BibTeX citation

@misc{brantley2020functionals, author = {Brantley, Danielle}, title = {Data Sci Dani: Functionals}, url = {https://datascidani.com/posts/2020-02-19-functionals/}, year = {2020} }