Day 11: slide

Welcome back for the 11th day of the #packagecalendar. Today we will take a look at the handy new package slide by Davis Vaughan. The main purpose of slide is to provide a more general-purpose approach to sliding window functions.

The package is not yet available from CRAN but can be installed from GitHub with

remotes::install_github("DavisVaughan/slide")

On the surface, the package looks a lot like purrr. There are three main functions, slide(), slide_index() and slide_between(), which all have *_dbl(), *2() and p*() variants (for example slide_dbl(), slide2() and pslide()).

For the examples, we will use the same data we used on day 1.

# remotes::install_github("PMassicotte/gtrendsR")
library(gtrendsR)
library(skimr)
last_christmas <- gtrends("Last Christmas", time = "today 3-m")$interest_over_time

skim(last_christmas)
Table 1: Data summary

Name                     last_christmas
Number of rows           91
Number of columns        7
_______________________
Column type frequency:
  character              4
  numeric                2
  POSIXct                1
________________________
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
geo                    0              1    5    5      0         1           0
time                   0              1    9    9      0         1           0
keyword                0              1   14   14      0         1           0
gprop                  0              1    3    3      0         1           0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean     sd  p0  p25  p50  p75  p100  hist
hits                   0              1    25  27.63   2    5    7   48   100  ▇▁▂▂▁
category               0              1     0   0.00   0    0    0    0     0  ▁▁▇▁▁

Variable type: POSIXct

skim_variable  n_missing  complete_rate  min         max         median      n_unique
date                   0              1  2019-11-29  2020-02-27  2020-01-13        91

Let’s plot the hits variable over time (date) using ggplot2.

library(ggplot2)
ggplot(last_christmas, aes(date, hits)) +
  geom_point()

Suppose we want to apply a rolling average to the hits variable. For this we will use the slide() function: you pass it the object you want to iterate over and what to do with each window. The style is similar to purrr. Notice how I specified _dbl(), as I know the result should be numeric.

library(slide)
slide_dbl(last_christmas$hits, ~mean(.x))
##  [1]  51  68  64  45  46  47  45  60  70  66  48  50  48  51  57  69  70  61  64
## [20]  68  71  76  80  79  69 100  98  52  37  34  33  24  23  21  14  12  13  12
## [39]   9   9   7   7   8   9   9   6   5   5   5   5   5   6   4   5   5   6   6
## [58]   8   7   5   4   5   4   5   7   7   4   4   5   5   5   7   7   5   4   4
## [77]   3   4   6   6   4   6   3   3   3   4   4   3   3   2   2

However, nothing actually happened in this example, as the sliding window had length 1. Instead, let’s calculate the average of the last 5 days, or in other words: today + the previous 4 days. We do this by specifying .before = 4.

slide_dbl(last_christmas$hits, mean, .before = 4)
##  [1] 51.0 59.5 61.0 57.0 54.8 54.0 49.4 48.6 53.6 57.6 57.8 58.8 56.4 52.6 50.8
## [16] 55.0 59.0 61.6 64.2 66.4 66.8 68.0 71.8 74.8 75.0 80.8 85.2 79.6 71.2 64.2
## [31] 50.8 36.0 30.2 27.0 23.0 18.8 16.6 14.4 12.0 11.0 10.0  8.8  8.0  8.0  8.0
## [46]  7.8  7.4  6.8  6.0  5.2  5.0  5.2  5.0  5.0  5.0  5.2  5.2  6.0  6.4  6.4
## [61]  6.0  5.8  5.0  4.6  5.0  5.6  5.4  5.4  5.4  5.0  4.6  5.2  5.8  5.8  5.6
## [76]  5.4  4.6  4.0  4.2  4.6  4.6  5.2  5.0  4.4  3.8  3.8  3.4  3.4  3.4  3.2
## [91]  2.8
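
To make the window explicit, here is a minimal base R sketch of what a trailing window with .before = 4 computes. The helper trailing_mean() is a hypothetical name used only for illustration and is not part of slide. It assumes that incomplete windows at the start are averaged over whatever values are available, which is what the first few results above suggest (59.5 is the mean of 51 and 68).

```r
# First six values of last_christmas$hits, copied from the output above
hits <- c(51, 68, 64, 45, 46, 47)

# Hypothetical helper: mean of the current element and up to `before`
# elements preceding it (the window is shorter at the start of the vector)
trailing_mean <- function(x, before) {
  vapply(seq_along(x), function(i) {
    start <- max(1L, i - before)  # do not reach past the first element
    mean(x[start:i])
  }, numeric(1))
}

trailing_mean(hits, before = 4)
#> [1] 51.0 59.5 61.0 57.0 54.8 54.0
```

The first four results are based on fewer than five days, so only from the fifth element onwards is this a true 5-day average.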

Notice how the output has the same length as the input. This makes it easy to use inside other functions such as mutate().

Now let’s calculate the moving average again and plot it on top of our plot.

library(dplyr)
last_christmas %>%
  mutate(hits_ma5 = slide_dbl(hits, ~mean(.x), .before = 4)) %>%
  ggplot(aes(date, hits)) +
  geom_point() +
  geom_line(aes(y = hits_ma5), color = "firebrick")

Lastly, I would like to note that you can use arbitrary functions inside your slide(). Take a look at the additional resources for many more examples of what slide can do.
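
As a rough illustration of that flexibility, the sketch below slides an arbitrary function over a vector in base R. slide_sketch() is a made-up helper, not the package's implementation. It returns a list, much like purrr::map() does, so the function is free to return something other than a single number.

```r
# Hypothetical helper: apply `f` to each trailing window of `x`
# and collect the results in a list
slide_sketch <- function(x, f, before = 0L) {
  lapply(seq_along(x), function(i) {
    start <- max(1L, i - before)  # shorter windows at the start
    f(x[start:i])
  })
}

# range() returns two values (min and max) per window
windows <- slide_sketch(c(3, 1, 4, 1, 5), range, before = 2)
windows[[5]]
#> [1] 1 5
```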

Additional resources