Day 11: slide

Welcome back for the 11th day of the #packagecalendar. Today we will take a look at the handy new package slide by Davis Vaughan. The main purpose of slide is to provide a general-purpose approach to sliding window functions.

The package is not yet available from CRAN but can be installed from GitHub with

remotes::install_github("DavisVaughan/slide")

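On the surface, the package looks a lot like purrr. It has three main functions, slide(), slide_index() and slide_between(), which all have **_dbl(), **2() and p**() variants: typed output (for example slide_dbl()), two inputs (slide2()) and any number of inputs (pslide()).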

For the examples, we will use the data we also used for day 1.

# remotes::install_github("PMassicotte/gtrendsR")
library(gtrendsR)
library(skimr)
last_christmas <- gtrends("Last Christmas", time = "today 3-m")$interest_over_time

skim(last_christmas)
Table 1: Data summary

Name                 last_christmas
Number of rows       89
Number of columns    7

Column type frequency:
  character          4
  numeric            2
  POSIXct            1

Group variables      None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
geo                    0              1    5    5      0         1           0
time                   0              1    9    9      0         1           0
keyword                0              1   14   14      0         1           0
gprop                  0              1    3    3      0         1           0

Variable type: numeric

skim_variable  n_missing  complete_rate   mean     sd  p0  p25  p50  p75  p100  hist
hits                   0              1  28.01  28.11   2    4   11   51   100  ▇▁▃▂▁
category               0              1   0.00   0.00   0    0    0    0     0  ▁▁▇▁▁

Variable type: POSIXct

skim_variable  n_missing  complete_rate  min         max         median      n_unique
date                   0              1  2019-09-12  2019-12-09  2019-10-26        89

Let's plot the hits variable over time (date) using ggplot2.

library(ggplot2)
ggplot(last_christmas, aes(date, hits)) +
  geom_point()

Suppose we want to apply a rolling average to the hits variable. For this we will use the slide() function: you pass it the object you want to iterate over and the function to apply to each window. The style is similar to purrr. Notice how I used the _dbl() variant, as I know the result should be numeric.

library(slide)
slide_dbl(last_christmas$hits, ~mean(.x))
##  [1]   4   6   6   4   4   4   4   3   3   3   3   4   4   5   5   5   4   4   3
## [20]   2   3   3   4   4   5   3   4   3   3   4   5   5   5   5   7   6   6   8
## [39]   8   7   8   9   9  10  12  14  11  17  17  16  19  22  27  27  31  43  55
## [58]  83 100  87  62  51  44  51  64  89  82  47  45  44  46  56  73  64  43  45
## [77]  49  48  60  77  72  52  52  54  53  68  79  75  53

However, nothing actually happened in this example, as the default sliding window has length 1 (only the current element). Instead, let's calculate the average of the last 5 days, or in other words: today + the previous 4 days. We do this by specifying .before = 4.

slide_dbl(last_christmas$hits, mean, .before = 4)
##  [1]  4.000000  5.000000  5.333333  5.000000  4.800000  4.800000  4.400000
##  [8]  3.800000  3.600000  3.400000  3.200000  3.200000  3.400000  3.800000
## [15]  4.200000  4.600000  4.600000  4.600000  4.200000  3.600000  3.200000
## [22]  3.000000  3.000000  3.200000  3.800000  3.800000  4.000000  3.800000
## [29]  3.600000  3.400000  3.800000  4.000000  4.400000  4.800000  5.400000
## [36]  5.600000  5.800000  6.400000  7.000000  7.000000  7.400000  8.000000
## [43]  8.200000  8.600000  9.600000 10.800000 11.200000 12.800000 14.200000
## [50] 15.000000 16.000000 18.200000 20.200000 22.200000 25.200000 30.000000
## [57] 36.600000 47.800000 62.400000 73.600000 77.400000 76.600000 68.800000
## [64] 59.000000 54.400000 59.800000 66.000000 66.600000 65.400000 61.400000
## [71] 52.800000 47.600000 52.800000 56.600000 56.400000 56.200000 54.800000
## [78] 49.800000 49.000000 55.800000 61.200000 61.800000 62.600000 61.400000
## [85] 56.600000 55.800000 61.200000 65.800000 65.600000

Notice how the output has the same length as the input. This makes it easy to use inside other functions such as mutate().
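
To make the window explicit, here is a minimal base R sketch of what I understand .before = 4 to compute, judging from the output above. trailing_mean() is a hypothetical helper written only for illustration and is not part of slide. The input is the first seven values of hits, copied from the earlier output.

```r
# Hypothetical helper (not part of slide): mean over the current element
# plus up to `before` preceding elements.
trailing_mean <- function(x, before) {
  vapply(seq_along(x), function(i) {
    # Near the start fewer than `before` earlier values exist,
    # so the window is simply shorter there.
    start <- max(1, i - before)
    mean(x[start:i])
  }, numeric(1))
}

# First seven values of hits, taken from the output shown earlier.
hits_head <- c(4, 6, 6, 4, 4, 4, 4)

ma5 <- trailing_mean(hits_head, before = 4)
ma5
# 4.000000 5.000000 5.333333 5.000000 4.800000 4.800000 4.400000
```

These values match the first seven results of the slide_dbl() call above. The first four windows are shorter than five elements, which is why the output still has one value per input element.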

Now let’s calculate the moving average again and plot it on top of our plot.

library(dplyr)
last_christmas %>%
  mutate(hits_ma5 = slide_dbl(hits, ~mean(.x), .before = 4)) %>%
  ggplot(aes(date, hits)) +
  geom_point() +
  geom_line(aes(y = hits_ma5), color = "firebrick")

Lastly, I would like to note that you can use arbitrary functions inside your slide(). Take a look at the additional resources for many more examples of what slide can do.
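
As a small sketch of that last point, assuming slide is installed as described at the top of this post, here is a rolling spread (largest minus smallest value in each 5-day window). window_spread() is a hypothetical helper I made up for this example, and the input is again just the first seven values of hits.

```r
library(slide)

# First seven values of hits, taken from the output shown earlier.
hits_head <- c(4, 6, 6, 4, 4, 4, 4)

# Hypothetical helper: difference between the largest and smallest
# value in a window.
window_spread <- function(x) max(x) - min(x)

# Same window as the moving average: today + the previous 4 days.
spread5 <- slide_dbl(hits_head, window_spread, .before = 4)
spread5
```

Any function that takes a window and returns a single number should work here with slide_dbl(). For other return types, slide() itself returns a list.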

Additional resources