Text Classification with Tidymodels

This post was written with early versions of the tidymodels packages, and in some ways it has not aged perfectly. The general idea of the post is still valid, but if you want more up-to-date code please refer to tidymodels.com.

Introduction

I have previously used this blog to talk about text classification a couple of times. tidymodels has since seen quite a bit of progress. In addition, I got the textrecipes package on CRAN, which provides extra steps for the recipes package from tidymodels.

Seeing the always wonderful post by Julia Silge on text classification with tidy data principles encouraged me to show how the same workflow can also be accomplished in tidymodels.

To give this post a little spice, we will only be using stop words. Yes, you read that right: we will only keep stop words, the words you are often encouraged to exclude because they don’t provide much information. We will challenge that assumption in this post! To have a baseline for our stop word model, I will be using the same data as Julia used in her post.

Data

The data we will be using is the text from Pride and Prejudice and the text from The War of the Worlds. We can get these texts from Project Gutenberg using the gutenbergr package. Note that both works are in English.

library(tidyverse)
## Warning: package 'tibble' was built under R version 3.6.2
library(gutenbergr)

titles <- c(
  "The War of the Worlds",
  "Pride and Prejudice"
)
books <- gutenberg_works(title %in% titles) %>%
  gutenberg_download(meta_fields = "title") %>%
  mutate(title = as.factor(title)) %>%
  select(-gutenberg_id)

books
## # A tibble: 19,504 x 2
##    text                                                      title              
##    <chr>                                                     <fct>              
##  1 "The War of the Worlds"                                   The War of the Wor…
##  2 ""                                                        The War of the Wor…
##  3 "by H. G. Wells [1898]"                                   The War of the Wor…
##  4 ""                                                        The War of the Wor…
##  5 ""                                                        The War of the Wor…
##  6 "     But who shall dwell in these worlds if they be"     The War of the Wor…
##  7 "     inhabited? .  .  .  Are we or they Lords of the"    The War of the Wor…
##  8 "     World? .  .  .  And how are all things made for ma… The War of the Wor…
##  9 "          KEPLER (quoted in The Anatomy of Melancholy)"  The War of the Wor…
## 10 ""                                                        The War of the Wor…
## # … with 19,494 more rows

(Deviating from Julia, we will drop the gutenberg_id variable as it is redundant, leave out the document variable as it isn’t needed in the tidymodels framework, and set the title variable as a factor since that works better with the tidymodels functions used later on.)

I’m going to quote Julia to explain the modeling problem we are facing:

We have the text data now, and let’s frame the kind of prediction problem we are going to work on. Imagine that we take each book and cut it up into lines, like strips of paper (✨ confetti ✨) with an individual line on each paper. Let’s train a model that can take an individual line and give us a probability that this book comes from Pride and Prejudice vs. from The War of the Worlds.

So that is a fairly straightforward task; we already have the data the way we want it in books. Before we go on, let’s investigate the class imbalance.

books %>%
  ggplot(aes(title)) +
  geom_bar() +
  theme_minimal() +
  labs(x = NULL,
       y = "Count",
       title = "Number of Strips in 'Pride and Prejudice' and 'The War of the Worlds'")

It is a little uneven, but we will carry on.
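
If you prefer numbers to bars, a quick count of the strips per title gives the same picture. This is a small illustrative addition using dplyr, which is loaded with the tidyverse.

# Illustrative: number and proportion of strips per title
books %>%
  count(title) %>%
  mutate(prop = n / sum(n))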

Stop words

Let’s first have a talk about stop words. These are the words that are needed for sentences to be structurally sound but don’t add any information. However, a concept such as “non-informational” is quite abstract and bound to be highly domain specific. We will be using the English snowball stop word list provided by the stopwords package (because that is what textrecipes uses natively).

library(stopwords)
stopwords(language = "en", source = "snowball") %>% sort()
##   [1] "a"          "about"      "above"      "after"      "again"     
##   [6] "against"    "all"        "am"         "an"         "and"       
##  [11] "any"        "are"        "aren't"     "as"         "at"        
##  [16] "be"         "because"    "been"       "before"     "being"     
##  [21] "below"      "between"    "both"       "but"        "by"        
##  [26] "can't"      "cannot"     "could"      "couldn't"   "did"       
##  [31] "didn't"     "do"         "does"       "doesn't"    "doing"     
##  [36] "don't"      "down"       "during"     "each"       "few"       
##  [41] "for"        "from"       "further"    "had"        "hadn't"    
##  [46] "has"        "hasn't"     "have"       "haven't"    "having"    
##  [51] "he"         "he'd"       "he'll"      "he's"       "her"       
##  [56] "here"       "here's"     "hers"       "herself"    "him"       
##  [61] "himself"    "his"        "how"        "how's"      "i"         
##  [66] "i'd"        "i'll"       "i'm"        "i've"       "if"        
##  [71] "in"         "into"       "is"         "isn't"      "it"        
##  [76] "it's"       "its"        "itself"     "let's"      "me"        
##  [81] "more"       "most"       "mustn't"    "my"         "myself"    
##  [86] "no"         "nor"        "not"        "of"         "off"       
##  [91] "on"         "once"       "only"       "or"         "other"     
##  [96] "ought"      "our"        "ours"       "ourselves"  "out"       
## [101] "over"       "own"        "same"       "shan't"     "she"       
## [106] "she'd"      "she'll"     "she's"      "should"     "shouldn't" 
## [111] "so"         "some"       "such"       "than"       "that"      
## [116] "that's"     "the"        "their"      "theirs"     "them"      
## [121] "themselves" "then"       "there"      "there's"    "these"     
## [126] "they"       "they'd"     "they'll"    "they're"    "they've"   
## [131] "this"       "those"      "through"    "to"         "too"       
## [136] "under"      "until"      "up"         "very"       "was"       
## [141] "wasn't"     "we"         "we'd"       "we'll"      "we're"     
## [146] "we've"      "were"       "weren't"    "what"       "what's"    
## [151] "when"       "when's"     "where"      "where's"    "which"     
## [156] "while"      "who"        "who's"      "whom"       "why"       
## [161] "why's"      "will"       "with"       "won't"      "would"     
## [166] "wouldn't"   "you"        "you'd"      "you'll"     "you're"    
## [171] "you've"     "your"       "yours"      "yourself"   "yourselves"

This list contains 175 words. Many of these words will at first glance pass the “non-informational” test. However, if you look more closely you will realize that many of them can carry meaning in certain contexts. The word “i”, for example, will be used more in blog posts than in legal documents. Secondly, there appear to be quite a lot of negation words: “wouldn’t”, “don’t”, “doesn’t” and “mustn’t”, just to list a few. This is another reminder that constructing your own stop word list can be highly beneficial for your project, as the default list might not work in your field.
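
To get a feel for how many of those negations are in the list, a quick check like the following could be used. This is an illustrative addition, using stringr (loaded with the tidyverse).

# Illustrative: pull out the contractions ending in "n't" from the snowball list
stopwords(language = "en", source = "snowball") %>%
  str_subset("n't")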

While these words are assumed to carry little information, the distribution of them and the relational information in how the stop words are used relative to each other might give us some information anyway. One author might use negations more often than another; maybe someone really likes to use the word “nor”. These kinds of features can be extracted as distributional information, or in other words “counts”. We will count how often each stop word appears and hope that some of the words can separate the authors. Next we have the order in which words appear. This is related to writing style: some authors might write “… will you please …” while others might use “… you will handle …”. The way each word combination is used might be worth a little bit of information. We will capture the relational information with ngrams.

We will briefly showcase how this works with an example.

sentence <- "This an example sentence that is used to explain the concept of ngrams."

To extract the ngrams we will use the tokenizers package (also the default in textrecipes). Here we can get all the trigrams (ngrams of length 3).

library(tokenizers)
tokenize_ngrams(sentence, n = 3)
## [[1]]
##  [1] "this an example"       "an example sentence"   "example sentence that"
##  [4] "sentence that is"      "that is used"          "is used to"           
##  [7] "used to explain"       "to explain the"        "explain the concept"  
## [10] "the concept of"        "concept of ngrams"

However, we would also like to get the singular word counts (unigrams) and the bigrams (ngrams of length 2). This can easily be done by setting the n_min argument.

tokenize_ngrams(sentence, n = 3, n_min = 1)
## [[1]]
##  [1] "this"                  "this an"               "this an example"      
##  [4] "an"                    "an example"            "an example sentence"  
##  [7] "example"               "example sentence"      "example sentence that"
## [10] "sentence"              "sentence that"         "sentence that is"     
## [13] "that"                  "that is"               "that is used"         
## [16] "is"                    "is used"               "is used to"           
## [19] "used"                  "used to"               "used to explain"      
## [22] "to"                    "to explain"            "to explain the"       
## [25] "explain"               "explain the"           "explain the concept"  
## [28] "the"                   "the concept"           "the concept of"       
## [31] "concept"               "concept of"            "concept of ngrams"    
## [34] "of"                    "of ngrams"             "ngrams"

Now we get unigrams, bigrams and trigrams in one. But wait, we wanted to limit our focus to stop words. Here is how the end result will look once we exclude all non-stop words and perform the ngram operation.

tokenize_words(sentence) %>%
  unlist() %>%
  intersect(stopwords(language = "en", source = "snowball")) %>%
  paste(collapse = " ") %>%
  print() %>%
  tokenize_ngrams(n = 3, n_min = 1)
## [1] "this an that is to the of"
## [[1]]
##  [1] "this"         "this an"      "this an that" "an"           "an that"     
##  [6] "an that is"   "that"         "that is"      "that is to"   "is"          
## [11] "is to"        "is to the"    "to"           "to the"       "to the of"   
## [16] "the"          "the of"       "of"

That is quite a reduction in ngrams compared to the full sentence, but hopefully there is some information within.
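
For reference, the two token lists can be counted directly. This is an illustrative addition; the numbers (36 versus 18) can also be read off the output above.

# Illustrative: count the ngrams with and without the stop word filter
length(tokenize_ngrams(sentence, n = 3, n_min = 1)[[1]])

stopword_sentence <- tokenize_words(sentence)[[1]] %>%
  intersect(stopwords(language = "en", source = "snowball")) %>%
  paste(collapse = " ")
length(tokenize_ngrams(stopword_sentence, n = 3, n_min = 1)[[1]])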

Training & testing split

Before we start modeling we need to split our data into a testing and training set. This is easily done using the rsample package from tidymodels.

library(tidymodels)
## Warning: package 'rsample' was built under R version 3.6.2
set.seed(1234) 

books_split <- initial_split(books, strata = "title", p = 0.75)
train_data <- training(books_split)
test_data <- testing(books_split)
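
Since we stratified by title, the class balance should be roughly the same in both sets. A quick sanity check (an illustrative addition) could look like this:

# Illustrative: class balance in the training and testing sets
bind_rows(
  train = count(train_data, title),
  test  = count(test_data, title),
  .id = "split"
)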

Preprocessing

The next step is the preprocessing. For this we will use the recipes package from tidymodels. It allows us to specify a preprocessing design that is trained on the training data and can then be applied to the training and testing data alike. I created textrecipes because recipes doesn’t natively support text preprocessing.

I’m going to replicate Julia’s preprocessing here to make comparisons easier for myself. Notice the step_filter() call: the original text has quite a lot of empty lines, and since these don’t contain any textual information at all we filter those observations away. Note also that we could have used all_predictors() instead of text, as it is the only predictor we have.

library(textrecipes)
julia_rec <- recipe(title ~ ., data = train_data) %>%
  step_filter(text != "") %>%
  step_tokenize(text) %>%
  step_tokenfilter(text, min_times = 11) %>%
  step_tf(text) %>%
  prep(training = train_data)
julia_rec
## Data Recipe
## 
## Inputs:
## 
##       role #variables
##    outcome          1
##  predictor          1
## 
## Training data contained 14629 data points and no missing data.
## 
## Operations:
## 
## Row filtering [trained]
## Tokenization for text [trained]
## Text filtering for text [trained]
## Term frequency with text [trained]

This recipe will remove empty texts, tokenize to words (the default in step_tokenize()), keep words that appear 10 times or more in the training set, and then count how many times each word appears. The processed data looks like this:

julia_train_data <- juice(julia_rec)
julia_test_data  <- bake(julia_rec, test_data)

str(julia_train_data, list.len = 10)
## tibble [12,138 × 101] (S3: tbl_df/tbl/data.frame)
##  $ title            : Factor w/ 2 levels "Pride and Prejudice",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ tf_text_a        : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_about    : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_after    : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_again    : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_all      : num [1:12138] 0 0 0 0 1 0 0 0 0 0 ...
##  $ tf_text_am       : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_an       : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_and      : num [1:12138] 0 0 0 0 1 0 0 0 0 0 ...
##  $ tf_text_any      : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##   [list output truncated]

The reason we get 101 features while Julia got 1652 is that she did her filtering on the full dataset whereas we only did the filtering on the training set, and that Julia didn’t explicitly remove empty observations.
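
Those empty lines make up a sizable share of the data; counting them (an illustrative addition) looks like this:

# Illustrative: how many lines of the raw data are empty
books %>%
  count(empty = text == "")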

Back to stop words!! In this case we need a slightly more complicated recipe:

stopword_rec <- recipe(title ~ ., data = train_data) %>%
  step_filter(text != "") %>%
  step_tokenize(text) %>%
  step_stopwords(text, keep = TRUE) %>%
  step_untokenize(text) %>%
  step_tokenize(text, token = "ngrams", options = list(n = 3, n_min = 1)) %>%
  step_tokenfilter(text, min_times = 10) %>%
  step_tf(text) %>%
  prep(training = train_data)
stopword_rec
## Data Recipe
## 
## Inputs:
## 
##       role #variables
##    outcome          1
##  predictor          1
## 
## Training data contained 14629 data points and no missing data.
## 
## Operations:
## 
## Row filtering [trained]
## Tokenization for text [trained]
## Stop word removal for text [trained]
## Untokenization for text [trained]
## Tokenization for text [trained]
## Text filtering for text [trained]
## Term frequency with text [trained]

First we tokenize to words, remove all non-stop words, untokenize (which is basically just paste() with a fancy name), tokenize to ngrams, remove ngrams that appear fewer than 10 times, and lastly count how often each ngram appears.
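
To make the untokenize step concrete, it is conceptually nothing more than pasting the tokens back together. This is an illustrative sketch and not part of the recipe itself.

# Illustrative: "untokenizing" is essentially pasting tokens back into a string
tokens <- c("this", "an", "that", "is", "to", "the", "of")
paste(tokens, collapse = " ")
## [1] "this an that is to the of"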

# Processed data
stopword_train_data <- juice(stopword_rec)
stopword_test_data  <- bake(stopword_rec, test_data)

str(stopword_train_data, list.len = 10)
## tibble [12,138 × 101] (S3: tbl_df/tbl/data.frame)
##  $ title             : Factor w/ 2 levels "Pride and Prejudice",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ tf_text_a         : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_a and     : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_a of      : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_about     : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_after     : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_again     : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_all       : num [1:12138] 0 0 0 0 1 0 0 0 0 0 ...
##  $ tf_text_am        : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##  $ tf_text_an        : num [1:12138] 0 0 0 0 0 0 0 0 0 0 ...
##   [list output truncated]

And we are left with 101 features.

Modeling

For modeling we will be using the parsnip package from tidymodels. First we start by defining a model specification. This defines the intent of our model: what we want to do, not what we want to do it on. Meaning we don’t include the data yet, just the kind of model, its hyperparameters and the engine (the package that will do the work). We will be using the glmnet package here, so we specify a logistic regression model.

glmnet_model <- logistic_reg(mixture = 0, penalty = 0.1) %>%
  set_engine("glmnet")
glmnet_model
## Logistic Regression Model Specification (classification)
## 
## Main Arguments:
##   penalty = 0.1
##   mixture = 0
## 
## Computational engine: glmnet

Here we fit both models on our training data, first using the stop words, then using the simple word count approach.

stopword_model <- glmnet_model %>%
  fit(title ~ ., data = stopword_train_data)

julia_model <- glmnet_model %>%
  fit(title ~ ., data = julia_train_data)

This is the part of the workflow where one should do hyperparameter optimization and explore different models to find the best model for the task. In the interest of keeping this post short, that step is excluded here, possibly to be explored in a future post 😉. A rough sketch of what it could look like is shown below.
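
The sketch is not run and only illustrates the idea with the tune package; stopword_rec_unprepped stands for the stop word recipe defined above but without the final prep() call, and the number of folds and grid size are arbitrary choices.

# Rough sketch (not run): tuning the penalty of the glmnet model
tune_spec <- logistic_reg(mixture = 0, penalty = tune()) %>%
  set_engine("glmnet")

folds <- vfold_cv(train_data, v = 5, strata = "title")

# tune_wf <- workflow() %>%
#   add_recipe(stopword_rec_unprepped) %>%
#   add_model(tune_spec)
#
# tune_res <- tune_grid(tune_wf, resamples = folds, grid = 20)
# show_best(tune_res, metric = "roc_auc")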

Evaluation

Now that we have fitted the models on the training data, we can evaluate them on the testing data set. Here we will use the parsnip functions predict_class() and predict_classprob() to get the predicted class and predicted probabilities for the two models, neatly collecting the whole thing in one tibble.

eval_tibble <- stopword_test_data %>%
  select(title) %>%
  mutate(
    class_stopword = parsnip:::predict_class(stopword_model, stopword_test_data),
    class_julia    = parsnip:::predict_class(julia_model, julia_test_data),
    prop_stopword  = parsnip:::predict_classprob(stopword_model, stopword_test_data) %>% pull(`The War of the Worlds`),
    prop_julia     = parsnip:::predict_classprob(julia_model, julia_test_data) %>% pull(`The War of the Worlds`)
  )

eval_tibble
## # A tibble: 4,027 x 5
##    title           class_stopword     class_julia       prop_stopword prop_julia
##    <fct>           <fct>              <fct>                     <dbl>      <dbl>
##  1 The War of the… Pride and Prejudi… The War of the W…         0.475      0.508
##  2 The War of the… Pride and Prejudi… Pride and Prejud…         0.498      0.388
##  3 The War of the… Pride and Prejudi… Pride and Prejud…         0.335      0.315
##  4 The War of the… The War of the Wo… The War of the W…         0.690      0.710
##  5 The War of the… The War of the Wo… The War of the W…         0.650      0.607
##  6 The War of the… Pride and Prejudi… Pride and Prejud…         0.241      0.264
##  7 The War of the… Pride and Prejudi… Pride and Prejud…         0.369      0.351
##  8 The War of the… Pride and Prejudi… The War of the W…         0.403      0.568
##  9 The War of the… The War of the Wo… The War of the W…         0.520      0.631
## 10 The War of the… The War of the Wo… The War of the W…         0.511      0.545
## # … with 4,017 more rows
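
As a side note, predict_class() and predict_classprob() are accessed via parsnip’s internals here; with current parsnip versions you would typically use predict() with a type argument instead. The following is an equivalent sketch, not what was run above.

# Sketch: the predict() equivalents in current parsnip versions
predict(stopword_model, stopword_test_data, type = "class")
predict(stopword_model, stopword_test_data, type = "prob")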

Tidymodels includes the yardstick package, which makes evaluation calculations much easier and tidier. It allows us to calculate the accuracy by calling the accuracy() function:

accuracy(eval_tibble, truth = title, estimate = class_stopword)
## # A tibble: 1 x 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.778
accuracy(eval_tibble, truth = title, estimate = class_julia)
## # A tibble: 1 x 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.801

And we see that the stop word model beats the naive model (one that always picks the majority class), while lagging behind the word count model.

test_data %>%
  filter(text != "") %>%
  summarise(mean(title == "Pride and Prejudice"))
## # A tibble: 1 x 1
##   `mean(title == "Pride and Prejudice")`
##                                    <dbl>
## 1                                  0.662

We are also able to plot the ROC curve using roc_curve() (notice how we are using the predicted probabilities instead of the classes) and autoplot().

eval_tibble %>%
  roc_curve(title, prop_stopword) %>%
  autoplot()

To superimpose both ROC curves we are going to reshape our data a little bit with tidyr.

eval_tibble %>%
  rename(`Word Count` = prop_julia, `Stopwords` = prop_stopword) %>%
  gather("Stopwords", "Word Count", key = "Model", value = "Prop") %>%
  group_by(Model) %>%
  roc_curve(title, Prop) %>%
  autoplot() +
  labs(title = "ROC curve for text classification using word count or stopwords",
       subtitle = "Predicting whether text was written by Jane Austen or H.G. Wells") +
  paletteer::scale_color_paletteer_d("ggsci::category10_d3")

Conclusion

I’m not going to tell you that you should run an “all stop words” model every time you want to do text classification. But I hope this exercise shows you that stop words, which are assumed to carry no information, do indeed hold some degree of information. Please always look at your stop word list and check whether you even need to remove the words; some studies show that removing stop words might not provide the benefit you thought.

Furthermore, I hope to have shown the power of tidymodels. Tidymodels is still growing, so if you have any feedback, bug reports or suggestions, please go to the respective repositories; we would highly appreciate it!

Comments

This plot was suggested in the comments. Thanks, Isaiah!

stopword_model$fit %>% 
  tidy() %>%
  mutate(term = str_replace(term, "tf_text_", "")) %>%
  group_by(estimate > 0) %>%
  top_n(10, abs(estimate)) %>%
  ungroup() %>%
  ggplot(aes(fct_reorder(term, estimate), estimate, fill = estimate > 0)) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  coord_flip() +
  theme_minimal() +
  labs(x = NULL,
  title = "Coefficients that increase/decrease probability the most",
  subtitle = "Stopwords only")

And Isaiah notes that

Whereas Julia’s analysis using non stop words showed that Elizabeth is the opposite of a Martian, stop words shows that Pride and Prejudice talks of men and women, and War of the Worlds makes declarations about existence.

Which I would like to say looks pretty spot on.

Session information


─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.0 (2019-04-26)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin15.6.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2020-04-23                  

─ Packages ───────────────────────────────────────────────────────────────────
 ! package       * version    date       lib source                            
 P assertthat      0.2.1      2019-03-21 [?] CRAN (R 3.6.0)                    
 P backports       1.1.6      2020-04-05 [?] CRAN (R 3.6.0)                    
 P base64enc       0.1-3      2015-07-28 [?] CRAN (R 3.6.0)                    
 P bayesplot       1.7.1      2019-12-01 [?] CRAN (R 3.6.0)                    
 P blogdown        0.18       2020-03-04 [?] CRAN (R 3.6.0)                    
 P bookdown        0.18       2020-03-05 [?] CRAN (R 3.6.0)                    
 P boot            1.3-24     2019-12-20 [?] CRAN (R 3.6.0)                    
 P broom         * 0.5.5      2020-02-29 [?] CRAN (R 3.6.0)                    
 P callr           3.4.3      2020-03-28 [?] CRAN (R 3.6.2)                    
 P cellranger      1.1.0      2016-07-27 [?] CRAN (R 3.6.0)                    
 P class           7.3-16     2020-03-25 [?] CRAN (R 3.6.0)                    
 P cli             2.0.2      2020-02-28 [?] CRAN (R 3.6.0)                    
 P clipr           0.7.0      2019-07-23 [?] CRAN (R 3.6.0)                    
 P codetools       0.2-16     2018-12-24 [?] CRAN (R 3.6.0)                    
 P colorspace      1.4-1      2019-03-18 [?] CRAN (R 3.6.0)                    
 P colourpicker    1.0        2017-09-27 [?] CRAN (R 3.6.0)                    
 P crayon          1.3.4      2017-09-16 [?] CRAN (R 3.6.0)                    
 P crosstalk       1.1.0.1    2020-03-13 [?] CRAN (R 3.6.0)                    
 P DBI             1.1.0      2019-12-15 [?] CRAN (R 3.6.0)                    
 P dbplyr          1.4.2      2019-06-17 [?] CRAN (R 3.6.0)                    
 P desc            1.2.0      2018-05-01 [?] CRAN (R 3.6.0)                    
 P details       * 0.2.1      2020-01-12 [?] CRAN (R 3.6.0)                    
 P dials         * 0.0.6      2020-04-03 [?] CRAN (R 3.6.0)                    
 P DiceDesign      1.8-1      2019-07-31 [?] CRAN (R 3.6.0)                    
 P digest          0.6.25     2020-02-23 [?] CRAN (R 3.6.0)                    
 P dplyr         * 0.8.5      2020-03-07 [?] CRAN (R 3.6.0)                    
 P DT              0.13       2020-03-23 [?] CRAN (R 3.6.0)                    
 P dygraphs        1.1.1.6    2018-07-11 [?] CRAN (R 3.6.0)                    
 P ellipsis        0.3.0      2019-09-20 [?] CRAN (R 3.6.0)                    
 P evaluate        0.14       2019-05-28 [?] CRAN (R 3.6.0)                    
 P fansi           0.4.1      2020-01-08 [?] CRAN (R 3.6.0)                    
 P fastmap         1.0.1      2019-10-08 [?] CRAN (R 3.6.0)                    
 P forcats       * 0.5.0      2020-03-01 [?] CRAN (R 3.6.0)                    
 P foreach         1.5.0      2020-03-30 [?] CRAN (R 3.6.2)                    
 P fs              1.4.1      2020-04-04 [?] CRAN (R 3.6.0)                    
 P furrr           0.1.0      2018-05-16 [?] CRAN (R 3.6.0)                    
 P future          1.16.0     2020-01-16 [?] CRAN (R 3.6.0)                    
 P generics        0.0.2      2018-11-29 [?] CRAN (R 3.6.0)                    
 P ggplot2       * 3.3.0      2020-03-05 [?] CRAN (R 3.6.0)                    
 P ggridges        0.5.2      2020-01-12 [?] CRAN (R 3.6.0)                    
 P globals         0.12.5     2019-12-07 [?] CRAN (R 3.6.0)                    
 P glue            1.4.0      2020-04-03 [?] CRAN (R 3.6.0)                    
 P gower           0.2.1      2019-05-14 [?] CRAN (R 3.6.0)                    
 P GPfit           1.0-8      2019-02-08 [?] CRAN (R 3.6.0)                    
 P gridExtra       2.3        2017-09-09 [?] CRAN (R 3.6.0)                    
 P gtable          0.3.0      2019-03-25 [?] CRAN (R 3.6.0)                    
 P gtools          3.8.2      2020-03-31 [?] CRAN (R 3.6.2)                    
 P gutenbergr    * 0.1.5      2019-09-10 [?] CRAN (R 3.6.0)                    
 P haven           2.2.0      2019-11-08 [?] CRAN (R 3.6.0)                    
 P hms             0.5.3      2020-01-08 [?] CRAN (R 3.6.0)                    
 P htmltools       0.4.0      2019-10-04 [?] CRAN (R 3.6.0)                    
 P htmlwidgets     1.5.1      2019-10-08 [?] CRAN (R 3.6.0)                    
 P httpuv          1.5.2      2019-09-11 [?] CRAN (R 3.6.0)                    
 P httr            1.4.1      2019-08-05 [?] CRAN (R 3.6.0)                    
 P igraph          1.2.5      2020-03-19 [?] CRAN (R 3.6.0)                    
 P infer         * 0.5.1      2019-11-19 [?] CRAN (R 3.6.0)                    
 P inline          0.3.15     2018-05-18 [?] CRAN (R 3.6.0)                    
 P ipred           0.9-9      2019-04-28 [?] CRAN (R 3.6.0)                    
 P iterators       1.0.12     2019-07-26 [?] CRAN (R 3.6.0)                    
 P janeaustenr     0.1.5      2017-06-10 [?] CRAN (R 3.6.0)                    
 P jsonlite        1.6.1      2020-02-02 [?] CRAN (R 3.6.0)                    
 P knitr         * 1.28       2020-02-06 [?] CRAN (R 3.6.0)                    
 P later           1.0.0      2019-10-04 [?] CRAN (R 3.6.0)                    
 P lattice         0.20-41    2020-04-02 [?] CRAN (R 3.6.0)                    
 P lava            1.6.7      2020-03-05 [?] CRAN (R 3.6.0)                    
 P lhs             1.0.1      2019-02-03 [?] CRAN (R 3.6.0)                    
 P lifecycle       0.2.0      2020-03-06 [?] CRAN (R 3.6.0)                    
 P listenv         0.8.0      2019-12-05 [?] CRAN (R 3.6.0)                    
 P lme4            1.1-23     2020-04-07 [?] CRAN (R 3.6.0)                    
 P loo             2.2.0      2019-12-19 [?] CRAN (R 3.6.0)                    
 P lubridate       1.7.8      2020-04-06 [?] CRAN (R 3.6.0)                    
 P magrittr        1.5        2014-11-22 [?] CRAN (R 3.6.0)                    
 P markdown        1.1        2019-08-07 [?] CRAN (R 3.6.0)                    
 P MASS            7.3-51.5   2019-12-20 [?] CRAN (R 3.6.0)                    
 P Matrix          1.2-18     2019-11-27 [?] CRAN (R 3.6.0)                    
 P matrixStats     0.56.0     2020-03-13 [?] CRAN (R 3.6.0)                    
 P mime            0.9        2020-02-04 [?] CRAN (R 3.6.0)                    
 P miniUI          0.1.1.1    2018-05-18 [?] CRAN (R 3.6.0)                    
 P minqa           1.2.4      2014-10-09 [?] CRAN (R 3.6.0)                    
 P modelr          0.1.6      2020-02-22 [?] CRAN (R 3.6.0)                    
 P munsell         0.5.0      2018-06-12 [?] CRAN (R 3.6.0)                    
 P nlme            3.1-145    2020-03-04 [?] CRAN (R 3.6.0)                    
 P nloptr          1.2.2.1    2020-03-11 [?] CRAN (R 3.6.0)                    
 P nnet            7.3-13     2020-02-25 [?] CRAN (R 3.6.0)                    
 P parsnip       * 0.1.0.9001 2020-04-17 [?] local                             
 P pillar          1.4.3      2019-12-20 [?] CRAN (R 3.6.0)                    
 P pkgbuild        1.0.6      2019-10-09 [?] CRAN (R 3.6.0)                    
 P pkgconfig       2.0.3      2019-09-22 [?] CRAN (R 3.6.0)                    
 P plyr            1.8.6      2020-03-03 [?] CRAN (R 3.6.0)                    
 P png             0.1-7      2013-12-03 [?] CRAN (R 3.6.0)                    
 P prettyunits     1.1.1      2020-01-24 [?] CRAN (R 3.6.0)                    
 P pROC            1.16.2     2020-03-19 [?] CRAN (R 3.6.0)                    
 P processx        3.4.2      2020-02-09 [?] CRAN (R 3.6.0)                    
 P prodlim         2019.11.13 2019-11-17 [?] CRAN (R 3.6.0)                    
 P promises        1.1.0      2019-10-04 [?] CRAN (R 3.6.0)                    
 P ps              1.3.2      2020-02-13 [?] CRAN (R 3.6.0)                    
 P purrr         * 0.3.3      2019-10-18 [?] CRAN (R 3.6.0)                    
 P R6              2.4.1      2019-11-12 [?] CRAN (R 3.6.0)                    
 P Rcpp            1.0.4.6    2020-04-09 [?] CRAN (R 3.6.0)                    
 P readr         * 1.3.1      2018-12-21 [?] CRAN (R 3.6.0)                    
 P readxl          1.3.1      2019-03-13 [?] CRAN (R 3.6.0)                    
 P recipes       * 0.1.10     2020-03-18 [?] CRAN (R 3.6.0)                    
   renv            0.9.3      2020-02-10 [1] CRAN (R 3.6.0)                    
 P reprex          0.3.0      2019-05-16 [?] CRAN (R 3.6.0)                    
 P reshape2        1.4.4      2020-04-09 [?] CRAN (R 3.6.2)                    
 P rlang           0.4.5      2020-03-01 [?] CRAN (R 3.6.0)                    
 P rmarkdown       2.1        2020-01-20 [?] CRAN (R 3.6.0)                    
 P rpart           4.1-15     2019-04-12 [?] CRAN (R 3.6.0)                    
 P rprojroot       1.3-2      2018-01-03 [?] CRAN (R 3.6.0)                    
 P rsample       * 0.0.6      2020-03-31 [?] CRAN (R 3.6.2)                    
 P rsconnect       0.8.16     2019-12-13 [?] CRAN (R 3.6.0)                    
 P rstan           2.19.3     2020-02-11 [?] CRAN (R 3.6.0)                    
 P rstanarm        2.19.3     2020-02-11 [?] CRAN (R 3.6.0)                    
 P rstantools      2.0.0      2019-09-15 [?] CRAN (R 3.6.0)                    
 P rstudioapi      0.11       2020-02-07 [?] CRAN (R 3.6.0)                    
 P rvest           0.3.5      2019-11-08 [?] CRAN (R 3.6.0)                    
 P scales        * 1.1.0      2019-11-18 [?] CRAN (R 3.6.0)                    
 P sessioninfo     1.1.1      2018-11-05 [?] CRAN (R 3.6.0)                    
 P shiny           1.4.0.2    2020-03-13 [?] CRAN (R 3.6.0)                    
 P shinyjs         1.1        2020-01-13 [?] CRAN (R 3.6.0)                    
 P shinystan       2.5.0      2018-05-01 [?] CRAN (R 3.6.0)                    
 P shinythemes     1.1.2      2018-11-06 [?] CRAN (R 3.6.0)                    
 P SnowballC       0.7.0      2020-04-01 [?] CRAN (R 3.6.2)                    
 P StanHeaders     2.21.0-1   2020-01-19 [?] CRAN (R 3.6.0)                    
 P statmod         1.4.34     2020-02-17 [?] CRAN (R 3.6.0)                    
 P stopwords     * 1.0        2019-07-24 [?] CRAN (R 3.6.0)                    
 P stringi         1.4.6      2020-02-17 [?] CRAN (R 3.6.0)                    
 P stringr       * 1.4.0      2019-02-10 [?] CRAN (R 3.6.0)                    
 P survival        3.1-12     2020-03-28 [?] Github (therneau/survival@c55af18)
 P textrecipes   * 0.1.0.9000 2020-04-11 [?] local                             
 P threejs         0.3.3      2020-01-21 [?] CRAN (R 3.6.0)                    
 P tibble        * 3.0.1      2020-04-20 [?] CRAN (R 3.6.2)                    
 P tidymodels    * 0.1.0      2020-02-16 [?] CRAN (R 3.6.0)                    
 P tidyposterior   0.0.2      2018-11-15 [?] CRAN (R 3.6.0)                    
 P tidypredict     0.4.5      2020-02-10 [?] CRAN (R 3.6.0)                    
 P tidyr         * 1.0.2      2020-01-24 [?] CRAN (R 3.6.0)                    
 P tidyselect      1.0.0      2020-01-27 [?] CRAN (R 3.6.0)                    
 P tidytext        0.2.3      2020-03-04 [?] CRAN (R 3.6.0)                    
 P tidyverse     * 1.3.0      2019-11-21 [?] CRAN (R 3.6.0)                    
 P timeDate        3043.102   2018-02-21 [?] CRAN (R 3.6.0)                    
 P tokenizers    * 0.2.1      2018-03-29 [?] CRAN (R 3.6.0)                    
 P tune          * 0.1.0      2020-04-02 [?] CRAN (R 3.6.0)                    
 P vctrs           0.2.4      2020-03-10 [?] CRAN (R 3.6.0)                    
 P withr           2.1.2      2018-03-15 [?] CRAN (R 3.6.0)                    
 P workflows     * 0.1.1      2020-03-17 [?] CRAN (R 3.6.0)                    
 P xfun            0.13       2020-04-13 [?] CRAN (R 3.6.2)                    
 P xml2            1.3.0      2020-04-01 [?] CRAN (R 3.6.2)                    
 P xtable          1.8-4      2019-04-21 [?] CRAN (R 3.6.0)                    
 P xts             0.12-0     2020-01-19 [?] CRAN (R 3.6.0)                    
 P yaml            2.2.1      2020-02-01 [?] CRAN (R 3.6.0)                    
 P yardstick     * 0.0.6      2020-03-17 [?] CRAN (R 3.6.0)                    
 P zoo             1.8-7      2020-01-10 [?] CRAN (R 3.6.0)                    

[1] /Users/emilhvitfeldthansen/Desktop/blogv4/renv/library/R-3.6/x86_64-apple-darwin15.6.0
[2] /private/var/folders/m0/zmxymdmd7ps0q_tbhx0d_26w0000gn/T/RtmpoQwJ5C/renv-system-library

 P ── Loaded and on-disk path mismatch.

