Blog Posts

xaringancolor announcement

I’m happy to announce a small new package of mine: xaringancolor. xaringancolor allows you to specify shared text colors in the text, equations and code sections of xaringan slides.

Textrecipes Version 0.4.0

I’m happy to announce that version 0.4.0 of textrecipes got on CRAN a couple of days ago. This will be a brief post going over the major additions and changes.

Textrecipes series: Pretrained Word Embedding

This is the fifth blog post in the textrecipes series where I go over the various text preprocessing workflows you can do with textrecipes. This post will be showcasing how to use pretrained word embeddings.

Supervised Machine Learning for Text Analysis in R

I have been waiting a long time to finally be able to write this blog post. Last Friday, Julia Silge and I led a useR! 2020 online tutorial on “predictive modeling with text using tidy data principles”.

Textrecipes series: Feature Hashing

This is the fourth blog post in the textrecipes series where I go over the various text preprocessing workflows you can do with textrecipes. This post will be showcasing how to perform feature hashing (also known as the hashing trick).

Textrecipes series: TF-IDF

This is the third blog post in the textrecipes series where I go over the various text preprocessing workflows you can do with textrecipes. This post will be showcasing how to calculate term frequency-inverse document frequency (TF-IDF for short).

Textrecipes series: lexicons

This is the second blog post in the textrecipes series where I go over the various text preprocessing workflows you can do with textrecipes. This post will be covering how to use lexicons to create features.

Textrecipes series: Term Frequency

This is the first blog post in a series I am starting to go over the various text preprocessing workflows you can do with textrecipes. In this post we will start simple with term frequencies.

tidytuesday: Part-of-Speech and textrecipes with The Office

This post was written before textrecipes added support for spacyr as an engine in step_tokenize(). It is still a good demonstration of how to use a custom tokenizer.

Word Rank Slope Charts

I have been working on visualizing how different kinds of words are used in texts, and I finally found a good visualization style: the slope chart. More specifically, I’m thinking of two groups of paired words.