Recent & Upcoming Talks

2020

Predictive modeling with text using tidy data principles

Invited workshop for the R/Pharma Conference. Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems?

palette2vec: A new way to explore color palettes

There are many palettes available in various R packages. Ways to explore all of these palettes already exist in the https://github.com/EmilHvitfeldt/r-color-palettes repository and the {paletteer} package. This talk shows what happens when we take one step further into explorability.

Looking at Stop Words: Why You Shouldn't Blindly Trust Model Defaults

Removing stop words is a fairly common step in natural language processing, and NLP packages often supply a default list. However, most documentation and tutorials don’t explore the nuances of selecting an appropriate list.

themis: dealing with imbalanced data by using synthetic oversampling

Many classification tasks come with an unbalanced dataset. Examples range from disease prediction to fraud detection. Naively applying your model will lead to an ineffective predictor that only predicts the majority class.

Predictive modeling with text using tidy data principles

J. Silge and E. Hvitfeldt. Have you ever encountered text data and suspected there was useful insight latent within it but felt frustrated about how to find that insight? Are you familiar with dplyr and ggplot2, and ready to learn how unstructured text data can be used for prediction within the tidyverse and tidymodels ecosystems?

Reproducible preprocessing with recipes

Git & Github

Working alone or with other people becomes increasingly difficult as the number of files and collaborators grows. This seminar goes into detail on why and how to use git in collaborative research. Material in this talk is heavily inspired by “Excuse me, do you have a moment to talk about version control?”

2019

Building a package that fits into an evolving ecosystem

With an ever-increasing amount of textual data available to us, having a well-thought-out toolchain for modelling is crucial. tidymodels is a recent effort to create a modelling framework that shares the underlying design philosophy, grammar, and data structures of the tidyverse.

Data visualization with ggplot2

Hastily put together slides for data-viz prep for a hackathon.

Text classification in tidymodels

Building R Packages

Building an R package can seem daunting with its many files and structure. This seminar will go through the different use cases for an R package, dos and don’ts, and best practices.

Debugging and Profiling in R

Hitting an error or a speed bump while working in R can be frustrating. This seminar will cover strategies and techniques for debugging and code profiling in R. We will look at some different ways to identify bugs, how to fix them, and how to prevent them from coming back.

Working with tidymodels

Tidymodels is a “meta-package” in the same way as tidyverse, but with a focus on modeling and statistical analysis. This talk will go through how to use tidymodels to do modeling in a tidy fashion.

Debugging and Profiling in R

This seminar contains various links and resources for use in R and RStudio.

2018

Text Analysis in R

An ever-increasing amount of textual data is available to us. I’ll talk you through a structured way to do exploratory data analysis (also called text mining) using tidytext to gain insight into plain unstructured text.

Best Practices in R

This presentation will lead you through many different aspects of what you can do in R to make yourself happy, make sure that future you is happy, and avoid getting mad at past you.

Similarity measure in the space of color palettes

Related to my project of creating a catalog of all available color palettes in R (https://github.com/EmilHvitfeldt/r-color-palettes) and its associated R package (https://CRAN.R-project.org/package=paletteer), I wanted to expand the project to support a higher degree of explorability.