Day 7: gghalves

Welcome back for the 7th day of the #packagecalendar. Today we will continue our look at the billboards data from yesterday. The package of the day is gghalves, created by Frederik Tiedemann.

The package is available from CRAN and can be installed with

install.packages("gghalves")

We will be working with Santa’s elf dataset. It contains 50 monthly observations for each of 3 elves (150 observations in total) describing their elfly capacities. Units have been removed because they are classified. The code that creates the dataset is at the end of this post.

library(skimr)
skim(elf)

Table 1: Data summary

Name                elf
Number of rows      150
Number of columns   5

Column type frequency:
  factor            1
  numeric           4

Group variables     None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
Elf                    0              1  FALSE           3  Bud: 50, Twi: 50, Hol: 50

Variable type: numeric

skim_variable         n_missing  complete_rate  mean    sd   p0  p25   p50  p75  p100  hist
Toy Making Intensity          0              1  5.84  0.83  4.3  5.1  5.80  6.4   7.9  ▆▇▇▅▂
Sugar Consumption             0              1  3.06  0.44  2.0  2.8  3.00  3.3   4.4  ▁▆▇▂▁
Jolliness                     0              1  3.76  1.77  1.0  1.6  4.35  5.1   6.9  ▇▁▆▇▂
Reindeer Training             0              1  1.20  0.76  0.1  0.3  1.30  1.8   2.5  ▇▁▇▅▃

Since we have a categorical variable (Elf) and four continuous variables, we can use ggplot2 to visualize the distributions. Let us take a first look at the Toy Making Intensity (TMI) variable. We can create a boxplot for each elf with geom_boxplot():

library(ggplot2)
ggplot(elf, aes(Elf, `Toy Making Intensity`)) +
  geom_boxplot()

But we can’t see the individual points. You could make a second plot with geom_dotplot():

ggplot(elf, aes(Elf, `Toy Making Intensity`)) +
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 0.05)

But now we have two separate charts showing the same data. This is where gghalves comes in! gghalves provides half-width versions of several distribution geoms (boxplots, violins, dotplots and points), so that two of them can share the space of one. It is easier to show with an example. We take the data from before and show a boxplot and a dotplot at the same time.

library(gghalves)
ggplot(elf, aes(Elf, `Toy Making Intensity`)) +
  geom_half_boxplot() +
  geom_half_dotplot(binwidth = 0.05)

By using geom_half_boxplot() and geom_half_dotplot(), we were able to elegantly combine two plot types in a single chart.
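Which half each geom occupies can also be swapped. The sketch below mirrors the plot above. It assumes that geom_half_boxplot() takes side = "r" (the same argument used further down in this post) and that geom_half_dotplot() follows geom_dotplot() in accepting stackdir = "down". The elf_demo data frame is a hypothetical stand-in built directly from iris without jitter, so its values differ slightly from the elf data.

```r
library(ggplot2)
library(gghalves)

# Hypothetical stand-in for the elf data: iris columns renamed, no jitter applied
elf_demo <- data.frame(
  Elf = factor(as.numeric(iris$Species), labels = c("Buddy", "Twinkle", "Holly")),
  `Toy Making Intensity` = iris$Sepal.Length,
  check.names = FALSE
)

# Boxplot in the right half, dots stacked in the opposite direction
p_mirrored <- ggplot(elf_demo, aes(Elf, `Toy Making Intensity`)) +
  geom_half_boxplot(side = "r") +
  geom_half_dotplot(binwidth = 0.05, stackdir = "down")

p_mirrored
```

Keeping the two layers on opposite sides is what prevents the dots from being drawn on top of the box.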

The geoms respect the usual ggplot2 conventions, so aesthetics such as color work as expected:

ggplot(elf, aes(Elf, `Reindeer Training`, color = Elf)) +
  geom_half_violin() +
  geom_half_point(alpha = 0.6)
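As a further sketch of the same idea, the half geoms can be combined with ordinary ggplot2 scales and themes. This example assumes the fill aesthetic is honoured in the same way as color. The elf_demo data frame and the palette choice are illustrative only and are not part of the original analysis.

```r
library(ggplot2)
library(gghalves)

# Hypothetical stand-in for the elf data: iris columns renamed, no jitter applied
elf_demo <- data.frame(
  Elf = factor(as.numeric(iris$Species), labels = c("Buddy", "Twinkle", "Holly")),
  `Reindeer Training` = iris$Petal.Width,
  check.names = FALSE
)

# Map fill instead of color, then style the plot with standard ggplot2 tools
p_styled <- ggplot(elf_demo, aes(Elf, `Reindeer Training`, fill = Elf)) +
  geom_half_violin(alpha = 0.6) +
  geom_half_boxplot(side = "r") +
  scale_fill_brewer(palette = "Dark2") +
  theme_minimal() +
  theme(legend.position = "none")  # the x axis already names each elf

p_styled
```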

Facets work too:

ggplot(elf, aes(Elf, Jolliness, color = Elf)) +
  geom_half_violin() +
  geom_half_boxplot(side = "r") +
  facet_wrap(~ factor(`Sugar Consumption` > mean(`Sugar Consumption`),
                      c(TRUE, FALSE),
                      c("High Sugar Consumption", "Low Sugar Consumption")))
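The faceting expression above is fairly dense. One alternative, sketched here, is to compute the grouping column first and then facet on it by name. The column name `Sugar Level` and the elf_demo data frame (iris without jitter) are assumptions made for this illustration.

```r
library(ggplot2)
library(gghalves)

# Hypothetical stand-in for the elf data: iris columns renamed, no jitter applied
elf_demo <- data.frame(
  Elf = factor(as.numeric(iris$Species), labels = c("Buddy", "Twinkle", "Holly")),
  `Sugar Consumption` = iris$Sepal.Width,
  Jolliness = iris$Petal.Length,
  check.names = FALSE
)

# Label each observation as above or below the overall mean sugar consumption
sugar <- elf_demo$`Sugar Consumption`
elf_demo$`Sugar Level` <- factor(
  sugar > mean(sugar),
  levels = c(TRUE, FALSE),
  labels = c("High Sugar Consumption", "Low Sugar Consumption")
)

# Facet on the precomputed column instead of an inline expression
p_facets <- ggplot(elf_demo, aes(Elf, Jolliness, color = Elf)) +
  geom_half_violin() +
  geom_half_boxplot(side = "r") +
  facet_wrap(vars(`Sugar Level`))

p_facets
```

Storing the split as a column also makes it available for summaries or other plots.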

Creating the elf dataset

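library(tidyverse)
set.seed(1234)  # make the jitter reproducible

# The elf data is the iris data in disguise
elf <- iris %>%
  rename(Elf = Species,
         `Sugar Consumption` = Sepal.Width,
         `Toy Making Intensity` = Sepal.Length,
         Jolliness = Petal.Length,
         `Reindeer Training` = Petal.Width) %>%
  # Replace the species names with elf names
  mutate(Elf = factor(as.numeric(Elf), labels = c("Buddy", "Twinkle", "Holly"))) %>%
  # Add a little random noise to every numeric column
  mutate(across(where(is.numeric), jitter))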
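If the tidyverse is not installed, roughly the same data can be built with base R. This is only a sketch: elf_base is a hypothetical name, and the jittered values are not guaranteed to match the tidyverse version digit for digit.

```r
set.seed(1234)

# Start from iris and rename the columns in their original order:
# Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
elf_base <- iris
names(elf_base) <- c("Toy Making Intensity", "Sugar Consumption",
                     "Jolliness", "Reindeer Training", "Elf")

# Replace the species names with elf names (setosa, versicolor, virginica)
levels(elf_base$Elf) <- c("Buddy", "Twinkle", "Holly")

# Add a little random noise to every numeric column
numeric_cols <- vapply(elf_base, is.numeric, logical(1))
elf_base[numeric_cols] <- lapply(elf_base[numeric_cols], jitter)

str(elf_base)
```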