Day 9: gapminder

Welcome back for the 9th day of the #packagecalendar. Today we will switch it up a little and talk about a great data package. The package of the day is gapminder, created by Jenny Bryan.

The package is available from CRAN and can be installed with:

install.packages("gapminder")

First a little background. The Gapminder Foundation was created by Ola Rosling, Anna Rosling Rönnlund and Hans Rosling to promote “sustainable global development and achievement of the United Nations Millennium Development Goals by increased use and understanding of statistics and other information about social, economic and environmental development at local, national and global levels.” (quote straight from Wikipedia)

The package includes a dataset with various socioeconomic factors. The data has been studied far and wide and serves as a great starting point for data wrangling and visualization, both for your own #rstats journey and for teaching others. gapminder is a great starting dataset with a low barrier to entry for understanding, but a high ceiling for insights!

We can take a look at the dataset by loading the package and calling head() on gapminder to print the first six rows.

library(gapminder)
head(gapminder)
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
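
The first six rows only show one country, so it can help to check the overall shape of the data as well. A minimal sketch using only base R functions (nothing here is specific to the package beyond the gapminder data frame itself):

```r
library(gapminder)

# Number of rows and columns in the dataset
dim(gapminder)

# The distinct continents (continent is stored as a factor)
levels(gapminder$continent)

# How many distinct countries are covered
length(unique(gapminder$country))
```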

We have information about life expectancy (lifeExp), population (pop) and per-capita gross domestic product (gdpPercap) for each country with a 5-year timestep. It is important to note that the package isn’t getting updated with newer data. So if you need these statistics for reasons other than teaching and examples, you should use other data sources.
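
As a small illustration of the 5-year timestep and of the kind of wrangling the dataset invites, here is a rough sketch in base R. The variable names years and latest are my own, and picking the most recent year for the summary is just one arbitrary choice among many.

```r
library(gapminder)

# Distinct years in the data, sorted ascending
years <- sort(unique(gapminder$year))

# Gaps between consecutive years; each should be 5 given the 5-year timestep
diff(years)

# Keep only the most recent year available in the package
latest <- subset(gapminder, year == max(year))

# Mean life expectancy per continent in that year
aggregate(lifeExp ~ continent, data = latest, FUN = mean)
```

The same summary could be written with dplyr's group_by() and summarise() if you prefer the tidyverse style; base R is used here only to avoid extra dependencies.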