Day 12: polite

Welcome back for the 12th day of the #packagecalendar. Today we will be taking a look at the polite package by Dmytro Perepolkin. The polite package helps you be considerate when scraping the web, for example by checking a site's robots.txt and respecting its crawl delay.

The package is available from CRAN and can be installed with

install.packages("polite")

The main functions of this package are bow() and scrape(). First, call bow() on the website you intend to scrape. This introduces you to the host: it checks the site's robots.txt for permission, sets up a session, and establishes settings such as the user agent and crawl delay.

library(polite)

session <- bow("https://www.classicfm.com/discover-music/occasions/christmas/nations-top-30-christmas-carols/")
## No encoding supplied: defaulting to UTF-8.
session
## <polite session> https://www.classicfm.com/discover-music/occasions/christmas/nations-top-30-christmas-carols/
##     User-agent: polite R package - https://github.com/dmi3kno/polite
##     robots.txt: 3 rules are defined for 2 bots
##    Crawl delay: 10 sec
##   The path is scrapable for this user-agent
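The printout above shows the default user agent and the crawl delay in effect. Both can be set when bowing, through the user_agent and delay arguments of bow(). Here is a minimal sketch, assuming we want to identify ourselves with a contact address. The helper name bow_with_contact, the "carol-scraper" string, and the email address are made up for illustration. As far as I understand the documentation, the delay we ask for is treated as a request, and a longer delay mandated by robots.txt still applies.

```r
library(polite)

# Hypothetical helper: the name, the user-agent string, and the contact
# address are placeholders for illustration, not part of polite.
bow_with_contact <- function(url, contact, delay = 10) {
  bow(
    url,
    user_agent = paste0("carol-scraper (contact: ", contact, ")"),
    delay = delay  # requested seconds between requests
  )
}

# Usage (needs network access):
# session <- bow_with_contact(
#   "https://www.classicfm.com/discover-music/occasions/christmas/nations-top-30-christmas-carols/",
#   "me@example.com"
# )
```

Wrapping the call in a small function keeps the contact details in one place if we bow to several pages.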

We can then use scrape() to get the HTML document, which we can investigate with rvest.

library(rvest)
## Loading required package: xml2
scrape(session) %>%
  html_nodes("h2") %>%
  html_text()
##  [1] "On Air Now"                                        
##  [2] "Now Playing"                                       
##  [3] "1. O Holy Night"                                   
##  [4] "2. Silent Night"                                   
##  [5] " 3. In the Bleak Mid-Winter – Gustav Holst version"
##  [6] "4. In the Bleak Mid-Winter – Harold Darke version" 
##  [7] "5. Hark! The Herald Angels Sing"                   
##  [8] "6. O Come All Ye Faithful"                         
##  [9] "7. O Come, O Come Emmanuel"                        
## [10] "8. Coventry Carol"                                 
## [11] "9. O Little Town of Bethlehem"                     
## [12] "10. It Came Upon a Midnight Clear"                 
## [13] "11. Once in Royal David’s City"                    
## [14] "12. In Dulci Jubilo"                               
## [15] "13. Joy to the World"                              
## [16] "14. God Rest Ye Merry, Gentlemen"                  
## [17] "15. Away in a Manger"                              
## [18] "16. Sussex Carol"                                  
## [19] "17. Shepherd’s Pipe Carol"                         
## [20] "18. The Three Kings"                               
## [21] "19. Gabriel’s Message"                             
## [22] "20. Jesus Christ the Apple Tree"                   
## [23] "21. Gaudete"                                       
## [24] "22. The Holly and the Ivy"                         
## [25] "23. Carol of the Bells"                            
## [26] "24. See Amid the Winter's Snow"                    
## [27] "25. Ding Dong! Merrily on High"                    
## [28] "26. Candlelight Carol"                             
## [29] "27. Good King Wenceslas"                           
## [30] "28. Angels From the Realms of Glory"               
## [31] "29. The First Nowell"                              
## [32] "30. What Sweeter Music"                            
## [33] "Latest features"                                   
## [34] "More From ClassicFM"                               
## [35] "Browse by"
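The result mixes the carols with page furniture such as "On Air Now" and "Browse by". One way to keep only the carols is to filter on the leading rank. The sketch below uses base R on a handful of the headings copied from the output above, so it runs without a network connection. The variable names are my own, and the pattern assumes every carol heading starts with a number followed by a full stop.

```r
# A few of the headings returned above, copied verbatim
# (including the stray leading space before "3.").
headings <- c(
  "On Air Now",
  "Now Playing",
  "1. O Holy Night",
  "2. Silent Night",
  " 3. In the Bleak Mid-Winter – Gustav Holst version",
  "Latest features"
)

# Keep only entries that start with a rank such as "1.",
# allowing for leading whitespace.
is_carol <- grepl("^\\s*[0-9]+\\.", headings)

# Strip the whitespace and the rank, leaving just the titles.
carols <- sub("^\\s*[0-9]+\\.\\s*", "", headings[is_carol])
carols
```

The same two lines would work on the full vector by replacing headings with the output of html_text().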

There are more extended examples on the package website.

Additional resources