• Training Manual for SDAL
  • Welcome
    • The Social and Decision Analytics Laboratory
      • Overview: Our Mission & Methods
      • Innovation: Our Research Focus Areas
      • Collaboration: Our Research Partnerships
    • Data Science for the Public Good Program
      • Problem
      • Methods
      • Impact
      • Program Sponsors: Who We Work With
      • Previous Projects
    • Lesson Materials
      • Software-Carpenty
      • Data-Carpentry
      • The Carpentries
      • DataCamp
      • R for Data Science
  • Setup
    • The Bash Shell
      • Windows
      • macOS
      • Linux
      • Chromebook
    • Git
      • Windows
      • macOS
      • Linux
      • Chromebook
    • Text Editor
      • Windows
      • macOS
      • Linux
    • R
      • Windows
      • macOS
      • Linux
      • Post installation settings
    • SQLite
      • Windows
      • macOS
      • Linux
    • SSH Keys
      • GitHub
      • Lightfoot
      • GitLab (devlab)
    • LaTeX
  • Syllabus
    • Lesson Breakdown
      • Lesson 1
      • Lesson 2
      • Lesson 3
      • Lesson 4
      • Lesson 5
      • Lesson 6
      • Lesson 7
      • Lesson 8
      • Lesson 9
      • Lesson 10
      • Lesson 11
  • I Presentations and Talks
  • DSPG Introduction
  • Ethics
  • II SDAL Specific Components
  • 1 Infrastructure
    • 1.1 Components
    • 1.2 Accessing Servers
      • 1.2.1 RStudio
      • 1.2.2 Database
      • 1.2.3 Python
    • 1.3 Project Template
    • 1.4 Getting a project repository
      • 1.4.1 Using Terminal
      • 1.4.2 Using RStudio
  • 2 Documentation and Metadata
    • 2.1 Code documentaion
      • 2.1.1 lintr
    • 2.2 Metadata
    • 2.3 README files
      • 2.3.1 Example File
  • III Version Control
  • 3 Version Control
    • 3.1 Git
    • 3.2 Git setup
    • 3.3 Git on your own
    • 3.4 Working with remotes
    • 3.5 Git with branches
    • 3.6 Collaborating with Git
    • 3.7 Protecting branches
    • 3.8 Help! (FAQ)
      • 3.8.1 General workflow
      • 3.8.2 Git push rejected (master)
      • 3.8.3 Accidently did work on master:
      • 3.8.4 Get changes from master on your branch
      • 3.8.5 Remote server (e.g., GitLab, GitHub, Bitbucket, etc) shows merge conflict
      • 3.8.6 Remove data/files from history
  • IV RStudio
  • 4 RStudio
    • 4.1 Restarting RStudio Session
      • 4.1.1 Within RStudio
      • 4.1.2 From the terminal
    • 4.2 Accessing folders
      • 4.2.1 Outside of Home
  • V R
  • 5 Data Visualization
    • 5.1 Loading ggplot2
    • 5.2 Creating a ggplot
    • 5.3 Aesthetic Mapings
    • 5.4 Facets
    • 5.5 Geometic Objects
    • 5.6 Statistical Transformations
    • 5.7 Position Adjustments
    • 5.8 Coordinate Systems
  • 6 Importing Data
    • 6.1 Loading datasets
    • 6.2 Writing to a files
  • 7 Data Transformations
    • 7.1 dplyr library
    • 7.2 Filter
    • 7.3 Comparison (operators)
    • 7.4 Logical operators
    • 7.5 Arrange
    • 7.6 Select
    • 7.7 Mutate
    • 7.8 Summarize (summarise)
    • 7.9 Groupby
  • 8 Tidy data
    • 8.1 tidyr
    • 8.2 Gather
    • 8.3 Primer to joins
    • 8.4 Spread
    • 8.5 Separate
  • 9 Relational Data in R
    • 9.1 Create simple datasets
    • 9.2 one-to-one
    • 9.3 many to one OR one to many
    • 9.4 many to many
    • 9.5 Multiple keys
    • 9.6 Different column names
  • 10 Strings
    • 10.1 Regular Expressions
      • 10.1.1 Intro
      • 10.1.2 Basic Web Scraping Example
      • 10.1.3 Lookaround Examples
  • 11 Factors
    • 11.1 forcats
    • 11.2 Sorting
    • 11.3 Counting
    • 11.4 Re-ording factors
    • 11.5 Modifying factor values
      • 11.5.1 Double check your work
  • 12 Dates and Times
    • 12.1 lubridate
    • 12.2 Making datetimes from data
    • 12.3 Non-standard date formatting
    • 12.4 strptime format variables
    • 12.5 Datetime arithmetic
  • 13 SQL Databases
    • 13.1 Databases in R
      • 13.1.1 CREATE REUSABLE CONNECTION TO DATABASE
      • 13.1.2 WRITE DATA TO DATABASE TABLES
    • 13.2 Joins
      • 13.2.1 JOIN TABLES IN SQL WITH DBI
      • 13.2.2 JOIN TABLES IN SQL IN SQL CHUNK
      • 13.2.3 JOIN TABLES IN SQL IN SQL CHUNK WITH OUTPUT VARIABLE
      • 13.2.4 JOIN TABLES IN SQL IN SQL CHUNK APPLY WHERE CLAUSE AND FUZZY SEARCH
      • 13.2.5 JOIN TABLES IN SQL IN SQL CHUNK APPLY WHERE CLAUSE WITH ‘AND’ AND FUZZY SEARCH
      • 13.2.6 JOIN TABLES IN SQL IN SQL CHUNK APPLY SELECT AND WHERE CLAUSE WITH ‘AND’ AND FUZZY SEARCH AND ORDER
      • 13.2.7 JOIN TABLES IN SQL IN SQL CHUNK APPLY SELECT AND WHERE CLAUSE AND FUZZY SEARCH AND ORDER AND GROUP BY WITH AGGREGATE FUNCTION
    • 13.3 dplyr and data.table
  • 14 Vectors
    • 14.1 vectors (c)
    • 14.2 Subsetting vectors
    • 14.3 Negative indicies drop by position
    • 14.4 Lists
    • 14.5 Named vectors
    • 14.6 Named lists
    • 14.7 vectorization / broadcasting
  • 15 Functions
    • 15.1 Writing Functions
      • 15.1.1 Fahrenheit to Kelvin
      • 15.1.2 Kelvin to Celsius
      • 15.1.3 Fahrenheit to Celsius
  • 16 Conditionals
    • 16.1 if statements
    • 16.2 If else statements
    • 16.3 Dealing with NA
  • 17 Iteration
    • 17.1 Broadcasting
    • 17.2 For loops
      • 17.2.1 Pre allocating in a loop
    • 17.3 purrr (map)
    • 17.4 Apply (in base R)
      • 17.4.1 lapply
      • 17.4.2 sapply
      • 17.4.3 mapply
      • 17.4.4 apply (2-dimensions)
  • 18 R Dialects
    • 18.1 data.frame
      • 18.1.1 Convert to dataframe
      • 18.1.2 Select columns
      • 18.1.3 Select column as dataframe
      • 18.1.4 Filtering rows
      • 18.1.5 Filter rows and select columns
      • 18.1.6 Works with dplyr
  • VI R Special Topics
  • 19 data.table
    • 19.1 Read into data.table
    • 19.2 Convert to data.table
    • 19.3 Working with data.table
      • 19.3.1 Select columns list() .()
      • 19.3.2 Functions on columns
      • 19.3.3 .N (counts)
      • 19.3.4 Using dplyr
  • 20 Spatial Data
  • 21 APIs and Web Scraping
    • 21.1 Tables from websites (wikipedia)
    • 21.2 Websites (rvest)
    • 21.3 Selenium
  • 22 Communication
  • 23 Shiny
  • 24 Text Mining
    • 24.1 One-token-per-row (unnest_tokens)
    • 24.2 Jane Austen
    • 24.3 Stop words and Frequencies
    • 24.4 gutenbergr package
    • 24.5 Sentiment Analytis
    • 24.6 tf-idf: Term frequency, inverse document frequency
    • 24.7 bind_tf_idf
    • 24.8 Relationships (n-grams)
    • 24.9 Stemming
  • VII Modeling
  • 25 Random Forests
    • 25.1 CART
    • 25.2 Bootstrapping
    • 25.3 Bagging
    • 25.4 Viewing Random Forests
  • VIII Exercises
  • IX Back Matter
  • Build Details
  • References
  • Published with bookdown

Training Manual for SDAL

5.1 Loading ggplot2

library(ggplot2)
# mpg dataset from the ggplot2 library
mpg
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl   
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr>
##  1 audi         a4         1.8  1999     4 auto(l… f        18    29 p    
##  2 audi         a4         1.8  1999     4 manual… f        21    29 p    
##  3 audi         a4         2    2008     4 manual… f        20    31 p    
##  4 audi         a4         2    2008     4 auto(a… f        21    30 p    
##  5 audi         a4         2.8  1999     6 auto(l… f        16    26 p    
##  6 audi         a4         2.8  1999     6 manual… f        18    26 p    
##  7 audi         a4         3.1  2008     6 auto(a… f        18    27 p    
##  8 audi         a4 quat…   1.8  1999     4 manual… 4        18    26 p    
##  9 audi         a4 quat…   1.8  1999     4 auto(l… 4        16    25 p    
## 10 audi         a4 quat…   2    2008     4 manual… 4        20    28 p    
## # ... with 224 more rows, and 1 more variable: class <chr>