Import, Manipulate & Visualize Data Science Salaries Using R & the tidyverse

I had the pleasure of publishing a guest video on the Data Professor YouTube channel by Chanin Nantasenamat. In this video, I explore how to import, manipulate, and visualize data in R using the tidyverse packages, including readr, stringr, dplyr, and ggplot2. The analysis focuses on a data set containing data science salaries in the USA.

Watch the Video

Check out the full video tutorial below:

Online Course: Data Manipulation in R Using dplyr & the tidyverse

This video also serves as a teaser for my online course, “Data Manipulation in R Using dplyr & the tidyverse.” If you would like to learn more about this topic, be sure to check out the course description page here:

Code Shown in the Video

Below, you can find the code demonstrated in the video. The data set used in the video can be downloaded here or from the Kaggle website (see the data attribution below).

install.packages("tidyverse")                     # Install tidyverse packages (only needed once)
library("tidyverse")                              # Load tidyverse packages
 
my_path <- "C:/Users/Joachim Schork/Dropbox/Jock/Data Sets/"  # Specify directory path (adjust to your system)
 
my_data <- read_csv(str_c(my_path,                # Import CSV file
                          "Data-Science-Job_Listing.csv"))
 
my_data <- my_data %>%                            # Select certain columns
  select("Location",
         "Salary")
 
my_data <- my_data %>%                            # Rename columns
  rename(location = "Location",
         salary = "Salary")
 
my_data <- my_data %>%                            # Replace hourly rates by NA
  mutate(salary = if_else(str_detect(salary, "Per Hour"), NA_character_, salary))
 
my_data <- my_data %>%                            # Convert range to mean
  mutate(salary = str_extract_all(salary, "\\d+(?:\\.\\d+)?") %>%
           map(~ as.numeric(.x)) %>%
           map_dbl(~ mean(.x)))
 
my_data <- my_data %>%                            # Remove NA rows
  drop_na(salary)
 
my_data <- my_data %>%                            # Extract states
  mutate(location = sub(".*, ", "", location))
 
my_data <- my_data %>%                            # Remove certain rows
  filter(location != "United States")
 
my_data %>%                                       # Draw ggplot2 density plot
  ggplot(aes(x = salary)) +
  geom_density()
 
my_data %>%                                       # Draw ggplot2 boxplot
  ggplot(aes(x = salary)) +
  geom_boxplot()
 
my_data %>%                                       # Grouped boxplot
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()
 
my_data %>%                                       # Filter data by count
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>% 
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()
 
my_data %>%                                       # Ordered boxplot
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  mutate(location = fct_reorder(location, salary, .fun = median)) %>% 
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()
 
my_data %>%                                       # Modify boxplot layout
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  mutate(location = fct_reorder(location, salary, .fun = median)) %>% 
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot() +
  labs(title = "Data Science Salaries by State (in Thousands)",
       x = NULL) +
  theme(legend.title = element_blank(),
        axis.text.y = element_blank())
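
If you do not have the CSV file at hand, the cleaning steps above can be tried on a small hand-made data set. The following is only a minimal sketch: the tibble `toy` and its salary strings are hypothetical values that I assume resemble the format of the Location and Salary columns, not actual rows from the Kaggle file.

```r
library("tidyverse")                              # Load tidyverse packages

toy <- tibble(                                    # Hypothetical example data (assumed format)
  location = c("Austin, TX",
               "New York, NY",
               "United States",
               "Boston, MA",
               "Denver, CO"),
  salary = c("$100K - $150K (Employer est.)",
             "$45.50 Per Hour (Glassdoor est.)",
             "$90K (Employer est.)",
             "$80K - $100K (Glassdoor est.)",
             NA))

toy_clean <- toy %>%
  mutate(salary = if_else(str_detect(salary,      # Hourly rates are set to NA
                                     "Per Hour"),
                          NA_character_,
                          salary)) %>%
  mutate(salary = str_extract_all(salary,         # Extract all numbers of a range
                                  "\\d+(?:\\.\\d+)?") %>%
           map(as.numeric) %>%                    # Convert character to numeric
           map_dbl(mean)) %>%                     # Average lower and upper bound
  drop_na(salary) %>%                             # Remove rows without salary
  mutate(location = sub(".*, ", "",               # Keep only text after the last ", "
                        location)) %>%
  filter(location != "United States")             # Remove rows without a state

toy_clean                                         # Print cleaned toy data
```

In this toy input, only the Austin and Boston rows remain. Their ranges are reduced to the means 125 and 90, and their locations are shortened to TX and MA. The hourly rate, the missing salary, and the row without a state are dropped.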

Data Attribution

This module utilizes data obtained from kaggle.com. We acknowledge the contributors as the primary source of this data set, which significantly enhances the educational value of our course. For more detailed data and additional resources, please visit the corresponding page on the Kaggle website.

 
