Import, Manipulate & Visualize Data Science Salaries Using R & the tidyverse

I had the pleasure of publishing a guest video on the Data Professor YouTube channel by Chanin Nantasenamat. In this video, I explore how to import, manipulate, and visualize data in R using the tidyverse packages, including readr, stringr, dplyr, and ggplot2. The analysis focuses on a data set containing data science salaries in the USA.

Watch the Video

Check out the full video tutorial below:

Online Course: Data Manipulation in R Using dplyr & the tidyverse

This video also serves as a teaser for my online course, “Data Manipulation in R Using dplyr & the tidyverse.” If you would like to learn more about this topic, be sure to check out the course description page here:

More Info About the Online Course

Code Shown in the Video

Below, you can find the code demonstrated in the video. The data set used in the video can be downloaded here or from the Kaggle website (see the data attribution below).

install.packages("tidyverse")                     # Install tidyverse packages
library("tidyverse")                              # Load tidyverse packages
 
my_path <- "C:/Users/Joachim Schork/Dropbox/Jock/Data Sets/"  # Specify directory path
 
my_data <- read_csv(str_c(my_path,                # Import CSV file
                          "Data-Science-Job_Listing.csv"))
 
my_data <- my_data %>%                            # Select certain columns
  select("Location",
         "Salary")
 
my_data <- my_data %>%                            # Rename columns
  rename(location = "Location",
         salary = "Salary")
 
my_data <- my_data %>%                            # Replace hourly salaries by NA
  mutate(salary = if_else(str_detect(salary, "Per Hour"), NA_character_, salary))
 
my_data <- my_data %>%                            # Convert range to mean
  mutate(salary = str_extract_all(salary, "\\d+(?:\\.\\d+)?") %>%
           map(~ as.numeric(.x)) %>%
           map_dbl(~ mean(.x)))
 
my_data <- my_data %>%                            # Remove NA rows
  drop_na(salary)
 
my_data <- my_data %>%                            # Extract states
  mutate(location = sub(".*, ", "", location))
 
my_data <- my_data %>%                            # Remove certain rows
  filter(location != "United States")
 
my_data %>%                                       # Draw ggplot2 density plot
  ggplot(aes(x = salary)) +
  geom_density()
 
my_data %>%                                       # Draw ggplot2 boxplot
  ggplot(aes(x = salary)) +
  geom_boxplot()
 
my_data %>%                                       # Grouped boxplot
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()
 
my_data %>%                                       # Filter data by count
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>% 
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()
 
my_data %>%                                       # Ordered boxplot
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  mutate(location = fct_reorder(location, salary, .fun = median)) %>% 
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()
 
my_data %>%                                       # Modify boxplot layout
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  mutate(location = fct_reorder(location, salary, .fun = median)) %>% 
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot() +
  labs(title = "Data Science Salary by States in Thousands",
       x = NULL) +
  theme(legend.title = element_blank(),
        axis.text.y = element_blank())
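
To make the string-cleaning steps above easier to follow, here is a minimal sketch that runs them on a tiny hand-made data set. The tibble `toy` and its values are hypothetical and only assumed to resemble the salary and location strings in the real file, whose exact formats may differ.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)

# Hypothetical toy rows (assumed formats, not taken from the real data set)
toy <- tibble(
  location = c("Austin, TX", "New York, NY", "United States", "Boston, MA"),
  salary   = c("$100K - $150K", "$25 - $35 Per Hour", "$90K - $110K", "$80.5K")
)

toy_clean <- toy %>%
  # Hourly wages are not comparable to yearly salaries, so set them to NA
  mutate(salary = if_else(str_detect(salary, "Per Hour"), NA_character_, salary)) %>%
  # Extract every number in the string and average them (range -> midpoint)
  mutate(salary = str_extract_all(salary, "\\d+(?:\\.\\d+)?") %>%
           map(as.numeric) %>%
           map_dbl(mean)) %>%
  # Drop the rows that were set to NA
  drop_na(salary) %>%
  # Keep only the text after the last ", " (the state abbreviation)
  mutate(location = sub(".*, ", "", location)) %>%
  # Rows without a state keep their full text and are removed here
  filter(location != "United States")

print(toy_clean)
```

In this toy example, the hourly row and the "United States" row are removed, leaving TX with a salary of 125 and MA with a salary of 80.5.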

Data Attribution

This module utilizes data obtained from kaggle.com. We acknowledge the contributors as the primary source of this data set, which significantly enhances the educational value of our course. For more detailed data and additional resources, please visit the corresponding page on the Kaggle website.
