Import, Manipulate & Visualize Data Science Salaries Using R & the tidyverse
I had the pleasure of publishing a guest video on the Data Professor YouTube channel by Chanin Nantasenamat. In this video, I explore how to import, manipulate, and visualize data in R using the tidyverse packages, including readr, stringr, dplyr, and ggplot2. The analysis focuses on a data set containing data science salaries in the USA.
Watch the Video
Check out the full video tutorial below:
Online Course: Data Manipulation in R Using dplyr & the tidyverse
This video also serves as a teaser for my online course, “Data Manipulation in R Using dplyr & the tidyverse.” If you would like to learn more about this topic, be sure to check out the course description page here:
Code Shown in the Video
Below, you can find the code demonstrated in the video. The data set used in the video can be downloaded here, or from the Kaggle website (see the data attribution below).
install.packages("tidyverse")                     # Install tidyverse packages
library("tidyverse")                              # Load tidyverse packages

my_path <- "C:/Users/Joachim Schork/Dropbox/Jock/Data Sets/"  # Specify directory path

my_data <- read_csv(str_c(my_path,                # Import CSV file
                          "Data-Science-Job_Listing.csv"))

my_data <- my_data %>%                            # Select certain columns
  select("Location", "Salary")

my_data <- my_data %>%                            # Rename columns
  rename(location = "Location",
         salary = "Salary")

my_data <- my_data %>%                            # Replace values by NA
  mutate(salary = if_else(str_detect(salary, "Per Hour"), NA, salary))

my_data <- my_data %>%                            # Convert range to mean
  mutate(salary = str_extract_all(salary, "\\d+(?:\\.\\d+)?") %>%
           map(~ as.numeric(.x)) %>%
           map_dbl(~ mean(.x)))

my_data <- my_data %>%                            # Remove NA rows
  drop_na(salary)

my_data <- my_data %>%                            # Extract states
  mutate(location = sub(".*, ", "", location))

my_data <- my_data %>%                            # Remove certain rows
  filter(location != "United States")

my_data %>%                                       # Draw ggplot2 density plot
  ggplot(aes(x = salary)) +
  geom_density()

my_data %>%                                       # Draw ggplot2 boxplot
  ggplot(aes(x = salary)) +
  geom_boxplot()

my_data %>%                                       # Grouped boxplot
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()

my_data %>%                                       # Filter data by count
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()

my_data %>%                                       # Ordered boxplot
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  mutate(location = fct_reorder(location, salary, .fun = median)) %>%
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot()

my_data %>%                                       # Modify boxplot layout
  group_by(location) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  mutate(location = fct_reorder(location, salary, .fun = median)) %>%
  ggplot(aes(x = salary,
             fill = location)) +
  geom_boxplot() +
  labs(title = "Data Science Salary by States in Thousands",
       x = NULL) +
  theme(legend.title = element_blank(),
        axis.text.y = element_blank())
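If you want to see what the cleaning steps do without downloading the file, the following minimal sketch applies the same operations to a tiny, made-up tibble. The salary strings and locations in `toy` are assumptions for illustration only; the values in the actual Kaggle file may be formatted differently. The sketch uses `NA_character_` instead of a bare `NA` so that it should also run on older dplyr releases, where `if_else()` is stricter about the types of its branches, and `filter(!is.na(salary))` in place of `drop_na(salary)` to avoid loading tidyr.

```r
library(dplyr)
library(stringr)
library(purrr)

# Hypothetical example rows (not taken from the real data set)
toy <- tibble::tibble(
  location = c("Austin, TX", "Remote", "New York, NY", "United States"),
  salary   = c("$80K - $120K (Employer est.)",
               "$45.50 Per Hour",
               "$150K",
               "$100K - $110K")
)

toy_clean <- toy %>%
  # Hourly wages are not comparable to yearly salaries, so set them to NA
  mutate(salary = if_else(str_detect(salary, "Per Hour"),
                          NA_character_, salary)) %>%
  # Extract every number in the string and average them (range -> mean)
  mutate(salary = str_extract_all(salary, "\\d+(?:\\.\\d+)?") %>%
           map(~ as.numeric(.x)) %>%
           map_dbl(~ mean(.x))) %>%
  # Same effect as drop_na(salary) for this single column
  filter(!is.na(salary)) %>%
  # Keep only the text after the last ", " (the state abbreviation)
  mutate(location = sub(".*, ", "", location)) %>%
  # Drop listings without a specific state
  filter(location != "United States")

print(toy_clean)
```

In this toy example, two rows remain: TX with a salary of 100 (the mean of 80 and 120) and NY with a salary of 150. The hourly listing and the "United States" listing are removed.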
Data Attribution
This module uses data obtained from kaggle.com. We acknowledge the contributors as the primary source of this data set, which significantly enhances the educational value of our course. For more detailed data and additional resources, please visit the corresponding page on the Kaggle website.
I’m Joachim Schork. On this website, I provide statistics tutorials as well as code in Python and R programming.






