Working with Rows (Course Preview)
This page contains a preview of just one of the 21 modules offered in the Statistics Globe online course on “Data Manipulation in R Using dplyr & the tidyverse”. Take a look at the full course by clicking here.
In this module, you’ll learn how to manage the rows of your data sets. It covers essential techniques for filtering rows based on specific conditions, selecting rows by position or by their values, drawing random samples of rows, sorting rows in a particular order, and adding new rows to your tibbles. These skills let you narrow a data set down to the observations relevant to your analysis and keep it organized according to your analytical requirements.
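One detail worth knowing when filtering and sorting: filter() keeps only the rows for which the condition evaluates to TRUE, so rows where the condition is NA are dropped. arrange() places missing values at the end, even when the column is wrapped in desc(). The minimal sketch below illustrates this on a small hypothetical tibble; the names scores, id, and score are made up for illustration.

```r
library("tidyverse")                        # Load tidyverse packages

# Hypothetical toy data with one missing value
scores <- tibble(id = 1:4,
                 score = c(2.5, NA, 0.8, 1.7))

# filter() keeps rows where the condition is TRUE; the NA row (id 2) is dropped
scores_high <- scores %>%
  filter(score > 1)
scores_high                                 # ids 1 and 4

# arrange() sorts missing values to the end, also inside desc()
scores_sorted <- scores %>%
  arrange(desc(score))
scores_sorted                               # ids 1, 4, 3, then the NA row (id 2)
```

If rows with missing values should be kept, the condition has to say so explicitly, for example filter(score > 1 | is.na(score)).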
Video Lecture
Exercises
The following exercises are based on the CSV file “My Programming Languages”. Please download it here and import it into R. You may want to watch the lecture on Importing & Exporting Data Using dplyr & readr for further instructions on how to import external data, as well as some information on the content of this data set.
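For reference, importing a CSV file with readr could look like the sketch below. Since the exact file name and column layout of the download are not given here, the sketch first writes a small stand-in file to a temporary folder; the file name my_programming_languages.csv, the columns Date, R, and Python, and all values are assumptions. With your own download, pass its path to read_csv() instead.

```r
library("tidyverse")                        # Load tidyverse packages (includes readr)

# Stand-in for the downloaded file: file name, columns, and values are hypothetical
csv_path <- file.path(tempdir(), "my_programming_languages.csv")
write_csv(tibble(Date = c("2004-01", "2004-02", "2004-03"),
                 R = c(0.5, 0.6, 0.7),
                 Python = c(2.1, 2.3, 2.2)),
          csv_path)

# Import the CSV file as a tibble
langs <- read_csv(csv_path)
langs                                       # Print the imported tibble
```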
- Filter rows to show only the data where R’s popularity is greater than 1.0.
- Sort the tibble in descending order based on Python’s popularity.
- Find the year in which Matlab had its highest popularity.
- Identify which language had the greatest increase in popularity from the first to the last date in the data set.
- Randomly sample 5% of the rows and calculate the average popularity of R and Python for this sample. Don’t forget to set a random seed beforehand.
- Add a new row predicting that, by January 2024, R has become the dominant programming language, universally adopted by 100% of the programming community.
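As a starting point for the sampling and new-row exercises, the sketch below shows the general pattern on a small hypothetical tibble. The columns Date, R, and Python and all values are made up, and the real file may be structured differently. It draws a share of the rows with slice_sample(prop = ...) after setting a seed, averages two columns with summarise(), and appends a row with add_row() from the tibble package.

```r
library("tidyverse")                        # Load tidyverse packages

# Hypothetical toy data standing in for the course data set
langs <- tibble(Date = c("2023-09", "2023-10", "2023-11", "2023-12"),
                R = c(0.9, 1.1, 1.2, 1.0),
                Python = c(27.5, 28.0, 28.4, 28.1))

set.seed(123)                               # Make the random sample reproducible
langs_sample <- langs %>%
  slice_sample(prop = 0.5)                  # Randomly keep 50% of the rows
sample_means <- langs_sample %>%
  summarise(mean_R = mean(R),               # Average popularity within the sample
            mean_Python = mean(Python))
sample_means

langs_new <- langs %>%
  add_row(Date = "2024-01", R = 100, Python = 0)  # Append one row at the end
tail(langs_new, 1)                          # Show the added row
```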
The exercises will be discussed in the LinkedIn chat group.
R Code of This Lecture
# install.packages("tidyverse")                  # Install tidyverse packages
library("tidyverse")                              # Load tidyverse packages

my_tib <- tibble(x1 = c(1, 3, 5, 3, 3, 2, 4, 2),  # Define variables in tibble()
                 x2 = 11:18,
                 x3 = c("a", "b", "a", "c", "b", "b", "a", "c"),
                 x4 = "x")
my_tib                                            # Print tibble

tib_row_filter <- my_tib %>%                      # Select rows conditionally
  filter(x1 == 3)                                 # Apply filter() function
tib_row_filter                                    # Print new tibble

tib_row_filter2 <- my_tib %>%                     # Select rows conditionally
  filter(x1 == 3 | x3 != "a")                     # Multiple filter() conditions
tib_row_filter2                                   # Print new tibble

tib_row_slice <- my_tib %>%                       # Select rows based on index
  slice(c(4, 6, 7))                               # Apply slice() function
tib_row_slice                                     # Print new tibble

tib_row_head <- my_tib %>%                        # Select top rows
  slice_head(n = 4)                               # Apply slice_head() function
tib_row_head                                      # Print new tibble

tib_row_tail <- my_tib %>%                        # Select bottom rows
  slice_tail(n = 4)                               # Apply slice_tail() function
tib_row_tail                                      # Print new tibble

tib_row_min <- my_tib %>%                         # Rows with lowest values
  slice_min(x1, n = 3)                            # Apply slice_min() function
tib_row_min                                       # Print new tibble

tib_row_max <- my_tib %>%                         # Rows with highest values
  slice_max(x1, n = 3)                            # Apply slice_max() function
tib_row_max                                       # Print new tibble

set.seed(3532355)                                 # Ensure reproducibility
tib_row_sample <- my_tib %>%                      # Randomly sample n rows
  sample_n(3)                                     # sample_n() (superseded by slice_sample(n = 3))
tib_row_sample                                    # Print new tibble

tib_row_sample2 <- my_tib %>%                     # Sample percentage of rows
  sample_frac(0.5)                                # sample_frac() (superseded by slice_sample(prop = 0.5))
tib_row_sample2                                   # Print new tibble

tib_row_arrange <- my_tib %>%                     # Reorder rows
  arrange(x1)                                     # Apply arrange() function
tib_row_arrange                                   # Print new tibble

tib_row_arrange2 <- my_tib %>%                    # Reorder rows in descending order
  arrange(desc(x1))                               # arrange() & desc() functions
tib_row_arrange2                                  # Print new tibble

new_row <- tibble(x1 = 11,                        # Create tibble with new row
                  x2 = 22,
                  x3 = "aa",
                  x4 = "bb")
new_row                                           # Print new row

tib_row_add <- my_tib %>%                         # Add one row
  bind_rows(new_row)                              # Apply bind_rows() function
tib_row_add                                       # Print new tibble

new_row2 <- tibble(x1 = c(11, 111),               # Create tibble with new rows
                   x2 = c(22, 222),
                   x3 = c("aa", "aaa"),
                   x4 = c("bb", "bbb"))
new_row2                                          # Print new rows

tib_row_add2 <- my_tib %>%                        # Add multiple rows
  bind_rows(new_row2)                             # Apply bind_rows() function
tib_row_add2                                      # Print new tibble

tib_row_multi <- my_tib %>%                       # Apply multiple operations
  filter(x1 != 1) %>%                             # Filter rows
  slice_head(n = 5) %>%                           # Extract top rows
  arrange(x3)                                     # Reorder alphabetically
tib_row_multi                                     # Print new tibble
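A detail that is easy to miss in the slice_max() call of the lecture code: by default, slice_min() and slice_max() keep ties (with_ties = TRUE), so they can return more rows than n. In my_tib, three rows share x1 = 3, so slice_max(x1, n = 3) returns five rows (x1 = 5, 4, 3, 3, 3). Setting with_ties = FALSE returns exactly n rows. The minimal sketch below recreates the x1 and x2 columns of my_tib to show the difference.

```r
library("tidyverse")                        # Load tidyverse packages

my_tib <- tibble(x1 = c(1, 3, 5, 3, 3, 2, 4, 2),  # Same x1 and x2 as in the lecture code
                 x2 = 11:18)

tib_max_ties <- my_tib %>%
  slice_max(x1, n = 3)                      # Ties kept: 5 rows (x1 = 5, 4, 3, 3, 3)
tib_max_ties

tib_max_exact <- my_tib %>%
  slice_max(x1, n = 3, with_ties = FALSE)   # Exactly 3 rows (x1 = 5, 4, 3)
tib_max_exact
```

The same with_ties argument applies to slice_min(); in the lecture example, slice_min(x1, n = 3) happens to return exactly three rows because there is no tie at the cut-off.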
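The second filter() call in the lecture code combines two conditions with | (OR), so a row is kept if either condition holds. To require all conditions at once (AND), separate them with commas (or use &); to match any value from a set, use %in%. The minimal sketch below uses the same x1 and x3 values as my_tib and compares the resulting row counts.

```r
library("tidyverse")                        # Load tidyverse packages

my_tib <- tibble(x1 = c(1, 3, 5, 3, 3, 2, 4, 2),  # Same x1 and x3 as in the lecture code
                 x3 = c("a", "b", "a", "c", "b", "b", "a", "c"))

tib_or <- my_tib %>%
  filter(x1 == 3 | x3 != "a")               # OR: at least one condition must hold
tib_and <- my_tib %>%
  filter(x1 == 3, x3 != "a")                # AND: comma-separated conditions must all hold
tib_in <- my_tib %>%
  filter(x1 %in% c(4, 5))                   # Keep rows whose x1 is 4 or 5

nrow(tib_or)                                # 5
nrow(tib_and)                               # 3
nrow(tib_in)                                # 2
```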