Working with Rows (Course Preview)

This page contains a preview of just one of the 21 modules offered in the Statistics Globe online course on “Data Manipulation in R Using dplyr & the tidyverse”. Take a look at the full course by clicking here.

In this section, you’ll learn how to effectively manage rows within your data sets. This part covers essential techniques for filtering rows based on specific conditions, sorting them in a particular order, and incorporating new rows into your tibbles. By mastering these skills, you can refine your data analysis and ensure that your data sets are organized according to your analytical requirements.

Exercises

The following exercises are based on the CSV file “My Programming Languages”. Please download it and import it into R. The lecture on Importing & Exporting Data Using dplyr & readr gives further instructions on importing external data and describes the content of this data set.
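As a rough sketch of the import step, the code below writes a tiny stand-in CSV to a temporary file and reads it back with read_csv() from readr. The file path, column names, and values are made-up placeholders, not the actual content of “My Programming Languages”. Replace toy_path with the location of the file you downloaded.

```r
library(readr)                                       # read_csv() & write_csv()

toy_path <- tempfile(fileext = ".csv")               # Hypothetical stand-in for the downloaded file

toy_data <- data.frame(Date = c("January 2005",      # Made-up columns and values,
                                "February 2005"),    # for illustration only
                       R = c(0.4, 0.5),
                       Python = c(2.6, 2.7))
write_csv(toy_data, toy_path)                        # Create placeholder CSV file

my_data <- read_csv(toy_path,                        # Import CSV file as tibble
                    show_col_types = FALSE)          # Suppress column type message
my_data                                              # Print imported tibble
```

read_csv() returns a tibble, so the row functions shown in this lecture can be applied to the imported data directly.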

Filter rows to show only data where R’s popularity is greater than 1.0.
Sort the tibble in descending order based on Python’s popularity.
Find the year where Matlab had its highest popularity.
Identify which language has the greatest increase in popularity from the first to the last date in the data set.
Randomly sample 5% of rows and calculate the average popularity of R and Python for this sample. Don’t forget to set a random seed beforehand.
Add a new row predicting that by January 2024, R has become the dominant programming language, universally adopted by 100% of the programming community.

The exercises will be discussed in the LinkedIn chat group.

R Code of This Lecture

# install.packages("tidyverse")                   # Install tidyverse packages
library("tidyverse")                              # Load tidyverse packages
 
my_tib <- tibble(x1 = c(1, 3, 5, 3, 3, 2, 4, 2),  # Define variables in tibble()
                 x2 = 11:18,
                 x3 = c("a", "b", "a", "c", "b", "b", "a", "c"),
                 x4 = "x")
my_tib                                            # Print tibble
 
tib_row_filter <- my_tib %>%                      # Select rows conditionally
  filter(x1 == 3)                                 # Apply filter() function
tib_row_filter                                    # Print new tibble
 
tib_row_filter2 <- my_tib %>%                     # Select rows conditionally
  filter(x1 == 3 | x3 != "a")                     # Multiple filter() conditions
tib_row_filter2                                   # Print new tibble
 
tib_row_slice <- my_tib %>%                       # Select rows based on index
  slice(c(4, 6, 7))                               # Apply slice() function
tib_row_slice                                     # Print new tibble
 
tib_row_head <- my_tib %>%                        # Select top rows
  slice_head(n = 4)                               # Apply slice_head() function
tib_row_head                                      # Print new tibble
 
tib_row_tail <- my_tib %>%                        # Select bottom rows
  slice_tail(n = 4)                               # Apply slice_tail() function
tib_row_tail                                      # Print new tibble
 
tib_row_min <- my_tib %>%                         # Rows with lowest values
  slice_min(x1, n = 3)                            # Apply slice_min() function
tib_row_min                                       # Print new tibble
 
tib_row_max <- my_tib %>%                         # Rows with highest values
  slice_max(x1, n = 3)                            # Apply slice_max() function
tib_row_max                                       # Print new tibble
 
set.seed(3532355)                                 # Ensure reproducibility
 
tib_row_sample <- my_tib %>%                      # Randomly sample n rows
  sample_n(3)                                     # Apply sample_n() function
tib_row_sample                                    # Print new tibble
 
tib_row_sample2 <- my_tib %>%                     # Sample percentage of rows
  sample_frac(0.5)                                # Apply sample_frac() function
tib_row_sample2                                   # Print new tibble
 
tib_row_arrange <- my_tib %>%                     # Reorder rows
  arrange(x1)                                     # Apply arrange() function
tib_row_arrange                                   # Print new tibble
 
tib_row_arrange2 <- my_tib %>%                    # Reorder rows in descending order
  arrange(desc(x1))                               # arrange() & desc() functions
tib_row_arrange2                                  # Print new tibble
 
new_row <- tibble(x1 = 11,                        # Create tibble with new row
                  x2 = 22,
                  x3 = "aa",
                  x4 = "bb")
new_row                                           # Print new row
 
tib_row_add <- my_tib %>%                         # Add one row
  bind_rows(new_row)                              # Apply bind_rows() function
tib_row_add                                       # Print new tibble
 
new_row2 <- tibble(x1 = c(11, 111),               # Create tibble with new rows
                   x2 = c(22, 222),
                   x3 = c("aa", "aaa"),
                   x4 = c("bb", "bbb"))
new_row2                                          # Print new rows
 
tib_row_add2 <- my_tib %>%                        # Add multiple rows
  bind_rows(new_row2)                             # Apply bind_rows() function
tib_row_add2                                      # Print new tibble
 
tib_row_multi <- my_tib %>%                       # Apply multiple operations
  filter(x1 != 1) %>%                             # Filter rows
  slice_head(n = 5) %>%                           # Extract top rows
  arrange(x3)                                     # Reorder alphabetically
tib_row_multi                                     # Print new tibble
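
The following sketch is not part of the lecture code. It shows a few alternative formulations that may be useful alongside it, and it makes three points. First, the dplyr documentation marks sample_n() and sample_frac() as superseded in favor of slice_sample(). Second, slice_min() and slice_max() keep tied rows by default, so they can return more than n rows. Third, add_row() from the tibble package can append a row without creating a separate tibble first. The object names are chosen for illustration only.

```r
library(dplyr)                                       # slice_sample(), slice_max() & %>%
library(tibble)                                      # tibble() & add_row()

my_tib <- tibble(x1 = c(1, 3, 5, 3, 3, 2, 4, 2),     # Same example tibble as in the lecture
                 x2 = 11:18,
                 x3 = c("a", "b", "a", "c", "b", "b", "a", "c"),
                 x4 = "x")

set.seed(3532355)                                    # Ensure reproducibility

tib_sample_n <- my_tib %>%                           # Counterpart to sample_n(3)
  slice_sample(n = 3)

tib_sample_prop <- my_tib %>%                        # Counterpart to sample_frac(0.5)
  slice_sample(prop = 0.5)

tib_max_ties <- my_tib %>%                           # Default with_ties = TRUE keeps ties
  slice_max(x1, n = 3)
nrow(tib_max_ties)                                   # 5 rows, because x1 == 3 occurs three times

tib_max_exact <- my_tib %>%                          # Return exactly n rows
  slice_max(x1, n = 3, with_ties = FALSE)
nrow(tib_max_exact)                                  # 3 rows

tib_row_add_alt <- my_tib %>%                        # Append one row without bind_rows()
  add_row(x1 = 11, x2 = 22, x3 = "aa", x4 = "bb")
nrow(tib_row_add_alt)                                # 9 rows
```

Because slice_sample() draws rows at random, the sampled rows need not match those returned by sample_n() or sample_frac() in the lecture, even with the same seed.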