dplyr Package in R | Tutorial & Programming Examples
The dplyr R package provides many tools for the manipulation of data in R. The dplyr package is part of the tidyverse environment.
- Here you can find the documentation of the dplyr package.
- Here you can find the CRAN page of the dplyr package.
Examples for the dplyr Package
This section shows examples for some functions of the dplyr package. The examples are based on the following data frame:
data <- data.frame(x1 = 1:6,                       # Create example data
                   x2 = c(1, 2, 2, 3, 1, 2),
                   x3 = c("F", "B", "C", "E", "A", "D"))
data                                               # Print example data
#   x1 x2 x3
# 1  1  1  F
# 2  2  2  B
# 3  3  2  C
# 4  4  3  E
# 5  5  1  A
# 6  6  2  D
If we want to apply the functions of dplyr, we need to install and load the dplyr package:
install.packages("dplyr")                          # Install dplyr package
library("dplyr")                                   # Load dplyr package
Now, we are set up and can move on to the examples!
Example 1: arrange Function
The arrange function orders the rows of a data set by the values of one or more columns. Let’s use the arrange function to sort our data by the variable x3:
arrange(data, x3)                                  # Apply arrange function
#   x1 x2 x3
# 1  5  1  A
# 2  2  2  B
# 3  3  2  C
# 4  6  2  D
# 5  4  3  E
# 6  1  1  F
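The arrange function also accepts several sorting columns, and the desc helper of dplyr reverses the order of a column. The following is a minimal sketch based on the example data from above; the object name data_sorted is only chosen for illustration:

```r
library(dplyr)

# Example data as defined at the beginning of this tutorial
data <- data.frame(x1 = 1:6,
                   x2 = c(1, 2, 2, 3, 1, 2),
                   x3 = c("F", "B", "C", "E", "A", "D"))

# Sort by x2 in ascending order; ties are broken by x1 in descending order
data_sorted <- arrange(data, x2, desc(x1))
data_sorted
#   x1 x2 x3
# 1  5  1  A
# 2  1  1  F
# 3  6  2  D
# 4  3  2  C
# 5  2  2  B
# 6  4  3  E
```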
Example 2: filter Function
The filter function extracts rows of our data by a logical condition. The following R code creates a subset of our original data frame, in which only rows with the value 2 in the variable x2 are retained:
filter(data, x2 == 2)                              # Apply filter function
#   x1 x2 x3
# 1  2  2  B
# 2  3  2  C
# 3  6  2  D
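Several logical conditions can be combined within a single call of the filter function. Conditions separated by a comma must all be TRUE, and the | operator can be used for a logical OR. The sketch below uses the example data from above; the object names data_and and data_or are hypothetical:

```r
library(dplyr)

# Example data as defined at the beginning of this tutorial
data <- data.frame(x1 = 1:6,
                   x2 = c(1, 2, 2, 3, 1, 2),
                   x3 = c("F", "B", "C", "E", "A", "D"))

# Keep rows where x2 equals 2 AND x1 is larger than 2
data_and <- filter(data, x2 == 2, x1 > 2)
data_and
#   x1 x2 x3
# 1  3  2  C
# 2  6  2  D

# Keep rows where x2 equals 1 OR x3 equals "E"
data_or <- filter(data, x2 == 1 | x3 == "E")
data_or
#   x1 x2 x3
# 1  1  1  F
# 2  4  3  E
# 3  5  1  A
```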
Example 3: mutate Function
The mutate function creates new variables from existing ones. With the following R syntax, we can create a new variable x4, which contains the row-wise sums of the variables x1 and x2:
mutate(data, x4 = x1 + x2)                         # Apply mutate function
#   x1 x2 x3 x4
# 1  1  1  F  2
# 2  2  2  B  4
# 3  3  2  C  5
# 4  4  3  E  7
# 5  5  1  A  6
# 6  6  2  D  8
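A single call of mutate may create several variables at once, and later expressions can refer to variables created earlier in the same call. The following sketch illustrates this with the example data; the variable x5 and the object name data_new are only assumptions for this illustration:

```r
library(dplyr)

# Example data as defined at the beginning of this tutorial
data <- data.frame(x1 = 1:6,
                   x2 = c(1, 2, 2, 3, 1, 2),
                   x3 = c("F", "B", "C", "E", "A", "D"))

# Create x4 first, then use x4 to create x5 within the same call
data_new <- mutate(data,
                   x4 = x1 + x2,
                   x5 = x4 * 2)
data_new
#   x1 x2 x3 x4 x5
# 1  1  1  F  2  4
# 2  2  2  B  4  8
# 3  3  2  C  5 10
# 4  4  3  E  7 14
# 5  5  1  A  6 12
# 6  6  2  D  8 16
```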
Example 4: pull Function
The pull function extracts a single column of our data frame and returns it as a vector. The following R code extracts the variable x2:
pull(data, x2)                                     # Apply pull function
# 1 2 2 3 1 2
Example 5: rename Function
The rename function changes the name of certain columns. In this example, we’ll change the name of the third column from x3 to new_name:
rename(data, new_name = x3)                        # Apply rename function
#   x1 x2 new_name
# 1  1  1        F
# 2  2  2        B
# 3  3  2        C
# 4  4  3        E
# 5  5  1        A
# 6  6  2        D
Example 6: sample_n Function
The sample_n function randomly samples N cases from our data frame. The following R syntax samples three rows of our original data without replacement. Note that we are setting a seed beforehand for reproducibility:
set.seed(765)                                      # Set seed for reproducibility
sample_n(data, 3)                                  # Apply sample_n function
#   x1 x2 x3
# 1  3  2  C
# 2  4  3  E
# 3  5  1  A
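In dplyr 1.0.0 and later, sample_n still works but is marked as superseded in favor of the slice_sample function. The sketch below shows the equivalent call, assuming such a dplyr version is installed. No output is printed here, because the selected rows depend on the R version and its random number generator; the object name data_sample is hypothetical:

```r
library(dplyr)

# Example data as defined at the beginning of this tutorial
data <- data.frame(x1 = 1:6,
                   x2 = c(1, 2, 2, 3, 1, 2),
                   x3 = c("F", "B", "C", "E", "A", "D"))

set.seed(765)                                # Set seed for reproducibility

# Draw three rows without replacement (replace = FALSE is the default)
data_sample <- slice_sample(data, n = 3)
data_sample
```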
Example 7: select Function
The select function extracts certain columns from a data frame. The following R programming code creates a subset with the columns x2 and x3:
select(data, c(x2, x3))                            # Apply select function
#   x2 x3
# 1  1  F
# 2  2  B
# 3  2  C
# 4  3  E
# 5  1  A
# 6  2  D
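Columns can also be excluded with a minus sign, and the order in which columns are listed determines their order in the result. A short sketch with the example data follows; the object names data_drop and data_reorder are only used for illustration:

```r
library(dplyr)

# Example data as defined at the beginning of this tutorial
data <- data.frame(x1 = 1:6,
                   x2 = c(1, 2, 2, 3, 1, 2),
                   x3 = c("F", "B", "C", "E", "A", "D"))

# Drop the column x1 and keep all other columns
data_drop <- select(data, -x1)
names(data_drop)
# [1] "x2" "x3"

# Select x3 and x1 in a new column order
data_reorder <- select(data, x3, x1)
names(data_reorder)
# [1] "x3" "x1"
```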
Note that the previous R code could also be applied to a tibble instead of a data frame. Furthermore, the pipe operator (i.e. %>%) could be used. However, for simplicity I stuck to a basic R programming style.
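For comparison, here is a sketch of how several of the functions shown above could be chained with the pipe operator on a tibble. The as_tibble function is re-exported by dplyr; the object names data_tbl and result are hypothetical and only used for this illustration:

```r
library(dplyr)

# Example data as defined at the beginning of this tutorial
data <- data.frame(x1 = 1:6,
                   x2 = c(1, 2, 2, 3, 1, 2),
                   x3 = c("F", "B", "C", "E", "A", "D"))

data_tbl <- as_tibble(data)                  # Convert data frame to tibble

# Each step passes its result on to the next function
result <- data_tbl %>%
  filter(x2 == 2) %>%                        # Keep rows with x2 equal to 2
  mutate(x4 = x1 + x2) %>%                   # Add the sum of x1 and x2
  arrange(x3) %>%                            # Sort by x3
  select(x3, x4)                             # Keep only x3 and x4
result
```

The result contains the three rows B, C, and D with the x4 values 4, 5, and 8. Each piped step is equivalent to passing the previous result as the first argument of the next function.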
Video: Introduction to the dplyr Package in R
The following video of the Statistics Globe YouTube channel explains some of the most important functions of the dplyr package:
Tutorials on the dplyr Package
You can find tutorials and examples for the dplyr package below.
Other Useful R Packages
In the following, you can find a list of other useful R packages.