Remove Rows with NA Using dplyr Package in R (3 Examples)
This article explains how to delete data frame rows containing missing values in R programming.
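The content of the post is structured like this:
- Example Data & Packages
- Example 1: Remove Rows with NA Using na.omit() Function
- Example 2: Remove Rows with NA Using filter() & complete.cases() Functions
- Example 3: Remove Rows with NA in Specific Column Using filter() & is.na() Functions
- Video & Further Resources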
Let’s dive right in!
Example Data & Packages
Have a look at the following example data:
data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),    # Create example data
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)
data                                              # Print example data
#   x1   x2 x3
# 1  1    X  4
# 2  2 <NA>  4
# 3 NA    Y  4
# 4  4   AA  4
# 5  5    X  4
# 6  6    Z  4
The previous output of the RStudio console shows that the example data contains six rows and three columns. The variables x1 and x2 each contain one missing value (i.e. NA), while x3 is complete.
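Before removing anything, it can help to check where the missing values are. The following is a minimal sketch using only base R functions; the object names na_per_column and rows_with_na are chosen here for illustration and are not part of the original example:

```r
# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)

# Count the NA values in each column
na_per_column <- colSums(is.na(data))
na_per_column                          # x1 and x2: 1 each, x3: 0

# Positions of the rows that contain at least one NA
rows_with_na <- which(!complete.cases(data))
rows_with_na                           # rows 2 and 3
```

This confirms that exactly two rows (2 and 3) are affected by missing values.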
In this tutorial, we’ll use functions provided by the dplyr package. To use them, we have to install and load the package first:
install.packages("dplyr")                         # Install dplyr package
library("dplyr")                                  # Load dplyr package
Now, we can jump into the examples…
Example 1: Remove Rows with NA Using na.omit() Function
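This example explains how to delete rows with missing data using the na.omit function together with the pipe operator (%>%) that becomes available when dplyr is loaded. Note that na.omit itself is a base R function (from the stats package), not part of dplyr:
data %>%                                          # Apply na.omit
  na.omit
#   x1 x2 x3
# 1  1  X  4
# 4  4 AA  4
# 5  5  X  4
# 6  6  Z  4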
As you can see, we have removed all data frame observations that contained at least one NA value. This method is also called listwise deletion or complete-case analysis.
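If you want to know which observations were dropped by listwise deletion, the result of na.omit stores this information in its "na.action" attribute. The following is a small sketch in base R; the object names cleaned, removed_rows, and n_removed are assumptions made for this illustration:

```r
# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)

cleaned <- na.omit(data)               # Listwise deletion

# na.omit records the positions of the dropped rows in the "na.action" attribute
removed_rows <- as.integer(attr(cleaned, "na.action"))
removed_rows                           # rows 2 and 3 were dropped

# Number of observations lost by listwise deletion
n_removed <- nrow(data) - nrow(cleaned)
n_removed
```

Checking the number of dropped rows is worthwhile with real data, because listwise deletion can shrink a data set considerably when many variables contain a few NA values each.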
Example 2: Remove Rows with NA Using filter() & complete.cases() Functions
As an alternative to the R code of Example 1, we can combine the filter function of the dplyr package with the base R function complete.cases to remove data frame rows with missing values.
Have a look at the following syntax:
data %>%                                          # Apply filter & complete.cases
  filter(complete.cases(.))
#   x1 x2 x3
# 1  1  X  4
# 4  4 AA  4
# 5  5  X  4
# 6  6  Z  4
The output is exactly the same as in Example 1.
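To check this programmatically, the two results can be compared column by column. This is only a sketch: the object names via_na_omit, via_filter, and same_values are made up for this illustration, and the comparison deliberately looks at the column values only, since row labels and the "na.action" attribute are not part of the data itself:

```r
library(dplyr)

# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)

via_na_omit <- data %>% na.omit()                    # Approach of Example 1
via_filter  <- data %>% filter(complete.cases(.))    # Approach of Example 2

# Compare the values of each column pair (x1 vs. x1, x2 vs. x2, x3 vs. x3)
same_values <- all(mapply(identical, via_na_omit, via_filter))
same_values
```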
Example 3: Remove Rows with NA in Specific Column Using filter() & is.na() Functions
It is also possible to omit observations that have a missing value in a certain data frame variable.
The following R syntax removes only rows with an NA value in the column x1 using the filter and is.na functions:
data %>%                                          # Apply filter & is.na
  filter(!is.na(x1))
#   x1   x2 x3
# 1  1    X  4
# 2  2 <NA>  4
# 3  4   AA  4
# 4  5    X  4
# 5  6    Z  4
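Note that the row with the missing value in x2 is kept, because only x1 was checked. The same idea can be extended to several selected columns by passing more than one condition to filter, which combines them with a logical AND. The following sketch assumes we want complete values in both x1 and x2; the object names only_x1 and x1_and_x2 are chosen for this illustration:

```r
library(dplyr)

# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)

# Check x1 only: the row with NA in x2 remains in the data
only_x1 <- data %>%
  filter(!is.na(x1))
nrow(only_x1)                          # 5 rows

# Check x1 and x2: a row is kept only if both conditions are TRUE
x1_and_x2 <- data %>%
  filter(!is.na(x1), !is.na(x2))
nrow(x1_and_x2)                        # 4 rows
```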
Video & Further Resources
Some time ago, I published a video on my YouTube channel that illustrates the topics of this article. You can find the video below.
In addition, you may have a look at the related tutorials on my website:
- Conditionally Remove Row from Data Frame
- Remove Empty Rows of Data Frame in R
- Remove Rows with NA in Data Frame
- All R Programming Tutorials
Summary: In this post, you learned how to remove rows with missing values in the R programming language. If you have any additional questions, let me know in the comments below. Furthermore, please subscribe to my email newsletter to get updates on the newest tutorials.