Remove Rows with NA Using dplyr Package in R (3 Examples)

 

This article explains how to delete data frame rows containing missing values in R programming.

The post is structured as follows: example data and packages, three examples, and further resources.

Let’s dive right in!

 

Example Data & Packages

Have a look at the following example data:

data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),    # Create example data
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)
data                                             # Print example data
#   x1   x2 x3
# 1  1    X  4
# 2  2 <NA>  4
# 3 NA    Y  4
# 4  4   AA  4
# 5  5    X  4
# 6  6    Z  4

The previous output of the RStudio console shows that the example data contains six rows and three columns. The variables x1 and x2 both contain one missing value (i.e. NA).
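To double-check these counts, one option is to combine the base R functions is.na and colSums. This is only a small sketch and not part of the dplyr workflow of this tutorial; the object name na_per_column is chosen purely for illustration:

```r
data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),    # Re-create example data
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)

na_per_column <- colSums(is.na(data))            # Count NA values per column
na_per_column
# x1 x2 x3 
#  1  1  0 
```

The columns x1 and x2 each contain one NA, and x3 contains none.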

In this tutorial, we’ll use functions provided by the dplyr package. To use them, we first have to install and load the package:

install.packages("dplyr")                        # Install dplyr package
library("dplyr")                                 # Load dplyr package

Now, we can jump into the examples…

 

Example 1: Remove Rows with NA Using na.omit() Function

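This example explains how to delete rows with missing data using the na.omit function. Note that na.omit is part of base R (the stats package); dplyr only contributes the pipe operator %>% here, which it re-exports from the magrittr package: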

data %>%                                         # Apply na.omit
  na.omit()
#   x1 x2 x3
# 1  1  X  4
# 4  4 AA  4
# 5  5  X  4
# 6  6  Z  4

As you can see, we have removed all data frame observations that contained at least one NA value. This method is also called listwise deletion or complete-case analysis.
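As a quick cross-check, the following sketch applies na.omit without the pipe and inspects which rows were dropped. It relies on base R only; the object name data_complete is just an illustrative choice:

```r
data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),    # Re-create example data
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)

data_complete <- na.omit(data)                   # Same result without the pipe

nrow(data_complete)                              # Number of remaining rows
# [1] 4

as.integer(attr(data_complete, "na.action"))     # Positions of removed rows
# [1] 2 3
```

na.omit stores the positions of the deleted rows in the na.action attribute of its result, which can be handy for documenting how many observations were lost.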

 

Example 2: Remove Rows with NA Using filter() & complete.cases() Functions

As an alternative to the R code of Example 1, we can use the filter and complete.cases functions to remove data frame rows with missing values.

Have a look at the following syntax:

data %>%                                         # Apply filter & complete.cases
  filter(complete.cases(.))
#   x1 x2 x3
# 1  1  X  4
# 2  4 AA  4
# 3  5  X  4
# 4  6  Z  4

The retained rows are the same as in Example 1. Note that filter renumbers the row names, whereas na.omit keeps the original ones.
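To see why this works, it can help to look at what complete.cases returns on its own. The following base R sketch assumes the example data from above; the object name is_complete is only used for illustration:

```r
data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),    # Re-create example data
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)

is_complete <- complete.cases(data)              # TRUE for rows without any NA
is_complete
# [1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE

which(!is_complete)                              # Rows that would be dropped
# [1] 2 3
```

filter keeps exactly those rows for which this logical vector is TRUE.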

 

Example 3: Remove Rows with NA in Specific Column Using filter() & is.na() Functions

It is also possible to omit observations that have a missing value in a certain data frame variable.

The following R syntax removes only rows with an NA value in the column x1 using the filter and is.na functions:

data %>%                                         # Apply filter & is.na
  filter(!is.na(x1))
#   x1   x2 x3
# 1  1    X  4
# 2  2 <NA>  4
# 3  4   AA  4
# 4  5    X  4
# 5  6    Z  4
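Row 2 is still part of the output, because its missing value is in x2 rather than x1. If several columns should be checked at once, one possible extension (a sketch, not taken from the examples above) is to pass multiple conditions to filter, which combines them with a logical AND. The object name data_x1_x2 is hypothetical:

```r
library("dplyr")                                 # Load dplyr package

data <- data.frame(x1 = c(1, 2, NA, 4, 5, 6),    # Re-create example data
                   x2 = c("X", NA, "Y", "AA", "X", "Z"),
                   x3 = 4)

data_x1_x2 <- data %>%                           # Keep rows without NA in x1 and x2
  filter(!is.na(x1), !is.na(x2))

nrow(data_x1_x2)                                 # Number of remaining rows
# [1] 4
```

Since all missing values of the example data are located in x1 and x2, this keeps the same four rows as Examples 1 and 2.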

 

Video & Further Resources

Some time ago, I published a video on my YouTube channel that illustrates the topics of this article. You can find the video below.

 

The YouTube video will be added soon.

 


Summary: In this post, you learned how to remove rows with missing values in the R programming language. If you have any additional questions, let me know in the comments below. Furthermore, please subscribe to my email newsletter to get updates on the newest tutorials.

 
