Remove Rows with NA in R Data Frame (6 Examples) | Some or All Missing

 

In this article you’ll learn how to remove rows containing missing values in the R programming language.

The article consists of six examples for the removal of NA values. Examples 1 to 4 drop every row that contains at least one NA, and Examples 5 and 6 drop only rows in which all values are NA.

Let’s just jump right in.

 

Example Data

The following data will be used as the basis for this R programming tutorial:

data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),    # Create example data
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))
data                                             # Print example data
#   x1   x2 x3
# 1  4    A  1
# 2  1 <NA>  0
# 3 NA <NA> NA
# 4  7   XX  1
# 5  8   YO  1
# 6  1   YA NA

As you can see based on the previous output of the RStudio console, our example data frame consists of six rows and three columns. Each of the variables contains at least one NA value (i.e. missing data), and the third row is missing in all three variables.
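Before deleting anything, it can help to count the missing values per column and per row by combining is.na with colSums and rowSums. The following is a minimal sketch; the object names na_per_column and na_per_row are only illustrative.

```r
# Recreate the example data from above
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

na_per_column <- colSums(is.na(data))   # number of NAs in each variable
na_per_row    <- rowSums(is.na(data))   # number of NAs in each row

na_per_column
# x1 x2 x3 
#  1  2  2 
na_per_row
# [1] 0 1 3 0 0 1
```

Row 3 has as many NAs as there are columns, which is exactly the condition used later in Examples 5 and 6.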

 

Example 1: Removing Rows with Some NAs Using na.omit() Function

Example 1 illustrates how to use the na.omit function to create a data set without missing values. For this, we simply have to pass the name of our data frame (i.e. data) to the na.omit function:

data1 <- na.omit(data)                           # Apply na.omit function
data1                                            # Printing updated data
#   x1 x2 x3
# 1  4  A  1
# 4  7 XX  1
# 5  8 YO  1

Have a look at the output of the RStudio console: Our updated data frame keeps only the three rows without missing values (rows 1, 4, and 5), and none of its columns contains NA values. The original row names are retained.
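na.omit also records which rows it removed: the result carries an "na.action" attribute that holds the original positions of the dropped rows. The sketch below reads that attribute and then, optionally, renumbers the remaining rows from 1; the name dropped is only illustrative.

```r
# Recreate the example data from above
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

data1 <- na.omit(data)

# Positions of the rows that na.omit removed (as.integer drops names and class)
dropped <- as.integer(attr(data1, "na.action"))
dropped
# [1] 2 3 6

# Optionally reset the row names so they run from 1 to nrow(data1)
rownames(data1) <- NULL
data1
#   x1 x2 x3
# 1  4  A  1
# 2  7 XX  1
# 3  8 YO  1
```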

 

Example 2: Removing Rows with Some NAs Using complete.cases() Function

The R programming language provides many different alternatives for the deletion of missing data in data frames. In Example 2, I’ll illustrate how to use the complete.cases function for this task:

data2 <- data[complete.cases(data), ]            # Apply complete.cases function
data2                                            # Printing updated data
#   x1 x2 x3
# 1  4  A  1
# 4  7 XX  1
# 5  8 YO  1

The output is exactly the same as in Example 1. However, this time we have used the complete.cases function instead of the na.omit function.

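Note that the complete.cases function gets its name because it flags the complete cases, i.e. the rows without any missing values (sounds logical, doesn’t it?). It returns one TRUE or FALSE per row, and subsetting the data frame with this logical vector gives a complete data set. Deleting every incomplete row in this way is therefore sometimes called “listwise deletion”.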
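Because complete.cases returns a logical vector rather than a data frame, it can also be applied to only some of the columns. A minimal sketch, assuming you only care about missing values in x1 and x3 (the object names cc and data_x1_x3 are illustrative):

```r
# Recreate the example data from above
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

cc <- complete.cases(data)   # one TRUE/FALSE per row
cc
# [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE
sum(!cc)                     # number of incomplete rows
# [1] 3

# Require only x1 and x3 to be observed; NAs in x2 are tolerated
data_x1_x3 <- data[complete.cases(data[, c("x1", "x3")]), ]
data_x1_x3
#   x1   x2 x3
# 1  4    A  1
# 2  1 <NA>  0
# 4  7   XX  1
# 5  8   YO  1
```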

 

Example 3: Removing Rows with Some NAs Using rowSums() & is.na() Functions

In this section, I’ll illustrate how to use a combination of the rowSums and is.na functions to create a complete data frame.

data3 <- data[rowSums(is.na(data)) == 0, ]       # Apply rowSums & is.na
data3                                            # Printing updated data
#   x1 x2 x3
# 1  4  A  1
# 4  7 XX  1
# 5  8 YO  1

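The output is the same as in the previous examples. However, since rowSums(is.na(data)) counts the missing values in each row, this R code can easily be modified to retain rows with a certain number of NAs. For instance, if you want to remove all rows with 2 or more missing values, you can replace “== 0” by “< 2”. Replacing it by “>= 2” would do the opposite and keep only the rows with at least 2 NAs.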
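Here is a minimal sketch of such thresholds. It also shows the closely related condition “keep rows with at least 2 observed values”, which counts non-NA cells via !is.na. With three columns both conditions select the same rows, but they differ for wider data. The helper keep_max_na is hypothetical and not part of base R.

```r
# Recreate the example data from above
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

n_na <- rowSums(is.na(data))   # NAs per row

# Remove rows with 2 or more NAs, i.e. keep rows with fewer than 2
data_lt2 <- data[n_na < 2, ]
data_lt2
#   x1   x2 x3
# 1  4    A  1
# 2  1 <NA>  0
# 4  7   XX  1
# 5  8   YO  1
# 6  1   YA NA

# Keep rows with at least 2 observed (non-NA) values
data_ge2_obs <- data[rowSums(!is.na(data)) >= 2, ]

# Hypothetical helper: keep rows with at most max_na missing values
keep_max_na <- function(df, max_na) {
  df[rowSums(is.na(df)) <= max_na, , drop = FALSE]
}
keep_max_na(data, 0)   # same rows as Example 3
```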

 

Example 4: Removing Rows with Some NAs Using drop_na() Function of tidyr Package

If you prefer the tidyverse over the functions provided by the base installation of the R programming language, this example may be interesting for you.

In this example, I’ll illustrate how to apply the drop_na function of the tidyr package to delete rows containing NAs.

We first need to install and load the tidyr package:

install.packages("tidyr")                        # Install & load tidyr package
library("tidyr")

Now, we can use the drop_na function to drop missing rows as shown below:

data4 <- data %>% drop_na()                      # Apply drop_na function
data4                                            # Printing updated data
#   x1 x2 x3
# 1  4  A  1
# 4  7 XX  1
# 5  8 YO  1

Again, the output is the same as in the previous examples.
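drop_na also accepts column names, in which case only NAs in those columns cause a row to be dropped. A minimal sketch, assuming tidyr is installed as shown above; data4b is an illustrative name. Here rows with an NA in x1 or x3 are removed, while the NA in x2 (row 2) is kept, so four rows remain.

```r
library("tidyr")

# Recreate the example data from above
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

# Only NAs in x1 and x3 lead to a row being dropped
data4b <- drop_na(data, x1, x3)
data4b
```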

 

Example 5: Removing Rows with Only NAs Using is.na(), rowSums() & ncol() Functions

So far, we have removed rows that contain at least one missing value. In Example 5, I’ll show how to remove only rows where all data cells are NA. For this, I’m using the rowSums and is.na functions again (as in Example 3) and compare the number of NAs in each row with the number of columns returned by ncol:

data5 <- data[rowSums(is.na(data)) != ncol(data), ] # Apply is.na function
data5                                            # Printing updated data
#   x1   x2 x3
# 1  4    A  1
# 2  1 <NA>  0
# 4  7   XX  1
# 5  8   YO  1
# 6  1   YA NA

In this example, only the third row was deleted. Rows 2 and 6 were kept, since they also contain non-NA values.
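An equivalent base R formulation applies all() to each row of the is.na matrix, which states the condition “every cell is NA” directly instead of comparing counts with ncol. A minimal sketch; all_na and data5b are illustrative names.

```r
# Recreate the example data from above
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

# TRUE for rows in which every cell is NA (MARGIN = 1 means row-wise)
all_na <- apply(is.na(data), 1, all)

data5b <- data[!all_na, ]   # keeps rows 1, 2, 4, 5 and 6
```

Swapping all for any in the apply call would instead flag rows with at least one NA, as in Examples 1 to 4.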

 

Example 6: Removing Rows with Only NAs Using filter() Function of dplyr Package

If we want to drop only rows where all values are missing, we can also use the dplyr package of the tidyverse.

If we want to use the functions of the dplyr package, we first need to install and load dplyr:

install.packages("dplyr")                        # Install dplyr package
library("dplyr")                                 # Load dplyr package

Now, we can use a combination of the filter function of the dplyr package and the is.na function of Base R:

data6 <- filter(data, rowSums(is.na(data)) != ncol(data)) # Apply filter function
data6                                            # Printing updated data
#   x1   x2 x3
# 1  4    A  1
# 2  1 <NA>  0
# 3  7   XX  1
# 4  8   YO  1
# 5  1   YA NA

Note that the previous R code reset the row names, so they now run from 1 to the number of rows of the updated data set.
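Within dplyr, the same condition can also be written with if_all(), which checks a predicate across a selection of columns. A minimal sketch, assuming dplyr version 1.0.4 or later (the release that introduced if_all()); data6b is an illustrative name.

```r
library("dplyr")

# Recreate the example data from above
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1),
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA))

# Keep a row unless is.na() is TRUE for every column in that row
data6b <- filter(data, !if_all(everything(), is.na))
```

Compared with rowSums(is.na(data)) != ncol(data), this version does not need to refer to the data object inside filter, so it also works unchanged in the middle of a pipe.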

 

Video & Further Resources

Would you like to learn more about removing missing data? Then you could watch the following video on my YouTube channel, in which I explain the examples of the present tutorial:

 

 


This article explained how to remove rows with missing values from a data frame in R. Please let me know in the comments if you have additional questions.

 



6 Comments

  • Hi!
    Regarding example 3: I used this command:
    data3 = 2, ]
    And as far as I can see, data3 then contains only rows, which have two or more missings instead of those being removed.

    If you do this:
    data3 = 2, ]
    data3 should consist only of rows which have two or more variables with regular data, which means 5 rows in this case and one missing per row max because it is 3 rows… Am I right?

    • Hey Marvin,

      I’m not sure if I get your problem correctly. So you want to keep each row with at least 2 valid values?

      Regards

      Joachim

  • somehow, it is not showing the code line correctly:

    1. data3 = 2, ]
    2. data3 = 2, ]

  • Ali Haider Mridha
    November 15, 2021 12:29 pm

    Excellent ! As a beginner, I found it sooooo helpful!
    Thanks Joachim.
    Keep going with such help materials.

