NA Omit in R | 3 Example Codes for na.omit (Data Frame, Vector & by Column)

 

Basic R Syntax:

na.omit(data)

 

The na.omit R function removes all incomplete cases of a data object (typically of a data frame, matrix or vector). The syntax above illustrates the basic programming code for na.omit in R.

In the following R tutorial, I will show you 3 examples of how the na.omit R function can be used. Sounds good? Let’s dive right in…

 

Example 1: na.omit in R Data Frame

na.omit is usually applied to a whole data set. Let’s create a simple data frame for the following example:

data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),     # Column with 2 missing values
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),      # Column with 1 missing value
                   x3 = c(1, 3, 5, 7, 9, 7, 5))       # Column without missing values
data                                                  # Print data to RStudio console

 


Table 1: Example Data Frame for the Application of NA Omit in R.

 

Now, let’s apply the na.omit command and see what happens:

data_omit <- na.omit(data)                            # Apply na.omit in R
data_omit                                             # Print data_omit to RStudio console

 


Table 2: Example Data Frame after the Application of NA Omit in R.

 

Compare Table 1 and Table 2, i.e. the example data frame before and after the application of na.omit. As you can see, all rows with NA values were removed. This method is sometimes referred to as casewise or listwise deletion.
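If you want to double-check which rows were dropped, the following minimal sketch may help. It recreates the example data, counts the removed rows, and reads the na.action attribute that na.omit attaches to its result. The object name removed_rows is only an illustrative choice.

```r
# Recreate the example data frame from above
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

data_omit <- na.omit(data)                            # Listwise deletion

nrow(data) - nrow(data_omit)                          # Number of dropped rows
# 3

# Positions of the dropped rows (removed_rows is an illustrative name)
removed_rows <- as.integer(attr(data_omit, "na.action"))
removed_rows
# 1 3 7
```

Rows 1, 3 and 7 are exactly the rows that contain at least one NA in x1 or x2.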

Note: The R programming code of na.omit is the same, regardless of whether the data set is a matrix, a data.frame, or a data.table. The previous code can therefore also be used for a matrix or a data.table.
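As a quick illustration of the matrix case, here is a small sketch using the same values as the example data. The names data_mat and mat_omit are assumptions made for this example. The data.table case is not shown, because it would require the data.table package to be installed.

```r
# Same values as the example data, stored as a numeric matrix
data_mat <- cbind(x1 = c(9, 6, NA, 9, 2, 5, NA),
                  x2 = c(NA, 5, 2, 1, 5, 8, 0),
                  x3 = c(1, 3, 5, 7, 9, 7, 5))

mat_omit <- na.omit(data_mat)                         # Same call as for a data frame

dim(mat_omit)                                         # Rows and columns that remain
# 4 3
```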

 

Example 2: R Omit NA from Vector

It is also possible to omit NAs from a vector or a single column. To illustrate that, I’m going to use the first column of our previously created data frame, x1:

data$x1                                               # Original data vector with NAs
# 9  6 NA  9  2  5 NA

The original column vector has two missing values. Let’s omit these NA values via the na.omit R function:

na.omit(data$x1)                                      # Vector without NAs
# 9 6 9 2 5
# attr(,"na.action")
# 3 7
# attr(,"class")
# "omit"

The first line of the output consists of all cases that are not NA. However, the output also contains additional information such as the positions of the deleted values and the class. If you want to get rid of these attributes, you can simply use the as.numeric function:

as.numeric(na.omit(data$x1))                          # Vector without NAs & attributes
# 9 6 9 2 5
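An alternative that never creates the extra attributes in the first place is logical indexing with is.na. The sketch below uses a standalone vector x with the same values as data$x1; the names x and x_clean are only illustrative.

```r
x <- c(9, 6, NA, 9, 2, 5, NA)                         # Same values as data$x1

x_clean <- x[!is.na(x)]                               # Keep non-missing values only
x_clean
# 9 6 9 2 5

attributes(x_clean)                                   # No na.action attribute attached
# NULL

identical(x_clean, as.numeric(na.omit(x)))            # Same result as the code above
# TRUE
```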

Looks good! Let’s move on to the next example…

 

Example 3: NA Omit by Column? na.omit vs. complete.cases vs. is.na

In practice, you will often only need the complete cases of some columns, but not of all columns. Unfortunately, the na.omit command is difficult to use for this task, since the function is designed to omit rows based on all columns of a data object.

However, other functions can easily be used to omit NA values of specific columns only. Let’s assume that we want to omit NAs based only on column x1 of our previously created example data frame.

A function that handles this task is the complete.cases function. First, we need to create a subset containing all columns whose NAs should be deleted…

data_subset <- data[ , c("x1")]                       # Create subset with important columns

…and then we can apply the complete.cases function to exclude all rows of our original data based on this subset:

data_by_column <- data[complete.cases(data_subset), ] # Omit NAs by columns
data_by_column                                        # Print data_by_column to RStudio console

 


Table 3: Remove Rows by Columns via the complete.cases Function.

 

As you can see in Table 3, all rows with a missing value in x1 are deleted; the row with a missing value in x2 is kept.
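The same filter can also be written in a single step, without the intermediate subset. The following sketch recreates the example data and prints the result, so you can check which rows are kept:

```r
# Recreate the example data frame from above
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

data_by_column <- data[complete.cases(data$x1), ]     # Keep rows where x1 is not NA
data_by_column
#   x1 x2 x3
# 1  9 NA  1
# 2  6  5  3
# 4  9  1  7
# 5  2  5  9
# 6  5  8  7
```

Row 1 still contains an NA in x2, because only x1 was checked.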

If you want to omit rows based on exactly one column, the is.na function offers an even shorter solution than complete.cases:

data_is.na <- data[!is.na(data$x1), ]                 # Omit NA by column via is.na
data_is.na                                            # Same result as with complete.cases

Same result as before with even less R code – perfect!

Note: The is.na code above checks a single column only. To filter on several columns with is.na, each condition has to be written out and combined with &. The complete.cases solution handles any number of columns in one call!
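To make the difference concrete, the following sketch filters on two columns with both approaches and compares the results. The column selection cols and the object names by_cc and by_isna are assumptions made for this illustration.

```r
# Recreate the example data frame from above
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

cols <- c("x1", "x2")                                 # Illustrative choice of columns

by_cc   <- data[complete.cases(data[ , cols]), ]      # One call for all chosen columns
by_isna <- data[!is.na(data$x1) & !is.na(data$x2), ]  # One condition per column

identical(by_cc, by_isna)                             # Both approaches agree
# TRUE
```

With complete.cases, adding a further column only means extending cols; with is.na, every additional column needs its own condition.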

 

Video Tutorial and Further Research

In case you need further info on the examples of this tutorial, I recommend having a look at the following video on my YouTube channel. In the video instruction, I’m explaining the R code of this article.

 

 

For further comparisons of the different R functions to handle NA values, have a look at the following video tutorial on my YouTube channel.

 

 


20 Comments

  • What if the rows contain something other than NA, for example “Not Available”? How do we deal with that type of data?

    • Hi Koorse,

      in such a case you have two possibilities.

      1) Clean your data before applying na.omit:

      x[x == "Not Available"] <- NA

      2) Omit cases that have a certain value in x:

      x[x != "Not Available"]
  • Referencing Example 3, how do you select more than 1 column?

    • Hi yellowrose,

      Thank you for your comment! If you want to select more than one column you would have to specify that in the subsetting process. For instance, if you want to remove all rows with missing values in x1 and/or x2, you could use the following code:

      data_subset <- data[ , c("x1", "x2")]                 # Create subset with important columns
      data_by_column <- data[complete.cases(data_subset), ] # Omit NAs by columns
      data_by_column                                        # Print data_by_column to RStudio console

      I hope that helps.

      Joachim

  • I have replaced the missing data with the number “9” for reference. Is there a way to just ignore the “9” instead?

  • Library of this function?

  • Raminder Singh
    June 4, 2021 10:02 am

    Does it work the same as complete.cases?

    • Hey Raminder,

      na.omit and complete.cases can produce the same output. However, the usage of the functions is slightly different.

      For example, the following two code snippets produce the same output:

      na.omit(data)
      #   x1 x2 x3
      # 2  6  5  3
      # 4  9  1  7
      # 5  2  5  9
      # 6  5  8  7
      data[complete.cases(data), ]
      #   x1 x2 x3
      # 2  6  5  3
      # 4  9  1  7
      # 5  2  5  9
      # 6  5  8  7

      The reason for this is that the complete.cases function returns a logical vector indicating whether each row is complete or not. You can see that by applying only the complete.cases function as shown below:

      complete.cases(data)
      # [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE

      I hope that helps!

      Joachim

  • Say I have 10 columns and I want to remove all rows that contain more than 3 NAs. How to do that? Thanks.

  • Yao Jean KOUADIO
    October 6, 2021 2:02 pm

    Hello,
    I am new to R.
    I want to clean my missing data,
    but neither na.omit() nor complete.cases() works. I don’t know where the problem is.

  • Hi Joachim,
    When I want to use na.omit() in a pipe before plotting a graph, my plot becomes completely gray. How do I solve this?

  • What if the dataset contains “?” in a lot of rows? How do we clean it or replace it with NA?

    • Hey,

      Please have a look at the following example:

      data <- data.frame(x1 = c(1, 2, "?", 2, "?"),
                         x2 = c("?", 3, 3, 3, 3))
       
      data[data == "?"] <- NA

      Regards,
      Joachim

  • Hey, a question: what if you want to delete the rows only if NA appears in 3 columns, but not otherwise?

    Example: imagine you have a dataset with columns a, b and c, and you want to delete a row ONLY when all 3 are NA:
        a   b   c
    1   NA  NA  NA   <- to be deleted
    2   20  NA  NA   <- not to be deleted
    3   15  NA  NA   <- not to be deleted

