NA Omit in R | 3 Example Codes for na.omit (Data Frame, Vector & by Column)

 

Basic R Syntax:

na.omit(data)

 

The na.omit R function removes all incomplete cases from a data object (typically a data frame, matrix, or vector). The syntax above illustrates the basic programming code for na.omit in R.

In the following R tutorial, I will show you three examples of how the na.omit R function can be used. Sounds good? Let’s dive right in…

 

Example 1: na.omit in R Data Frame

na.omit is usually applied to a whole data set. Let’s create a simple data frame for the following example:

data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),     # Column with 2 missing values
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),      # Column with 1 missing value
                   x3 = c(1, 3, 5, 7, 9, 7, 5))       # Column without missing values
data                                                  # Print data to RStudio console

 

#   x1 x2 x3
# 1  9 NA  1
# 2  6  5  3
# 3 NA  2  5
# 4  9  1  7
# 5  2  5  9
# 6  5  8  7
# 7 NA  0  5

Table 1: Example Data Frame for the Application of NA Omit in R.

 

Now, let’s apply the na.omit command and see what happens:

data_omit <- na.omit(data)                            # Apply na.omit in R
data_omit                                             # Print data_omit to RStudio console

 

#   x1 x2 x3
# 2  6  5  3
# 4  9  1  7
# 5  2  5  9
# 6  5  8  7

Table 2: Example Data Frame after the Application of NA Omit in R.

 

Compare Table 1 and Table 2, i.e. the example data frame before and after the application of na.omit. As you can see, all rows containing at least one NA value were removed. This method is sometimes referred to as casewise or listwise deletion.
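If you want to check which rows were dropped by the listwise deletion, one option is to inspect the na.action attribute that na.omit attaches to its result. The following is a minimal sketch that recreates the example data frame; the object name dropped_rows is only an illustrative choice.

```r
# Recreate the example data frame from above
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

data_omit <- na.omit(data)                                 # Listwise deletion

dropped_rows <- as.integer(attr(data_omit, "na.action"))   # Positions of removed rows
dropped_rows
# 1 3 7

nrow(data) - nrow(data_omit)                               # Number of removed rows
# 3
```

Rows 1, 3, and 7 are exactly the rows that contain an NA in x1 or x2.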

Note: The R programming code of na.omit is the same, no matter whether the data object is a matrix, a data.frame, or a data.table. The previous code can therefore also be used for a matrix or a data.table.
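As a quick illustration of the matrix case, here is a small sketch using base R only. The matrix mat and its values are made up for this example and are not part of the data used elsewhere in this tutorial.

```r
# Hypothetical example matrix with missing values in rows 1 and 3
mat <- cbind(x1 = c(9, 6, NA, 9),
             x2 = c(NA, 5, 2, 1),
             x3 = c(1, 3, 5, 7))

mat_omit <- na.omit(mat)                              # Same call as for a data frame

is.matrix(mat_omit)                                   # Result is still a matrix
# TRUE

dim(mat_omit)                                         # Rows 2 and 4 remain
# 2 3
```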

 

Example 2: R Omit NA from Vector

It is also possible to omit the NAs of a vector or a single column. To illustrate that, I’m going to use the first column of our previously created data frame, x1:

data$x1                                               # Original data vector with NAs
# 9  6 NA  9  2  5 NA

The original column vector has two missing values. Let’s omit these NA values via the na.omit R function:

na.omit(data$x1)                                      # Vector without NAs
# 9 6 9 2 5
# attr(,"na.action")
# 3 7
# attr(,"class")
# "omit"

The first line of the output consists of all values that are not NA. However, the output also contains additional information: the positions of the deleted values (the na.action attribute) and its class. If you want to get rid of these attributes, you can simply wrap the call in the as.numeric function:

as.numeric(na.omit(data$x1))                          # Vector without NAs & attributes
# 9 6 9 2 5
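As an alternative sketch, logical indexing with is.na returns a plain vector without any extra attributes, and it works for non-numeric vectors as well. The object names x1_clean and chr below are illustrative and not part of the original example.

```r
x1 <- c(9, 6, NA, 9, 2, 5, NA)                        # Same values as data$x1

x1_clean <- x1[!is.na(x1)]                            # Keep only non-missing values
x1_clean
# 9 6 9 2 5

attributes(x1_clean)                                  # No na.action attribute attached
# NULL

identical(x1_clean, as.numeric(na.omit(x1)))          # Same result as the as.numeric approach
# TRUE

chr <- c("a", NA, "b")                                # Hypothetical character vector
chr[!is.na(chr)]                                      # Type is preserved
# "a" "b"
```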

Looks good! Let’s move on to the next example…

 

Example 3: NA Omit by Column? na.omit vs. complete.cases vs. is.na

In practice, you will often only need the complete cases of some columns, but not of all columns. Unfortunately, the na.omit command is difficult to use for this task, since the function is designed to omit rows based on all columns of a data object.

However, other functions can easily be used to omit rows based on the NA values of specific columns only. Let’s assume that we want to omit rows based only on the column x1 of our previously created example data frame.

A function that handles this task is the complete.cases function. First, we need to create a subset containing all columns whose NAs should lead to a row being deleted…

data_subset <- data[ , c("x1")]                       # Create subset with important columns

…and then we can apply the complete.cases function to exclude all rows of our original data based on this subset:

data_by_column <- data[complete.cases(data_subset), ] # Omit NAs by columns
data_by_column                                        # Print data_by_column to RStudio console

 

#   x1 x2 x3
# 1  9 NA  1
# 2  6  5  3
# 4  9  1  7
# 5  2  5  9
# 6  5  8  7

Table 3: Remove Rows by Columns via the complete.cases Function.

 

As you can see in Table 3, all rows with a missing value in x1 were deleted, while the row with a missing value in x2 was kept.

If you want to omit rows based on exactly one column, the is.na function is an even shorter alternative to complete.cases:

data_is.na <- data[!is.na(data$x1), ]                 # Omit NA by column via is.na
data_is.na                                            # Same result as with complete.cases

Same result as before with even less R code – perfect!

Note: The is.na code shown above handles exactly one column. For several columns you would have to combine multiple is.na conditions, whereas the complete.cases solution works for any number of columns!
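To make the comparison concrete, here is a short sketch that omits rows based on two columns in both ways. The names cols, data_by_columns, and data_is.na_multi are illustrative choices for this example.

```r
# Recreate the example data frame from above
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

cols <- c("x1", "x2")                                 # Columns to check for NAs

# complete.cases: one call, no matter how many columns are listed in cols
data_by_columns <- data[complete.cases(data[ , cols]), ]

# is.na: one condition per column, combined with &
data_is.na_multi <- data[!is.na(data$x1) & !is.na(data$x2), ]

rownames(data_by_columns)                             # Remaining rows
# "2" "4" "5" "6"
```

Both objects contain the same four rows. Since x3 has no missing values in this data set, the result also matches the output of na.omit(data).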

 


4 Comments

  • What if the rows contain something other than NA, for example the string “Not Available”? How do we deal with that type of data?

    • Hi Koorse,

      in such a case you have two possibilities.

      1) Clean your data before applying na.omit:

      x[x == "Not Available"] <- NA

      2) Omit cases that have a certain value in x:

      x[x != "Not Available"]
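As a rough sketch of the first possibility, the following code recodes the string and then applies na.omit. The vector x and its values are hypothetical; %in% is used here instead of == because it never returns NA, which is a choice made for this sketch and not part of the reply above.

```r
x <- c("a", "Not Available", "b", NA, "Not Available", "c")   # Hypothetical data

x_clean <- x
x_clean[x_clean %in% "Not Available"] <- NA           # Recode the string to a real NA

as.character(na.omit(x_clean))                        # Omit NAs & drop attributes
# "a" "b" "c"
```

Note that x[x != "Not Available"] on this vector would still return the original NA, because comparing NA with a string yields NA.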
  • Referencing Example 3, how do you select more than 1 column?

    • Hi yellowrose,

      Thank you for your comment! If you want to select more than one column you would have to specify that in the subsetting process. For instance, if you want to remove all rows with missing values in x1 and/or x2, you could use the following code:

      data_subset <- data[ , c("x1", "x2")]                 # Create subset with important columns
      data_by_column <- data[complete.cases(data_subset), ] # Omit NAs by columns
      data_by_column                                        # Print data_by_column to RStudio console

      I hope that helps.

      Joachim

