NA Omit in R | 3 Example Codes for na.omit (Data Frame, Vector & by Column)
Basic R Syntax:
na.omit(data)
The na.omit R function removes all incomplete cases of a data object (typically of a data frame, matrix or vector). The syntax above illustrates the basic programming code for na.omit in R.
In the following R tutorial, I will show you three examples of how the na.omit R function can be used. Sounds good? Let’s dive right in…
Example 1: na.omit in R Data Frame
na.omit is usually applied to a whole data set. Let’s create a simple data frame for the following example:
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),   # Column with 2 missing values
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),    # Column with 1 missing value
                   x3 = c(1, 3, 5, 7, 9, 7, 5))     # Column without missing values
data                                                # Print data to RStudio console
Table 1: Example Data Frame for the Application of NA Omit in R.
Now, let’s apply the na.omit command and see what happens:
data_omit <- na.omit(data)             # Apply na.omit in R
data_omit                              # Print data_omit to RStudio console
Table 2: Example Data Frame after the Application of NA Omit in R.
Compare Table 1 and Table 2, i.e. the example data frame before and after the application of na.omit. As you can see, all rows with NA values were removed. This method is sometimes referred to as casewise or listwise deletion.
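If you want to verify which rows were dropped, the following minimal sketch may help. It rebuilds the example data frame so that it runs on its own, and it reads the "na.action" attribute that na.omit attaches to its result.

```r
# Recreate the example data frame of this tutorial
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

data_omit <- na.omit(data)             # Listwise deletion

nrow(data)                             # Number of rows before na.omit
# 7
nrow(data_omit)                        # Number of rows after na.omit
# 4

# Positions of the rows that were removed
removed_rows <- as.integer(attr(data_omit, "na.action"))
removed_rows
# 1 3 7
```

Rows 1, 3, and 7 each contain at least one NA, so only the four complete rows remain.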
Note: The R programming code of na.omit is the same, no matter if the data set has the data type matrix, data.frame, or data.table. The previous code can therefore also be used for a matrix or a data.table.
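As a small illustration of this note, here is a sketch for a matrix. The object name mat is made up for this example and is not part of the original tutorial data.

```r
# Hypothetical example matrix with two columns
mat <- cbind(x1 = c(9, 6, NA, 9),
             x2 = c(NA, 5, 2, 1))

mat_omit <- na.omit(mat)               # Same syntax as for a data frame
mat_omit                               # Rows 1 and 3 contain NAs and are dropped

is.matrix(mat_omit)                    # The result is still a matrix
# TRUE
nrow(mat_omit)
# 2
```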
Example 2: R Omit NA from Vector
It is also possible to omit NAs of a vector or a single column. To illustrate that, I’m going to use the first column of our previously created data frame, x1:
data$x1                                # Original data vector with NAs
# 9 6 NA 9 2 5 NA
The original column vector has two missing values. Let’s omit these NA values via the na.omit R function:
na.omit(data$x1)                       # Vector without NAs
# 9 6 9 2 5
# attr(,"na.action")
# 3 7
# attr(,"class")
# "omit"
The first line of the output consists of all cases that are not NA. However, the output also contains additional information such as the positions of the deleted values and the class. If you want to get rid of these attributes, you can simply use the as.numeric function:
as.numeric(na.omit(data$x1))           # Vector without NAs & attributes
# 9 6 9 2 5
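as.numeric only fits numeric vectors. As a sketch of two alternatives that should also work for other vector types, you could subset with is.na or strip the attributes with as.vector. The object name x is used only for this illustration.

```r
x <- c(9, 6, NA, 9, 2, 5, NA)          # Same values as data$x1

x_sub <- x[!is.na(x)]                  # Logical subsetting, no attributes added
x_sub
# 9 6 9 2 5

x_vec <- as.vector(na.omit(x))         # Drop the attributes set by na.omit
x_vec
# 9 6 9 2 5
```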
Looks good! Let’s move on to the next example…
Example 3: NA Omit by Column? na.omit vs. complete.cases vs. is.na
In practice, you will often only need the complete cases of some columns, but not of all columns. Unfortunately, the na.omit command is difficult to use for this task, since the function is designed to omit rows based on all columns of a data object.
However, other functions can easily be used to omit NA values of specific columns only. Let’s assume that we want to omit rows based exclusively on the NAs in column x1 of our previously created example data frame.
A function that handles this task is the complete.cases function. First, we need to create a subset with all columns of which the NAs should be deleted…
data_subset <- data[ , c("x1")] # Create subset with important columns
…and then we can apply the complete cases function to exclude all rows of our original data based on this subset:
data_by_column <- data[complete.cases(data_subset), ]   # Omit NAs by columns
data_by_column                                          # Print data_by_column to RStudio console
Table 3: Remove Rows by Columns via the complete.cases Function.
As you can see based on Table 3: All rows with a missing value in x1 are deleted; the row with a missing value in x2 is kept.
If you want to omit rows based on exactly one column, the is.na function works even quicker than complete.cases:
data_is.na <- data[!is.na(data$x1), ]  # Omit NA by column via is.na
data_is.na                             # Same result as with complete.cases
Same result as before with even less R code – perfect!
Note: The is.na approach shown above is convenient if you want to omit by a single column. The complete.cases solution works for any number of columns!
Video Tutorial and Further Research
In case you need further info on the examples of this tutorial, I recommend having a look at the following video on my YouTube channel. In the video instruction, I’m explaining the R code of this article.
For further comparisons of the different R functions to handle NA values, have a look at the following video tutorial of my YouTube channel.
Further Reading
- The complete.cases Function in R
- The is.na Function in R
- NA Values in R
- Remove NA Values from Vector in R
- Handling of Missing Data
- Listwise Deletion
- The R Programming Language
20 Comments
What if the rows contain something other than NA, for example “Not Available”? How do we deal with that type of data?
Hi Koorse,
in such a case you have two possibilities.
1) Clean your data before applying na.omit:
2) Omit cases that have a certain value in x:
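The code that belonged to these two options is not preserved here. The following is only a sketch of how they might look, assuming a hypothetical data frame data_na with a character column x that contains the string "Not Available".

```r
# Hypothetical example data (not from the original reply)
data_na <- data.frame(x = c("a", "Not Available", "b", NA),
                      y = 1:4,
                      stringsAsFactors = FALSE)

# Option 1: Recode the string to a real NA, then apply na.omit
data_clean <- data_na
data_clean$x[data_clean$x %in% "Not Available"] <- NA
data_option1 <- na.omit(data_clean)    # Keeps rows 1 and 3

# Option 2: Drop only the rows with that particular value in x
data_option2 <- data_na[!data_na$x %in% "Not Available", ]   # Keeps rows 1, 3, and 4
```

%in% is used instead of == because a comparison with == returns NA for missing entries, which would complicate the subsetting.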
Referencing Example 3, how do you select more than 1 column?
Hi yellowrose,
Thank you for your comment! If you want to select more than one column you would have to specify that in the subsetting process. For instance, if you want to remove all rows with missing values in x1 and/or x2, you could use the following code:
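The code of this reply is not preserved here. A sketch along the lines of Example 3, extended to the two columns x1 and x2, could look like this:

```r
# Recreate the example data frame of this tutorial
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

data_subset <- data[ , c("x1", "x2")]                      # Subset with both columns
data_by_columns <- data[complete.cases(data_subset), ]     # Omit NAs in x1 and/or x2
data_by_columns                                            # Rows 2, 4, 5, and 6 remain
```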
I hope that helps.
Joachim
I have replaced the missing data with the number “9” for reference, is there a way to just ignore the “9” instead?
Hey Shantae,
You could use the following R code to remove all rows with at least one 9:
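The code of this reply is not preserved here. One possible sketch, assuming a hypothetical data frame data9 in which the value 9 marks missing data:

```r
# Hypothetical example data with 9 as a placeholder for missing values
data9 <- data.frame(x1 = c(1, 9, 3, 4),
                    x2 = c(9, 2, 3, 4),
                    x3 = c(5, 6, 7, 8))

# Keep only rows that do not contain any 9
data9_clean <- data9[rowSums(data9 == 9, na.rm = TRUE) == 0, ]
data9_clean                            # Rows 3 and 4 remain
```

Be careful if 9 can also be a valid observation in your data, as is the case for x1 in the example data of this tutorial. Such rows would be removed as well.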
Regards,
Joachim
Which library does this function belong to?
Hey Ismail,
na.omit is part of the stats package, which is already loaded with Base R.
Regards, Joachim
Does it work the same as complete.cases?
Hey Raminder,
na.omit and complete.cases can produce the same output. However, the usage of the functions is slightly different.
For example, the following 2 codes create the same output:
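The two code snippets of this reply are not preserved here. A sketch based on the example data of this tutorial might look like this:

```r
# Recreate the example data frame of this tutorial
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

data_omit <- na.omit(data)                     # Version 1: na.omit
data_complete <- data[complete.cases(data), ]  # Version 2: complete.cases as row index

# Same rows and values; only the "na.action" attribute of data_omit differs
all.equal(data_omit, data_complete, check.attributes = FALSE)
# TRUE
```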
The reason for this is that the complete.cases function returns a logical indicator whether a row is complete or not. You can see that by applying only the complete.cases function as shown below:
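The code of this step is not preserved here. Applied to the example data of this tutorial, a sketch could look as follows:

```r
# Recreate the example data frame of this tutorial
data <- data.frame(x1 = c(9, 6, NA, 9, 2, 5, NA),
                   x2 = c(NA, 5, 2, 1, 5, 8, 0),
                   x3 = c(1, 3, 5, 7, 9, 7, 5))

cc <- complete.cases(data)             # One logical value per row
cc
# FALSE TRUE FALSE TRUE TRUE TRUE FALSE
```

Rows 1, 3, and 7 are flagged as FALSE because each of them contains at least one NA.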
I hope that helps!
Joachim
Say I have 10 columns and I want to remove all rows that have total NA > 3 in a row. How to do that? Thanks.
Hey Raj,
You may use the following R code for this:
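The code of this reply is not preserved here. A minimal sketch, assuming a hypothetical data frame data10 with 10 columns, could count the NAs per row with rowSums and is.na:

```r
# Hypothetical example data with 3 rows and 10 columns
data10 <- as.data.frame(matrix(1:30, nrow = 3, ncol = 10))
data10[1, 1:4] <- NA                   # Row 1: 4 NAs, should be removed
data10[2, 1:3] <- NA                   # Row 2: 3 NAs, should be kept

na_count <- rowSums(is.na(data10))     # Number of NAs in each row
na_count
# 4 3 0

data10_clean <- data10[na_count <= 3, ]   # Keep rows with at most 3 NAs
data10_clean                              # Rows 2 and 3 remain
```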
I hope that helps!
Joachim
Hello,
I am new to r,
I want to clean my missing data
But none of these functions na.omit (), complete.cases () work, I don’t know where the problem is
Hey Yao,
Could you illustrate what your data looks like, and what it should look like afterwards?
Regards
Joachim
Hi Joachim,
When I want to use na.omit() in a pipe before plotting a graph, my plot becomes completely gray. How do I solve this?
Hey Rosalie,
Could you check the data set that is returned by the na.omit function? Maybe all of your rows contain NA values.
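As a sketch of such a check, assuming a hypothetical data frame data_all_na in which every row has at least one NA (whether this applies to your data is only a guess):

```r
# Hypothetical example data: each row contains at least one NA
data_all_na <- data.frame(x = c(1, NA, 3),
                          y = c(NA, 2, NA))

rows_left <- nrow(na.omit(data_all_na))    # Rows that survive na.omit
rows_left
# 0

na_per_column <- colSums(is.na(data_all_na))   # NAs in each column
na_per_column
# x y
# 1 2
```

If no rows are left, there is nothing to draw. In that case, it may help to select only the columns needed for the plot before applying na.omit.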
Regards,
Joachim
What if the dataset contains “?” in a lot of rows? How do we clean it, or replace it with NA?
Hey,
Please have a look at the following example:
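The example of this reply is not preserved here. A sketch of one possible approach, assuming a hypothetical data frame data_q whose character columns use "?" as a placeholder:

```r
# Hypothetical example data with "?" as placeholder for missing values
data_q <- data.frame(x1 = c("1", "?", "3"),
                     x2 = c("a", "b", "?"),
                     stringsAsFactors = FALSE)

data_q[data_q == "?"] <- NA            # Replace every "?" with a real NA
data_q$x1 <- as.numeric(data_q$x1)     # Optional: convert cleaned column to numeric

data_q_omit <- na.omit(data_q)         # Now na.omit works as usual
data_q_omit                            # Only row 1 remains
```

If the data comes from a file, the na.strings argument of read.csv or read.table (e.g. na.strings = "?") can handle this at import.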
Regards,
Joachim
Hey, question: what if you want to delete the NAs only if they appear in 3 columns, but not in the rest?
example.. if imagine you have a dataset with columns a b and c, and you want to delete the NA ONLY when all 3 are NA so
     a    b    c
1    NA   NA   NA   <- to be deleted
2    20   NA   NA   <- not to be deleted
3    15   NA   NA   <- not to be deleted
Hey Dan,
Please have a look at examples 5 and 6 of this tutorial. They explain how to do that using Base R and the dplyr package.
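The referenced examples are not part of this page. As a rough Base R sketch, which may differ from those examples, rows can be removed only if all three columns are NA. The data frame data_abc reproduces the values from the question.

```r
# Example data from the question
data_abc <- data.frame(a = c(NA, 20, 15),
                       b = c(NA, NA, NA),
                       c = c(NA, NA, NA))

cols <- c("a", "b", "c")                              # Columns to check
all_na <- rowSums(is.na(data_abc[ , cols])) == length(cols)

data_abc_clean <- data_abc[!all_na, ]                 # Drop rows where a, b, and c are all NA
data_abc_clean                                        # Rows 2 and 3 remain
```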
Regards,
Joachim