R Replace NA with 0 (10 Examples for Data Frame, Vector & Column)

A common way to treat missing values in R is to replace NA with 0.

The following sections summarize the most popular approaches.

Choose one of these approaches according to your specific needs.

 

What are you interested in?

• Data Frame: Replace NA with 0
• Vector or Column: Replace NA with 0
• Is the Replacement of NA’s with 0 Legitimate?
• Alternative Ways to Handle Missing Data

 

R Replace NA with 0 in a Data Frame

Consider the following example data frame in R.

data <- data.frame(x1 = c(3, 7, 2, 5, 5),
                   x2 = c(NA, 8, 6, NA, 5),
                   x3 = c(3, NA, 5, 1, 9))
data
#   x1 x2 x3
# 1  3 NA  3
# 2  7  8 NA
# 3  2  6  5
# 4  5 NA  1
# 5  5  5  9

 


Table 1: Exemplifying Data Frame with Missing Values

 

I’m creating some duplicates of the data for the following examples.

data_1 <- data
data_2 <- data
data_3 <- data
data_4 <- data
data_5 <- data # Example for data frame with factor variable
data_5$x3 <- as.factor(data_5$x3)

 

Data Frame Example 1: The Most Common Way to Replace NA with 0

data_1[is.na(data_1)] <- 0
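To check that Example 1 did what you expect, you can count the missing values per column before and after the replacement with colSums(is.na()). The following is a minimal sketch on a fresh copy of the example data; data_check is a hypothetical name used only for this illustration.

```r
# Fresh copy of the example data under a hypothetical name
data_check <- data.frame(x1 = c(3, 7, 2, 5, 5),
                         x2 = c(NA, 8, 6, NA, 5),
                         x3 = c(3, NA, 5, 1, 9))

colSums(is.na(data_check))  # Number of NA values per column before the replacement
# x1 x2 x3
#  0  2  1

data_check[is.na(data_check)] <- 0  # Replace NA with 0, as in Example 1

colSums(is.na(data_check))  # Number of NA values per column afterwards
# x1 x2 x3
#  0  0  0
```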

 


 

Data Frame Example 2: Replace NA During the Data Export

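setwd("Insert your path here") # Set the folder for the exported file
write.csv(data_2, "data_2.csv", na = "0") # Write NA cells as 0 to the CSV file

Note that this approach only changes the exported file: the data frame data_2 in R still contains NA values.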
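A quick round trip through a temporary file shows that the zeros end up in the CSV file while the data frame in R keeps its NA values. This is a minimal sketch: it uses tempfile() instead of setwd() so that it runs anywhere, it sets row.names = FALSE (unlike Example 2) so that the re-imported data has no extra row-name column, and data_export, csv_path and data_import are hypothetical names.

```r
# Example data under a hypothetical name
data_export <- data.frame(x1 = c(3, 7, 2, 5, 5),
                          x2 = c(NA, 8, 6, NA, 5),
                          x3 = c(3, NA, 5, 1, 9))

csv_path <- tempfile(fileext = ".csv")  # Temporary file instead of a real folder
write.csv(data_export, csv_path, na = "0", row.names = FALSE)

data_import <- read.csv(csv_path)  # Read the exported file back in

anyNA(data_export)  # TRUE: the data frame in R still contains NA
anyNA(data_import)  # FALSE: the file contains 0 instead of NA
```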

 

Data Frame Example 3: dplyr Package

library("dplyr")
data_3 <- data_3 %>%
  mutate(x2 = coalesce(x2, 0),
         x3 = coalesce(x3, 0))

 

Data Frame Example 4: imputeTS Package

library("imputeTS")
data_4 <- na_replace(data_4, fill = 0) # In imputeTS versions before 3.0, this function was called na.replace()

 

Data Frame Example 5: Database with Factor Variables

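One common issue when replacing NA with 0 in a data frame is the class of the variables in your data.

The previous examples work fine as long as we are dealing with numeric or character variables.

However, if your data set contains factor variables with missing values, you need an additional step: a factor can only store values that are among its levels, so assigning 0 to a factor without a "0" level produces NA together with an "invalid factor level" warning.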

i <- sapply(data_5, is.factor) # Identify all factor variables in your data
data_5[i] <- lapply(data_5[i], as.character) # Convert factors to character variables
data_5[is.na(data_5)] <- 0 # Replace NA with 0, as shown in Example 1
data_5[i] <- lapply(data_5[i], as.factor) # Convert character columns back to factors
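Alternatively, you can keep the variable a factor throughout by adding "0" to its levels before the assignment. This is a sketch with a hypothetical factor f; note that the new level is appended at the end of the level list rather than sorted, which may matter if you rely on the level order later.

```r
f <- factor(c(3, NA, 5, 1, 9))  # Hypothetical factor with one missing value

levels(f) <- c(levels(f), "0")  # Make "0" a valid level first
f[is.na(f)] <- "0"              # The assignment now works without producing NA
f
# [1] 3 0 5 1 9
# Levels: 1 3 5 9 0
```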

 

Insert Zeros for NA Values in an R Vector (or Column)

As you have seen in the previous examples, R can replace NA with 0 in multiple columns with only one line of code. However, sometimes we need to replace NA values only in a vector or in a single column of a data frame. Let’s find out how this works.

First, let’s create an example vector with missing values.

vec <- c(1, 9, NA, 5, 3, NA, 8, 9)
vec
# [1]  1  9 NA  5  3 NA  8  9
 
# Duplicate vector for later examples
vec_1 <- vec
vec_2 <- vec
vec_3 <- vec
vec_4 <- vec
vec_5 <- as.factor(vec) # Example for factor vector

 

Vector Example 1: The Most Common Way to Replace NA in a Vector

vec_1[is.na(vec_1)] <- 0
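The same indexing works for a single column of a data frame, because a column extracted with $ is just a vector. A minimal sketch using a copy of the example data frame from the beginning (data_col is a hypothetical name):

```r
# Copy of the example data frame under a hypothetical name
data_col <- data.frame(x1 = c(3, 7, 2, 5, 5),
                       x2 = c(NA, 8, 6, NA, 5),
                       x3 = c(3, NA, 5, 1, 9))

# Replace NA with 0 in column x2 only; the NA in x3 stays untouched
data_col$x2[is.na(data_col$x2)] <- 0
data_col$x2
# [1] 0 8 6 0 5
```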

 

Vector Example 2: Create Your Own Function to Replace NA’s

fun_zero <- function(vector_with_nas) {
  vector_with_nas[is.na(vector_with_nas)] <- 0
  return(vector_with_nas)
}
vec_2 <- fun_zero(vec_2)

 

Vector Example 3: Using the replace() Function

vec_3 <- replace(vec_3, is.na(vec_3), 0)

 

Vector Example 4: Using the ifelse() Function

vec_4 <- ifelse(is.na(vec_4), 0, vec_4)
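For a plain numeric vector, Vector Examples 1 to 4 should all return the same result. The following sketch recomputes them side by side and compares the results with identical(); it simply repeats the code from above on the same example vector.

```r
vec <- c(1, 9, NA, 5, 3, NA, 8, 9)  # Example vector from above

fun_zero <- function(vector_with_nas) {  # Function from Vector Example 2
  vector_with_nas[is.na(vector_with_nas)] <- 0
  return(vector_with_nas)
}

vec_1 <- vec
vec_1[is.na(vec_1)] <- 0              # Example 1: logical indexing
vec_2 <- fun_zero(vec)                # Example 2: own function
vec_3 <- replace(vec, is.na(vec), 0)  # Example 3: replace()
vec_4 <- ifelse(is.na(vec), 0, vec)   # Example 4: ifelse()

identical(vec_1, vec_2) && identical(vec_1, vec_3) && identical(vec_1, vec_4)
# [1] TRUE
vec_1
# [1] 1 9 0 5 3 0 8 9
```

One difference that does not show up here: ifelse() takes the attributes of its result from the test argument, so a class of the input vector (for example factor or Date) is not carried over to the result.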

 

Vector Example 5: Exchange NA’s with Zero in Factor Vectors

vec_5 <- as.numeric(as.character(vec_5)) # Convert to character first: as.numeric() applied directly to a
                                         # factor returns the internal level codes, not the original values
vec_5[is.na(vec_5)] <- 0 # Similar to Example 1
vec_5 <- as.factor(vec_5) # Convert back to a factor
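The following sketch illustrates why the detour via as.character() matters. It uses a hypothetical factor f whose values are not equal to their level positions, so the difference between level codes and values becomes visible. The approach also assumes that all levels look like numbers; any other level would become NA (with a warning) in as.numeric().

```r
f <- factor(c(10, 20, NA))  # Hypothetical factor with numeric-looking levels

as.numeric(f)                # Internal level codes
# [1]  1  2 NA
as.numeric(as.character(f))  # Original values
# [1] 10 20 NA
```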

 

As you can see, there are many different ways to replace NA with 0 in R, each with its own pros and cons.

If you want to investigate even more possibilities for a zero replacement, I recommend the following thread on Stack Overflow.

 

Is the Replacement of NA’s with 0 Legitimate?

Besides the question of how to find and replace NA with 0 in R, there is the question of whether such a replacement distorts our statistical analyses.

As so often in statistics, the answer is: It depends! If it is meaningful to substitute NA with 0, then go ahead.

For instance, let’s say we have the survey item “How much did you spend on holidays last year?” and respondents without any holiday spending are coded as NA. Then it would be logical to change NA to 0, since these people spent zero money on holidays.

However, if NA values are due to item nonresponse, we should never replace these missing values with a fixed number such as 0.

 

Consider the following example:

set.seed(765) # Set seed to make the example reproducible
 
example_vector <- rnorm(10000) # Example vector: Normal distribution with 10000 observations
example_vector[1:1000] <- NA # Insert missing values for the first 1000 observations
 
plot(density(example_vector, na.rm = TRUE), 
     ylim = c(0, 0.7), 
     xlab = "Example Vector", 
     main = "With & without replacement of NA with 0")
 
example_vector[is.na(example_vector)] <- 0 # As in Example 1 in R: Replace NA with 0
 
lines(density(example_vector, na.rm = TRUE), col = "red") # Plot density of the example vector
                                                          # after replacing NA's with 0

 


Graphic 1: R Replace NA with 0 – Densities with & without Zero-Replacement

 

As you can see in the example, if we simply substitute all missing values with zero, the density of the normal distribution is heavily distorted, with an artificially inflated peak at zero (red density).
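Because the example distribution is centered at zero, the zeros mainly change the shape of the density. The bias becomes even more obvious for a variable whose mean is not zero. In the following sketch (a variation that is not part of the original example), the normal distribution has mean 5; since 10% of the values become 0, the mean after the replacement should be roughly 0.9 × 5 = 4.5 instead of about 5. The name example_vector_5 is hypothetical.

```r
set.seed(765)  # Set seed to make the example reproducible

# Hypothetical variation of the example above: normal distribution with mean 5
example_vector_5 <- rnorm(10000, mean = 5)
example_vector_5[1:1000] <- NA  # 10% missing values

mean_observed <- mean(example_vector_5, na.rm = TRUE)  # Mean of the observed values only

example_vector_5[is.na(example_vector_5)] <- 0  # Replace NA with 0, as in Example 1
mean_zero <- mean(example_vector_5)             # Mean after the zero replacement

c(observed = mean_observed, zero_replaced = mean_zero)  # Compare both means
```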

 

Alternatives to the Replacement of Missing Data by 0

The statistical analysis with missing data is a whole domain of statistical research.

The imputation of missing values is one of the most popular approaches nowadays.

When data is imputed, new values are estimated on the basis of imputation models in order to replace missing values by these estimates.

In fact, the replacement of NA’s with zero could also be considered as a very basic data imputation (zero imputation).

Another popular approach is casewise deletion (also called listwise deletion).

In casewise or listwise deletion, all observations with missing values are deleted, which is an easy task in R.

This approach has its own disadvantages, but it is easy to conduct and is the default in many statistical software packages. In R, for example, modeling functions such as lm() drop incomplete observations by default.
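A minimal sketch of both variants on the example data frame from the beginning: explicit deletion with complete.cases(), and the implicit deletion that lm() performs through its default na.action (na.omit, unless the global option was changed). The names data_lw, data_complete and fit are hypothetical.

```r
# Example data frame from the beginning under a hypothetical name
data_lw <- data.frame(x1 = c(3, 7, 2, 5, 5),
                      x2 = c(NA, 8, 6, NA, 5),
                      x3 = c(3, NA, 5, 1, 9))

data_complete <- data_lw[complete.cases(data_lw), ]  # Keep only rows without any NA
data_complete
#   x1 x2 x3
# 3  2  6  5
# 5  5  5  9

fit <- lm(x1 ~ x2, data = data_lw)  # Rows with NA in x2 are dropped automatically
nobs(fit)                           # Number of observations actually used
# [1] 3
```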

 

Conclusion

Changing NA to 0 in R can be a good way to get rid of missing values in your data.

R (whether used directly or via RStudio) provides many ways to replace NA’s.

However, such a replacement should only be done if there is a logical reason for converting NA’s to zero.

 

Now it’s Your Turn!

I put together 10 different ways to replace NA’s with 0 in R.

Now I’d like to hear from you.

Are you handling NA’s with the popular approaches of Data Frame Example 1 and Vector Example 1? Or are you using other ways? Do you still have any issues with your NA’s?

Let me know in the comments!

 

References

Moritz, S. (2017). Package imputeTS

Wickham, H., Francois, R., Henry, L., Müller, K., and RStudio (2017). Package dplyr

 

Appendix

How to create the header graph:

The header graphic of this page shows a scatter plot of two continuous (i.e. numeric) variables, created with the package ggplot2.

The dark blue dots indicate observed values. The light blue dots indicate NA’s that were replaced by zero.

library("ggplot2") # Load R package ggplot2
 
set.seed(9876543) # Set seed to ensure reproducibility
 
x1 <- rnorm(2000) # Random normally distributed x1
x2 <- 2 * x1 + rnorm(2000) # Generate x2 correlated with x1 
x2[1801:2000] <- 0 # Set the last 200 values of x2 to zero (mimicking NA's replaced by 0)

data_ggp <- data.frame(x1, x2) # Store x1 and x2 in a data frame

colours <- c(rep(1, 1800), rep(2, 200)) # Colour code: 1 = observed, 2 = replaced by zero
 
ggp <- ggplot(data_ggp, aes(x = x1, y = x2)) + # Create ggplot
  geom_point(aes(col = colours , size = 1.1)) + 
  theme(legend.position = "none")
ggp

 



12 Comments

  • Ahmed Schepis
    June 12, 2018 5:25 am

    I simply desired to say thanks once more. I’m not certain the things that I could possibly have used without the entire aspects revealed by you over such subject matter. It seemed to be the alarming dilemma in my opinion, but discovering the very professional fashion you solved it took me to cry for fulfillment. Extremely grateful for this service as well as pray you are aware of a great job that you’re undertaking educating the others through your webblog. Most likely you have never come across all of us.

  • Thank you so so much!!! It helped

  • This is brilliant! Thank you for taking the time to put together such a well versed set of examples.

  • hello, could you provide any assistance with merging data frames. I had a look at your page about it but this particular scenario doesn’t come up.
    I have 2 dataframes:
    Data frame 1:
         x  y
      a A1 blue
      b A2 N/A
      c A3 yellow
    Data frame 2:
         x  y
      a A1 N/A
      b A2 red
      c A3 N/A

    each dataframe fills in the holes in the other (the N/As). I’ve tried rbind, cbind, etc… but they only seem to try and add extra columns or rows. I just want to fill in those blanks by merging the dataframes. I hope there is an easy way to do this… there is in excel, but that’s too slow.

    Any guidance appreciated, but no problem if not!

    Cheers

    • Hey Freya,

      Is the following R code what you are looking for?

      data1 <- data.frame(x = c("A1", "A2", "A3"),
                          y = c("blue", NA, "yellow"))
       
      data2 <- data.frame(x = c("A1", "A2", "A3"),
                          y = c(NA, "red", NA))
       
      data_combi <- data.frame(x = data1$x,
                               y = ifelse(is.na(data1$y), data2$y, data1$y))
      data_combi
      #    x      y
      # 1 A1   blue
      # 2 A2    red
      # 3 A3 yellow

      Please note that this only works in case both data frames contain exactly the same values in x, and in case both data frames are ordered the same way.

      I hope that helps!

      Joachim

  • Dear Joachim,

    Thanks for the explanations. However, in my case, I would like to replace randomly 1000 NA values in a column with 0s. None of your examples provide an exact solution for this task. Could you please suggest one? I think it would be helpful for me and for other people as well.

    Thanks

    • Hey Filipe,

      Thank you for the kind comment.

      Below, I have created an example that explains how to do that:

      data <- data.frame(NA_col = rep(NA, 2000))  # Create example data
       
      set.seed(3251678)                            # Create random dummy indicator for 0 assignment
      my_dummy <- rep(1, nrow(data))
      my_dummy[1:1000] <- 0
      my_dummy <- sample(my_dummy)
       
      data$NA_col[my_dummy == 0] <- 0             # Replace NA by 0
       
      head(data)                                  # Print head of final data
      #   NA_col
      # 1      0
      # 2      0
      # 3      0
      # 4     NA
      # 5     NA
      # 6      0

      I hope that helps!

      Regards,
      Joachim

  • Marcos Lugo
    May 12, 2022 8:51 pm

    Thank you, excellent explanation

