NA Values are not Excluded when Using cor Function in R (Example)

 

In this tutorial you’ll learn how to exclude NA values when using the cor function in the R programming language.

Let’s get straight to the example.

 

Example Data

First, we will need to create an example data matrix with 10 random NAs:

set.seed(999)                                   # make the random draws reproducible
sample_data <- matrix(rnorm(30),                # 30 standard normal values
                      nrow = 10, 
                      ncol = 3)
na_values <- sample(seq_along(sample_data),     # 10 random cell positions
                    10)
sample_data[na_values] <- NA                    # set those cells to NA
colnames(sample_data) <- paste0("col", seq_len(ncol(sample_data)))
rownames(sample_data) <- paste0("row", seq_len(nrow(sample_data)))
sample_data
 
#             col1       col2        col3
# row1          NA -0.7602105 -0.02891409
# row2   0.3729390 -1.2028067          NA
# row3          NA  0.7081885  0.46330763
# row4          NA         NA  0.63862129
# row5   0.5190691         NA -0.13322764
# row6   1.0478328 -0.8561538  1.04789141
# row7          NA  0.1950530          NA
# row8  -1.4076432  0.4192383  1.64057417
# row9          NA  0.2887847  0.14849188
# row10         NA  1.4041693 -1.03728957

As you can see, our matrix contains ten rows and three numeric columns, and 10 of its 30 values are NA.

Let’s try to make a correlation matrix using the cor() function:

cor(sample_data)
#      col1 col2 col3
# col1    1   NA   NA
# col2   NA    1   NA
# col3   NA   NA    1

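By default, the cor() function doesn’t exclude NA values: it uses use = “everything”, so every correlation between two columns returns NA as soon as one of them contains a missing value. Since all three columns of our matrix contain NAs, we are not getting meaningful results. Let’s see what we can do about it.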
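
To see this behavior in isolation, the following minimal sketch uses two short made-up vectors (x and y are illustrative and not part of the example data). It also counts the missing values per variable with is.na(), which is a quick check worth running before computing correlations:

```r
# Illustrative vectors (not from the example data); y has one missing value
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, NA, 8, 10)

# cor() defaults to use = "everything", so the NA is propagated to the result
default_result <- cor(x, y)
default_result

# Count the missing values per variable
na_counts <- colSums(is.na(cbind(x, y)))
na_counts
```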

 

Example: Excluding NA Values in cor Function Using “use=” Argument

To get meaningful values in a correlation matrix with NA values, we can apply the “use=” argument inside the cor() function. This is an optional character string that gives us a method for computing covariances in the presence of NA values:

cor(sample_data, 
    use="pairwise.complete.obs")
 
#            col1       col2       col3
# col1  1.0000000 -0.8899394 -0.6067244
# col2 -0.8899394  1.0000000 -0.4271438
# col3 -0.6067244 -0.4271438  1.0000000

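In this specific example, we have used the option “pairwise.complete.obs”, which computes each correlation from all rows in which both columns of the pair are observed. The other options are “everything” (the default), “all.obs”, “complete.obs”, and “na.or.complete”. Have a look at the help documentation of the cor function (?cor) to get further information on these methods.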
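
As a rough comparison of some of these methods, the sketch below applies three of them to a small made-up matrix (m and its values are assumptions for illustration). “complete.obs” drops every row that contains an NA, which should match calling cor() on na.omit(m). “all.obs” is expected to stop with an error when NAs are present, so that call is wrapped in tryCatch():

```r
# Illustrative matrix (not the tutorial data) with one NA per column
m <- cbind(a = c(1, 2, 3, 4, NA, 6),
           b = c(2, 1, 4, NA, 5, 7),
           c = c(NA, 3, 1, 5, 2, 8))

# Listwise deletion: only rows without any NA are used
cor_complete <- cor(m, use = "complete.obs")

# Pairwise deletion: each pair of columns uses its own complete rows
cor_pairwise <- cor(m, use = "pairwise.complete.obs")

# "all.obs" raises an error if NAs are present; capture the message
all_obs_msg <- tryCatch(cor(m, use = "all.obs"),
                        error = function(e) conditionMessage(e))

# Listwise deletion by hand gives the same matrix as "complete.obs"
all.equal(cor_complete, cor(na.omit(m)))
```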

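Please note that the specification use = “pairwise.complete.obs” computes each correlation from a different subset of rows. The coefficients may therefore rest on different sample sizes, and the resulting matrix is not guaranteed to be positive semi-definite. This can lead to bias, and hence, misleading results. You can check how listwise deletion for missing data works before using this specification.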
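
One way to judge how much each pairwise correlation can be trusted is to count the rows it is based on. The sketch below re-enters the example matrix by hand, using the values as printed in the example data, and counts the jointly observed rows for every pair of columns (the names observed and pair_n are chosen for illustration):

```r
# Example matrix re-entered from the printed output (rounded as printed)
sample_data <- cbind(
  col1 = c(NA, 0.3729390, NA, NA, 0.5190691,
           1.0478328, NA, -1.4076432, NA, NA),
  col2 = c(-0.7602105, -1.2028067, 0.7081885, NA, NA,
           -0.8561538, 0.1950530, 0.4192383, 0.2887847, 1.4041693),
  col3 = c(-0.02891409, NA, 0.46330763, 0.63862129, -0.13322764,
           1.04789141, NA, 1.64057417, 0.14849188, -1.03728957)
)

# 1 = observed, 0 = missing
observed <- (!is.na(sample_data)) * 1

# Entry [i, j] is the number of rows where columns i and j are both observed
pair_n <- crossprod(observed)
pair_n

# Number of rows without any NA, i.e. what listwise deletion would keep
sum(complete.cases(sample_data))
```

For this matrix, the two correlations involving col1 rest on only three rows each, the correlation between col2 and col3 rests on six rows, and only two rows are complete in all three columns.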

Alternatively, you might impute your missing values to create a data frame that contains only non-NA values before calculating the correlations. You can find more on missing data imputation techniques here.
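
As a deliberately simple illustration of this idea, the sketch below replaces each NA with the mean of its column before calling cor(). Mean imputation is only one of many techniques and is chosen here for brevity; it shrinks the variance of the imputed columns and can distort the correlations, so more careful imputation methods are usually preferable. The matrix m and the helper impute_mean() are hypothetical:

```r
# Illustrative matrix (not the tutorial data) with one NA per column
m <- cbind(a = c(1, 2, 3, 4, NA, 6),
           b = c(2, 1, 4, NA, 5, 7),
           c = c(NA, 3, 1, 5, 2, 8))

# Hypothetical helper: replace the NAs of a vector with the mean of its observed values
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Apply the helper column by column; the result is a matrix of the same shape
m_imputed <- apply(m, 2, impute_mean)

# No NAs are left, so the default cor() call returns a full correlation matrix
anyNA(m_imputed)
cor(m_imputed)
```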

 

Video, Further Resources & Summary

This post has shown how to exclude NA values when using the cor() function in R. In case you have further questions, you may leave a comment below.

 

Paula Villasante Soriano Statistician & R Programmer

This page was created in collaboration with Paula Villasante Soriano. Please have a look at Paula’s author page to get more information about her academic background and the other articles she has written for Statistics Globe.

 
