Calculate Correlation Matrix Only for Numeric Columns in R (2 Examples)

 

In this tutorial, I’ll explain how to apply the cor function only to numeric variables in the R programming language.

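The content is structured as follows:

1) Creation of Exemplifying Data
2) Example 1: Error in cor(data) : ‘x’ must be numeric
3) Example 2: Applying cor() Function Only to Numeric Variables
4) Video & Further Resources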

If you want to know more about these content blocks, keep reading!

 

Creation of Exemplifying Data

As a first step, I’ll need to create some example data:

set.seed(972634)                                 # Create example data
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = letters[1:10],
                   x4 = rpois(100, 3))
head(data)                                       # Print example data
#           x1         x2 x3 x4
# 1 -0.1690348 0.99026508  a  2
# 2 -0.5750698 0.82617886  b  1
# 3  2.4263886 0.58398940  c  7
# 4  0.3364569 0.07942794  d  4
# 5  0.4823497 0.22178303  e  4
# 6  0.3244318 0.39867537  f  2

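Have a look at the previous output of the RStudio console. It shows that our example data has four columns. The variables x1, x2, and x4 are numeric, but the variable x3 is not: it is a character vector in R 4.0.0 and later, and a factor in older versions, where data.frame() converted strings to factors by default.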
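
To verify the column classes instead of reading them off the printed output, a quick check along the following lines may help. This is a minimal sketch that rebuilds the example data from above; the object name col_classes is only an illustrative choice and not part of the original tutorial code.

```r
set.seed(972634)                                 # Rebuild example data
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = letters[1:10],
                   x4 = rpois(100, 3))

col_classes <- sapply(data, class)               # Class of each column (illustrative name)
col_classes

sapply(data, is.numeric)                         # TRUE for x1, x2, x4; FALSE for x3
```

Note that x4 is stored as an integer column, which is.numeric also treats as numeric.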

 

Example 1: Error in cor(data) : ‘x’ must be numeric

This example shows the problem with non-numeric variables when computing a correlation matrix in R.

Consider the following R code:

cor(data)                                        # Trying to apply cor function
# Error in cor(data) : 'x' must be numeric

As you can see, the error message “‘x’ must be numeric” was returned. The reason is that the cor function converts a data frame to a matrix before computing the correlations, and a single non-numeric column such as x3 turns the entire matrix into a character matrix.
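
The cause of the error can be inspected directly. The following sketch rebuilds the example data, converts it to a matrix, and catches the error with tryCatch so that the script keeps running; the object names m and err are illustrative assumptions.

```r
set.seed(972634)                                 # Rebuild example data
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = letters[1:10],
                   x4 = rpois(100, 3))

m <- as.matrix(data)                             # Matrix version of the data frame
typeof(m)                                        # Character, because of column x3
is.numeric(m)                                    # FALSE, so cor() refuses the input

err <- tryCatch(cor(data),                       # Catch the error instead of stopping
                error = function(e) conditionMessage(e))
err                                              # The error message as a string
```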

Let’s fix this!

 

Example 2: Applying cor() Function Only to Numeric Variables

In this example, I’ll show how to apply the cor function only to numeric columns of a data frame.

For this, we can select a subset of all numeric variables in our data using the unlist, lapply, and is.numeric functions as shown below:

cor(data[, unlist(lapply(data, is.numeric))])    # Properly apply cor
#              x1           x2        x4
# x1  1.000000000 -0.003369401 0.1028944
# x2 -0.003369401  1.000000000 0.1382353
# x4  0.102894402  0.138235274 1.0000000

As you can see, the previously shown correlation matrix contains only the numeric columns of our example data.
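
The same selection can be written in a few other ways in base R. The sketch below is one possible variant, not the only correct one: it uses vapply with drop = FALSE as well as the Filter function, and compares both results with the code of Example 2. The object names num_cols, cor_a, cor_b, and cor_orig are illustrative assumptions.

```r
set.seed(972634)                                 # Rebuild example data
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = letters[1:10],
                   x4 = rpois(100, 3))

num_cols <- vapply(data, is.numeric, logical(1)) # Logical indicator per column
cor_a <- cor(data[, num_cols, drop = FALSE])     # drop = FALSE keeps a data frame

cor_b <- cor(Filter(is.numeric, data))           # Filter keeps the numeric columns

cor_orig <- cor(data[, unlist(lapply(data, is.numeric))])  # Code of Example 2

all.equal(cor_a, cor_orig)                       # Compare the variants
all.equal(cor_b, cor_orig)
```

The drop = FALSE argument matters when a data frame contains only one numeric column: without it, the subset collapses to a plain vector, which the cor function cannot turn into a correlation matrix on its own.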

 

Video & Further Resources

I have recently published a video on my YouTube channel, which illustrates the R programming syntax of this tutorial.

Besides the video, you may read the other posts on this website. I have released numerous articles about related topics such as extracting data, importing data, and numeric values.

 

At this point, you should have learned how to compute correlation matrices only for the numeric columns of a data frame in the R programming language. If you have any further questions, let me know in the comments.

 
