Select Only Numeric Columns from Data Frame in R (Example)

 

In this tutorial, I’ll explain how to subset only numeric variables from a data frame in the R programming language.

It’s time to dive into the examples.

 

Creating Example Data

In the examples of this tutorial, we will use the following data frame in R:

data <- data.frame(x1 = 1:5,                         # Create example data frame
                   x2 = LETTERS[1:5],
                   x3 = 2,
                   x4 = factor(c(1, 3, 2, 2, 1)),
                   stringsAsFactors = FALSE)
data                                                 # Print example data to console
# x1 x2 x3 x4
#  1  A  2  1
#  2  B  2  3
#  3  C  2  2
#  4  D  2  2
#  5  E  2  1

Our example data contains four columns and five rows.

Let’s use the str() function to have a look at the variable classes of our columns:

str(data)                                            # Inspect variable classes
# 'data.frame':	5 obs. of  4 variables:
# $ x1: int  1 2 3 4 5
# $ x2: chr  "A" "B" "C" "D" ...
# $ x3: num  2 2 2 2 2
# $ x4: Factor w/ 3 levels "1","2","3": 1 3 2 2 1

As you can see from the output in the RStudio console, the columns x1 (integer) and x3 (double) are numeric. x2 is a character column and x4 is a factor variable.
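If you prefer a more compact overview than str(), the following minimal sketch lists the class of each column and checks two of them with is.numeric(). The object name col_classes is just my own choice for illustration. Note that is.numeric() returns TRUE for both integer and double columns, but FALSE for factors, even when the factor levels look like numbers.

```r
# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = 1:5,
                   x2 = LETTERS[1:5],
                   x3 = 2,
                   x4 = factor(c(1, 3, 2, 2, 1)),
                   stringsAsFactors = FALSE)

col_classes <- sapply(data, class)                   # Class of each column
col_classes
# x1: "integer", x2: "character", x3: "numeric", x4: "factor"

is.numeric(data$x1)                                  # Integer counts as numeric
# TRUE
is.numeric(data$x4)                                  # Factor does not
# FALSE
```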

Next, I’ll show you how to extract only numeric columns from our data set. Keep on reading!

 

Example 1: Extract Numeric Columns from Data Frame [Base R]

In Example 1, I’ll show you how to subset numeric data with the base installation of the R programming language.

First, we need to identify all columns that are numeric. For this task, we can use a combination of the R functions unlist(), lapply(), and is.numeric():

num_cols <- unlist(lapply(data, is.numeric))         # Identify numeric columns
num_cols
# x1    x2    x3    x4 
# TRUE FALSE  TRUE FALSE

As you can see, the previous R code returned a logical vector illustrating which of our variables are numeric.
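As an alternative sketch, the same logical vector can be created with vapply(), which checks that every result is a single logical value. The name num_cols2 is only used here for illustration.

```r
# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = 1:5,
                   x2 = LETTERS[1:5],
                   x3 = 2,
                   x4 = factor(c(1, 3, 2, 2, 1)),
                   stringsAsFactors = FALSE)

num_cols2 <- vapply(data, is.numeric, logical(1))    # One TRUE/FALSE per column
num_cols2
# x1    x2    x3    x4 
# TRUE FALSE  TRUE FALSE
```

Compared to unlist(lapply()), vapply() stops with an error if the function returns anything other than one logical value per column, which can make mistakes easier to spot.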

Now, we can use this logical vector to take a subset of our data frame:

data_num <- data[ , num_cols]                        # Subset numeric columns of data
data_num                                             # Print subset to RStudio console
# x1 x3
#  1  2
#  2  2
#  3  2
#  4  2
#  5  2

The remaining subset only contains the numeric columns (i.e. x1 and x3). Looks good!
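One thing to keep in mind: if a data frame contains only a single numeric column, this kind of subsetting returns a plain vector instead of a data frame. The following sketch uses a hypothetical smaller data frame called data_small to show the difference, and how drop = FALSE keeps the data frame structure. It also shows Filter() as another base R option.

```r
# Hypothetical data frame with only one numeric column
data_small <- data.frame(x1 = 1:5,
                         x2 = LETTERS[1:5],
                         stringsAsFactors = FALSE)

num_small <- vapply(data_small, is.numeric, logical(1))

sub_vec <- data_small[ , num_small]                  # Result is simplified to a vector
is.data.frame(sub_vec)
# FALSE

sub_df <- data_small[ , num_small, drop = FALSE]     # drop = FALSE keeps a data frame
is.data.frame(sub_df)
# TRUE

sub_filter <- Filter(is.numeric, data_small)         # Filter() also returns a data frame
is.data.frame(sub_filter)
# TRUE
```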

 

Example 2: Extract Numeric Columns from Data Frame [dplyr Package]

You might say the previous R code of Example 1 was a bit difficult to remember. Fortunately, the dplyr package provides a much simpler solution for the subsetting of numeric columns from a data frame.

First, we need to install and load the dplyr package in R:

install.packages("dplyr")                            # Install dplyr
library("dplyr")                                     # Load dplyr

Now, we can use the select_if function of the dplyr package as shown below:

data_num2 <- select_if(data, is.numeric)             # Subset numeric columns with dplyr
data_num2                                            # Print subset to RStudio console
# x1 x3
#  1  2
#  2  2
#  3  2
#  4  2
#  5  2

As you can see, the output is exactly the same, but the R syntax was much easier to apply.
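A note on newer versions of dplyr: as far as I know, select_if() still works but is documented as superseded since dplyr 1.0.0, and the recommended replacement is select() combined with where(). The following sketch assumes that dplyr 1.0.0 or later is installed. The name data_num3 is only used here for illustration.

```r
library("dplyr")                                     # Assumes dplyr >= 1.0.0 is installed

# Recreate the example data so that this snippet runs on its own
data <- data.frame(x1 = 1:5,
                   x2 = LETTERS[1:5],
                   x3 = 2,
                   x4 = factor(c(1, 3, 2, 2, 1)),
                   stringsAsFactors = FALSE)

data_num3 <- select(data, where(is.numeric))         # Keep columns where is.numeric is TRUE
names(data_num3)                                     # Column names of the subset
# "x1" "x3"
```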

 

Video, Further Resources & Summary

To summarize: On this page, you learned how to remove non-numeric variables from a data frame in the R programming language. If you have additional comments or questions, let me know in the comments section.

 


