Select Only Numeric Columns from Data Frame in R (Example)
In this tutorial, I’ll explain how to subset only numeric variables from a data frame in the R programming language.
The article consists of these contents:
- Creation of Exemplifying Data
- Example 1: Extract Numeric Columns from Data Frame [Base R]
- Example 2: Extract Numeric Columns from Data Frame [dplyr Package]
- Video, Further Resources & Summary
It’s time to dive into the examples.
Creation of Exemplifying Data
In the examples of this tutorial, we will use the following data frame in R:
data <- data.frame(x1 = 1:5,                      # Create example data frame
                   x2 = LETTERS[1:5],
                   x3 = 2,
                   x4 = factor(c(1, 3, 2, 2, 1)),
                   stringsAsFactors = FALSE)
data                                              # Print example data to console
#   x1 x2 x3 x4
# 1  1  A  2  1
# 2  2  B  2  3
# 3  3  C  2  2
# 4  4  D  2  2
# 5  5  E  2  1
Our example data contains four columns and five rows.
Let’s use the str() function to have a look at the variable classes of our columns:
str(data)                                         # Inspect variable classes
# 'data.frame': 5 obs. of  4 variables:
#  $ x1: int  1 2 3 4 5
#  $ x2: chr  "A" "B" "C" "D" ...
#  $ x3: num  2 2 2 2 2
#  $ x4: Factor w/ 3 levels "1","2","3": 1 3 2 2 1
As you can see based on the output of the RStudio console, the columns x1 (integer) and x3 (double) are numeric. x2 is a character variable and x4 is a factor. Note that x4 is not treated as numeric, even though its levels look like numbers.
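If you prefer a more compact overview than str(), the following sketch shows one possible way to check the classes programmatically. The object name col_classes is my own choice for illustration and not part of the original example:

```r
# Recreate the example data frame of this tutorial
data <- data.frame(x1 = 1:5,
                   x2 = LETTERS[1:5],
                   x3 = 2,
                   x4 = factor(c(1, 3, 2, 2, 1)),
                   stringsAsFactors = FALSE)

col_classes <- sapply(data, class)   # Class of each column (illustrative object name)
col_classes                          # x1: "integer", x2: "character", x3: "numeric", x4: "factor"

is.numeric(data$x1)                  # TRUE: integers count as numeric
is.numeric(data$x4)                  # FALSE: factors do not count as numeric
```

This also makes clear why a factor such as x4 will be dropped in the following examples.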
Next, I’ll show you how to extract only numeric columns from our data set. Keep on reading!
Example 1: Extract Numeric Columns from Data Frame [Base R]
In Example 1, I’ll show you how to subset numeric data with the base installation of the R programming language.
First, we need to identify all columns that are numeric. For this task, we can use a combination of the R functions unlist(), lapply(), and is.numeric():
num_cols <- unlist(lapply(data, is.numeric))      # Identify numeric columns
num_cols
#    x1    x2    x3    x4 
#  TRUE FALSE  TRUE FALSE 
As you can see, the previous R code returned a logical vector illustrating which of our variables are numeric.
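As a side note, the same logical vector could also be created with vapply(), which checks that each result is a single logical value. This is only an alternative sketch; the object name num_cols_v is an assumption made for illustration:

```r
# Recreate the example data frame of this tutorial
data <- data.frame(x1 = 1:5,
                   x2 = LETTERS[1:5],
                   x3 = 2,
                   x4 = factor(c(1, 3, 2, 2, 1)),
                   stringsAsFactors = FALSE)

# vapply() returns a named logical vector and fails loudly
# if is.numeric() ever returned something other than one TRUE/FALSE
num_cols_v <- vapply(data, is.numeric, logical(1))
num_cols_v                                        # x1 = TRUE, x2 = FALSE, x3 = TRUE, x4 = FALSE
```

Both approaches should lead to the same result for our example data, so which one you use is mostly a matter of taste.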
Now, we can use this logical vector to take a subset of our data frame:
data_num <- data[ , num_cols]                     # Subset numeric columns of data
data_num                                          # Print subset to RStudio console
#   x1 x3
# 1  1  2
# 2  2  2
# 3  3  2
# 4  4  2
# 5  5  2
The remaining subset only contains the numeric columns (i.e. x1 and x3). Looks good!
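One detail to keep in mind: if only a single column is numeric, data[ , num_cols] returns a plain vector instead of a data frame. The following sketch illustrates this with a hypothetical data frame called data_one (not part of the original example) and shows how drop = FALSE keeps the data frame structure:

```r
# Hypothetical data frame with only one numeric column
data_one <- data.frame(x1 = 1:5,
                       x2 = LETTERS[1:5],
                       stringsAsFactors = FALSE)

num_cols_one <- unlist(lapply(data_one, is.numeric))   # Identify numeric columns

sub_vec <- data_one[ , num_cols_one]                   # Result is simplified to a vector
sub_df  <- data_one[ , num_cols_one, drop = FALSE]     # Result stays a data frame

is.data.frame(sub_vec)                                 # FALSE
is.data.frame(sub_df)                                  # TRUE
```

If your code relies on the result being a data frame, adding drop = FALSE is probably the safer choice.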
Example 2: Extract Numeric Columns from Data Frame [dplyr Package]
You might say the previous R code of Example 1 was a bit difficult to remember. Fortunately, the dplyr package provides a much simpler solution for the subsetting of numeric columns from a data frame.
First, we need to install and load the dplyr package in R:
install.packages("dplyr")                         # Install dplyr
library("dplyr")                                  # Load dplyr
Now, we can use the select_if function of the dplyr package as shown below:
data_num2 <- select_if(data, is.numeric)          # Subset numeric columns with dplyr
data_num2                                         # Print subset to RStudio console
#   x1 x3
# 1  1  2
# 2  2  2
# 3  3  2
# 4  4  2
# 5  5  2
As you can see, the output is exactly the same, but the R syntax was much easier to apply.
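In more recent versions of dplyr (1.0.0 or later), select_if() is marked as superseded, and the combination of select() and where() is the recommended replacement. The following sketch assumes that such a dplyr version is installed; the object name data_num3 is chosen only for illustration:

```r
library("dplyr")                                  # Load dplyr (assumed version >= 1.0.0)

# Recreate the example data frame of this tutorial
data <- data.frame(x1 = 1:5,
                   x2 = LETTERS[1:5],
                   x3 = 2,
                   x4 = factor(c(1, 3, 2, 2, 1)),
                   stringsAsFactors = FALSE)

data_num3 <- select(data, where(is.numeric))      # Keep columns for which is.numeric() is TRUE
names(data_num3)                                  # "x1" "x3"
```

For our example data, this should return the same two columns as select_if().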
Video, Further Resources & Summary
Do you need further explanations of the R code in this article? Then you could have a look at the accompanying video on my YouTube channel, where I explain the R code of this article.
In addition, I recommend reading the related posts on https://statisticsglobe.com/:
- Extract Certain Columns of Data Frame in R
- Convert Data Frame Column to Numeric in R
- The unlist Function in R
- R Functions List (+ Examples)
- The R Programming Language
To summarize: On this page, you learned how to remove non-numeric variables from a data frame in the R programming language. In case you have additional comments and/or questions, let me know in the comments section.