Select Data Frame Columns by Logical Condition in R (2 Examples)

This page illustrates how to extract particular data frame columns based on a logical condition in the R programming language.

Let’s do this:

Creating Exemplifying Data

First, we’ll need to create some data that we can use in the following examples:

data <- data.frame(x1 = 1:5,                      # Create example data
                   y1 = letters[1:5],
                   x2 = "x",
                   x3 = 9:5,
                   y2 = 7)
data                                              # Print example data
#   x1 y1 x2 x3 y2
# 1  1  a  x  9  7
# 2  2  b  x  8  7
# 3  3  c  x  7  7
# 4  4  d  x  6  7
# 5  5  e  x  5  7

The previous output of the RStudio console shows the structure of our example data: It has five rows and five columns. Three of the variable names (x1, x2, and x3) start with an x, and the other two (y1 and y2) start with a y.
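Both of the following examples rely on the same mechanism: a logical vector with one TRUE or FALSE per column, used as the column index. The following base R sketch shows that mechanism directly. The object names keep and is_num are only illustrative, and the data is recreated so the snippet runs on its own:

```r
# Recreate the example data so this snippet runs on its own
data <- data.frame(x1 = 1:5, y1 = letters[1:5], x2 = "x", x3 = 9:5, y2 = 7)

# A hand-written logical vector with one entry per column (x1, y1, x2, x3, y2)
keep <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
data[ , keep]                          # Keeps x1, x2, and x3

# The logical vector can also be computed from the column values
is_num <- sapply(data, is.numeric)     # TRUE for x1, x3, and y2
data[ , is_num]                        # Keeps only the numeric columns
```

The first version filters by position and the second filters by a property of each column. The grepl() and starts_with() approaches shown below filter by column name instead.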

Example 1: Extract Data Frame Variables by Logical Condition Using grepl() Function

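In Example 1, I’ll explain how to select certain columns based on a logical condition using the grepl() function. The grepl() function checks each column name for a pattern and returns TRUE or FALSE for every column. This logical vector is then used to index the columns. Have a look at the following R code: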

data_new1 <- data[ , grepl("x", colnames(data))]  # Extract by logical
data_new1                                         # Print updated data
#   x1 x2 x3
# 1  1  x  9
# 2  2  x  8
# 3  3  x  7
# 4  4  x  6
# 5  5  x  5

As you can see, the previous R syntax created a new data frame called data_new1 that consists only of columns with an x in their name.
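It can help to inspect the logical vector that grepl() passes to the column index. The pattern "x" matches an x anywhere in a name. For a strict prefix match, you can use the regular expression "^x" instead. Adding drop = FALSE keeps the result a data frame even when only one column matches. The following sketch demonstrates all three points. The object names prefix_x and only_y1 are only illustrative:

```r
# Recreate the example data so this snippet runs on its own
data <- data.frame(x1 = 1:5, y1 = letters[1:5], x2 = "x", x3 = 9:5, y2 = 7)

# grepl() returns one TRUE/FALSE value per column name
grepl("x", colnames(data))     # TRUE FALSE TRUE TRUE FALSE

# "^x" only matches names that start with an x
prefix_x <- data[ , grepl("^x", colnames(data)), drop = FALSE]

# Without drop = FALSE, a single matching column would be returned as a vector
only_y1 <- data[ , grepl("^y1$", colnames(data)), drop = FALSE]
class(only_y1)                 # "data.frame"
```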

Example 2: Extract Data Frame Variables by Logical Condition Using select() & starts_with() Functions of dplyr Package

In Example 2, I’ll illustrate how to subset data frame columns whose names match a specific prefix condition.

For this, we’ll use the dplyr add-on package. First, we have to install and load the dplyr package:

install.packages("dplyr")                         # Install dplyr package
library("dplyr")                                  # Load dplyr
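If you run your script repeatedly, you may prefer to install dplyr only when it is missing. The following sketch shows a common pattern based on base R’s requireNamespace() function:

```r
# Install dplyr only if it is not available yet, then load it
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library("dplyr")
```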

Now, we can use the select() and starts_with() functions to extract only the columns whose names start with an x:

data_new2 <- data %>%                             # Using dplyr functions
  select(starts_with("x"))
data_new2                                         # Print updated data
#   x1 x2 x3
# 1  1  x  9
# 2  2  x  8
# 3  3  x  7
# 4  4  x  6
# 5  5  x  5

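As you can see, the retained columns are exactly the same as in Example 1. However, this is only because every column name in our example data that contains an x also starts with an x. The call grepl("x", ...) matches an x anywhere in a name, whereas starts_with("x") only matches at the beginning.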
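The select() function accepts other tidyselect helpers as well. The following sketch assumes dplyr 1.0.0 or newer, which is needed for where() and for combining selections with &. Here, contains() mirrors the grepl() approach of Example 1, and where() applies a logical condition to the column values rather than to the names:

```r
library("dplyr")

# Recreate the example data so this snippet runs on its own
data <- data.frame(x1 = 1:5, y1 = letters[1:5], x2 = "x", x3 = 9:5, y2 = 7)

data %>% select(contains("x"))                          # x1, x2, x3
data %>% select(ends_with("2"))                         # x2, y2
data %>% select(where(is.numeric))                      # x1, x3, y2
data %>% select(starts_with("x") & where(is.numeric))   # x1, x3
```

Combining helpers with & keeps only the columns that satisfy both conditions, in this case numeric columns whose names start with an x.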

Summary: In this tutorial, I explained how to keep only the variables for which a logical condition is TRUE in the R programming language. Let me know in the comments section if you have additional questions. Besides that, don’t forget to subscribe to my email newsletter to get updates on new articles.
