Subset Data Frame Rows by Logical Condition in R (5 Examples)
In this tutorial you’ll learn how to subset rows of a data frame based on a logical condition in the R programming language.
Table of contents:
- Creation of Example Data
- Example 1: Subset Rows with ==
- Example 2: Subset Rows with !=
- Example 3: Subset Rows with %in%
- Example 4: Subset Rows with subset Function
- Example 5: Subset Rows with filter Function [dplyr Package]
- Video & Further Resources
Here’s the step-by-step process.
Creation of Example Data
In the examples of this R tutorial, I’ll use the following data frame:
data <- data.frame(x1 = c(3, 7, 1, 8, 5),    # Create example data
                   x2 = letters[1:5],
                   group = c("g1", "g2", "g1", "g3", "g1"))
data                                         # Print example data
# x1 x2 group
#  3  a    g1
#  7  b    g2
#  1  c    g1
#  8  d    g3
#  5  e    g1
Our example data contains five rows and three columns. The column “group” will be used to filter our data.
Example 1: Subset Rows with ==
In Example 1, we’ll filter the rows of our data with the == operator. Have a look at the following R code:
data[data$group == "g1", ]                   # Subset rows with ==
# x1 x2 group
#  3  a    g1
#  1  c    g1
#  5  e    g1
We selected only the rows where the group column is equal to “g1”. We did this by specifying data$group == "g1" before the comma within square brackets.
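To see why this works, it may help to look at the logical vector on its own. The following sketch recreates the example data and stores the condition before using it as a row index. The object names is_g1 and data_g1 are only illustrative choices and not part of the original code.

```r
# Recreate the example data of this tutorial
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("g1", "g2", "g1", "g3", "g1"))

is_g1 <- data$group == "g1"    # Logical vector with one value per row (hypothetical name)
is_g1
# [1]  TRUE FALSE  TRUE FALSE  TRUE

data_g1 <- data[is_g1, ]       # Keep only the rows where the condition is TRUE
nrow(data_g1)
# [1] 3
```

The part before the comma selects rows, and leaving the part after the comma empty keeps all columns.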
Example 2: Subset Rows with !=
We can also subset our data the other way around (compared to Example 1). The following R code selects only the rows where the group column is not equal to “g1”. We can do this with the != operator:
data[data$group != "g1", ]                   # Subset rows with !=
# x1 x2 group
#  7  b    g2
#  8  d    g3
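As a quick check, the != condition should be the exact negation of the == condition from Example 1. The sketch below assumes the same example data; rows_not_g1 and rows_negated are illustrative names that do not appear elsewhere in this tutorial.

```r
# Recreate the example data of this tutorial
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("g1", "g2", "g1", "g3", "g1"))

rows_not_g1  <- data[data$group != "g1", ]      # Rows selected with !=
rows_negated <- data[!(data$group == "g1"), ]   # Same rows, selected by negating ==

identical(rows_not_g1, rows_negated)            # Both approaches return the same data frame
# [1] TRUE
```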
Example 3: Subset Rows with %in%
We can also use the %in% operator to filter data by a logical vector. The %in% operator is especially helpful when we want to match a column against several values at once. In the following R syntax, we retain rows where the group column is equal to “g1” OR “g3”:
data[data$group %in% c("g1", "g3"), ]        # Subset rows with %in%
# x1 x2 group
#  3  a    g1
#  1  c    g1
#  8  d    g3
#  5  e    g1
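The %in% operator can be read as a shorthand for several == conditions combined with the | (OR) operator. The sketch below illustrates this with the example data and also applies the same idea to the numeric column x1. The object names with_in, with_or, and x1_match are illustrative.

```r
# Recreate the example data of this tutorial
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("g1", "g2", "g1", "g3", "g1"))

with_in <- data[data$group %in% c("g1", "g3"), ]             # Membership test
with_or <- data[data$group == "g1" | data$group == "g3", ]   # Equivalent OR condition

identical(with_in, with_or)                                  # Same rows either way
# [1] TRUE

x1_match <- data[data$x1 %in% c(1, 3, 7), ]                  # Works for numeric columns as well
nrow(x1_match)
# [1] 3
```

With only two values the OR version is still readable, but %in% tends to scale better as the list of values grows.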
Example 4: Subset Rows with subset Function
Base R also provides the subset() function for the filtering of rows by a logical vector. Consider the following R code:
subset(data, group == "g1")                  # Apply subset function
# x1 x2 group
#  3  a    g1
#  1  c    g1
#  5  e    g1
The output is the same as in Example 1, but this time we used the subset function by specifying the name of our data frame and the logical criteria within the function.
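One practical difference may be worth knowing when the data contains missing values: subset() treats NA in the condition as FALSE, whereas square-bracket indexing returns an additional all-NA row for each NA in the logical vector. The sketch below uses a hypothetical copy of the example data, called data_na here, in which one group value is set to NA.

```r
# Recreate the example data of this tutorial
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("g1", "g2", "g1", "g3", "g1"))

data_na <- data                          # Hypothetical copy, not part of the original example
data_na$group[2] <- NA                   # Introduce one missing value

nrow(data_na[data_na$group == "g1", ])   # Bracket indexing returns an extra NA row
# [1] 4

nrow(subset(data_na, group == "g1"))     # subset() drops rows where the condition is NA
# [1] 3
```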
Example 5: Subset Rows with filter Function [dplyr Package]
We can also use the dplyr package to extract rows of our data. First, we need to install and load the package to R:
install.packages("dplyr")                    # Install dplyr package
library("dplyr")                             # Load dplyr package
Now, we can use the filter function of the dplyr package as follows:
filter(data, group == "g1")                  # Apply filter function
# x1 x2 group
#  3  a    g1
#  1  c    g1
#  5  e    g1
Compare the R syntax of Examples 4 and 5. The subset and filter functions are very similar.
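Because the stats package of base R also contains a function called filter, it can be safer to call the dplyr version with the dplyr:: prefix, especially in scripts where the loading order of packages is unclear. The following is a minimal sketch that assumes dplyr is installed; the requireNamespace() guard and the object name data_dplyr are additions for illustration only.

```r
# Recreate the example data of this tutorial
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("g1", "g2", "g1", "g3", "g1"))

if (requireNamespace("dplyr", quietly = TRUE)) {                # Run only if dplyr is available
  data_dplyr <- dplyr::filter(data, group %in% c("g1", "g3"))   # Explicit namespace avoids masking issues
  print(nrow(data_dplyr))
  # [1] 4
}
```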
Video & Further Resources
Would you like to learn more about the subsetting of rows? Then you may have a look at the following video of my YouTube channel. In the video, I illustrate the R programming code of this post in a live session:
Furthermore, you might have a look at the related articles on this website.
- filter Function of dplyr Package
- Sample Random Rows of Data Frame
- Extract Certain Columns of Data Frame
- The R Programming Language
To summarize: This article explained how to return rows according to a matching criterion (e.g. conditioning on an ID or a factor variable) in the R programming language. Please let me know in the comments if you have further questions. Furthermore, please subscribe to my email newsletter to receive regular updates on the newest tutorials.
6 Comments
What if I wanted all columns but only rows where X1 = 1, 3 or 7?
Hey Jeff,
Have you tried the code of Example 3 of this tutorial? You might replace “g1” and “g3” with your values.
Regards,
Joachim
Hi Joachim! Thanks for the tutorial. I’m trying to filter rows that contain “(1)” in a column, but all the lines of code you explained return rows that contain either “1” or “()”. How can I specify that I need it in the exact order of “(1)”?
Thanks!
Hello Britta,
That is strange; it shouldn’t be the case. Can you share your code with us? I changed the sample data a bit to adapt it to your case and used the first method given in the tutorial. It worked for me as it was supposed to, see below.
Regards,
Cansu
Hi Cansu,
I tried different packages, this was my result. In my case, I only want the rows that contain (1), so 6(1) and 5(1).
data <- data.frame(x1 = c(3, 7, 1, 8, 5), # Create example data
x2 = letters[1:5],
group = c("6(1)", "(2)", "1(2)", "1(3)", "5(1)"))
d2 <- data %>% filter(grepl('(1)', group))
d2<-data[data$group == "(1)", ]
Both codes give the same result. I did manage to solve my problem though with the following code:
d2 <- data[grepl("\\(1\\)",data$group),]
Hello Britta,
Ah yes, it is a different story when the data is like yours. That’s why it is important to know the exact structure of the dataset and the code. I am glad that you solved it. In case of any further questions, we are here to help. Don’t hesitate to contact us.
Regards,
Cansu