filter R Function of dplyr Package (Example)

 

This article illustrates how to subset rows by logical conditions with the filter function of the dplyr package in R programming.


So now the part you have been waiting for – the examples!

 

Creation of Example Data

First, we need to install and load the dplyr package:

install.packages("dplyr")                        # Install dplyr package
library("dplyr")                                 # Load dplyr package

Then, we have to create some example data:

data <- data.frame(x1 = 1:5,                     # Create example data
                   x2 = letters[1:5],
                   group = c("gr1", "gr2", "gr1", "gr3", "gr2"))
data                                             # Print data to RStudio console
#   x1 x2 group
# 1  1  a   gr1
# 2  2  b   gr2
# 3  3  c   gr1
# 4  4  d   gr3
# 5  5  e   gr2

Our example data is a data frame with five rows and three columns. The third column contains a grouping variable with three groups.

Note that we could also apply the following code to a tibble.
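As a minimal sketch of the tibble case, the data frame can be converted with as_tibble (re-exported by dplyr from the tibble package) and passed to filter in exactly the same way. The object name data_tbl is only an illustrative choice, not part of the original tutorial.

```r
library("dplyr")

# Same example data as in this tutorial
data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   group = c("gr1", "gr2", "gr1", "gr3", "gr2"))

data_tbl <- as_tibble(data)                      # Convert data frame to tibble (hypothetical name)
res_tbl <- filter(data_tbl, group == "gr2")      # Same filter call as for a data frame
res_tbl                                          # Result is again a tibble with two rows
```

The only visible difference is the class of the returned object: filter returns a tibble when it receives a tibble.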

 

Example: Extract Rows by Logical Condition with filter Function

If we want to subset certain rows of our data based on a logical condition, we can apply the filter function of the dplyr package as follows:

filter(data, group == "gr2")                     # Subset data with filter function
#   x1 x2 group
# 1  2  b   gr2
# 2  5  e   gr2

As you can see, we extracted only rows where the grouping variable is equal to gr2.
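The same pattern extends to more than one condition. The following sketch is not part of the original example; it only illustrates that conditions separated by commas are combined with a logical AND, that | gives a logical OR, and that %in% matches several values at once. The result names res_and, res_or, and res_in are hypothetical.

```r
library("dplyr")

# Same example data as in this tutorial
data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   group = c("gr1", "gr2", "gr1", "gr3", "gr2"))

res_and <- filter(data, group == "gr2", x1 > 2)          # Both conditions must hold
res_or  <- filter(data, group == "gr1" | x1 >= 5)        # At least one condition must hold
res_in  <- filter(data, group %in% c("gr1", "gr3"))      # group is one of several values

res_and                                                  # One row (x1 = 5)
res_or                                                   # Three rows (x1 = 1, 3, 5)
res_in                                                   # Three rows (x1 = 1, 3, 4)
```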

 

Video & Further Resources

Would you like to learn more about the handling of data frames and tidyverse tibbles in R? Then you might want to watch the following video on my YouTube channel, in which I explain the R syntax of this tutorial:

 

 


 

In summary: This article showed how to retain only specific rows of a data frame with the filter function of the dplyr package in the R programming language. Please let me know in the comments if you have any additional questions. Furthermore, don’t forget to subscribe to my email newsletter for updates on new articles.

 



2 Comments

  • harish Sudarsanam
    March 5, 2023 4:18 pm

    I have a large data set with 39K rows and 6 columns. I want to select the rows in which the difference between the first column and any other column is more than 2 or less than -2. It is a gene expression data set with normalized log2 values.
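One possible way to approach the question above is to combine filter with if_any (available in dplyr 1.0.4 and later). This is only a sketch under assumptions: the small data frame expr and its column names ctrl, s1, and s2 are made up for illustration and stand in for the real data set, with ctrl playing the role of the first column.

```r
library("dplyr")

# Hypothetical stand-in for the expression data (log2 values)
expr <- data.frame(ctrl = c(5.0, 6.0, 7.0),
                   s1   = c(5.5, 9.0, 7.1),
                   s2   = c(4.8, 6.2, 4.0))

# Keep rows where any other column differs from ctrl by more than 2 in absolute value
res_diff <- filter(expr, if_any(-ctrl, ~ abs(.x - ctrl) > 2))
res_diff                                         # Keeps the second and third rows of expr
```

Using abs covers both cases at once (difference above 2 or below -2). Replacing if_any with if_all would instead require every other column to differ by more than 2.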
