Find Character Pattern in Data Frame Column in R (2 Examples)

 

In this R tutorial, you’ll learn how to identify all cells that match a certain character pattern in a data frame column.

The article covers two approaches: the grepl function of Base R and the str_detect function of the stringr package.

Let’s dive right in.

 

Creating Example Data

We’ll use the following data as a basis for this R programming tutorial:

data <- data.frame(x1 = 1:5,                       # Create example data frame
                   x2 = c("foo", "foo", "bar", "foo", "bar"))
data                                               # Print example data frame

 

Table 1: Example data frame

  x1  x2
1  1 foo
2  2 foo
3  3 bar
4  4 foo
5  5 bar

 

Table 1 illustrates the output of the RStudio console and shows that our example data contains five observations and two columns. The variable x1 is an integer and the column x2 has the character class.

 

Example 1: Identify Character Pattern in Data Frame Column Using grepl() Function

Example 1 demonstrates how to search and find a character pattern in a data frame column using the grepl function provided by the basic installation of the R programming language.

We can use grepl to return a logical vector that identifies all rows that contain a match with a certain character pattern in a certain column:

grepl("bar", data$x2)                              # Return logical vector using grepl
# [1] FALSE FALSE  TRUE FALSE  TRUE

As you can see, the third and fifth rows contain a match with the character string “bar” in the column x2.
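
By default, grepl interprets the pattern as a regular expression and matches it case-sensitively anywhere in the string. The following minimal sketch illustrates a few optional arguments that change this behavior. Note that the vector x_demo is a hypothetical example and not part of our data frame:

```r
# Hypothetical example vector (an assumption for illustration only)
x_demo <- c("foo", "Bar", "bar", "barrel", "foo.bar")

m_sub   <- grepl("bar", x_demo)                      # Substring match, case-sensitive
m_icase <- grepl("bar", x_demo, ignore.case = TRUE)  # Also matches "Bar"
m_whole <- grepl("^bar$", x_demo)                    # Anchors: whole string must be "bar"
m_dot   <- grepl(".", x_demo, fixed = TRUE)          # Treat "." as a literal dot

m_sub
# [1] FALSE FALSE  TRUE  TRUE  TRUE
m_icase
# [1] FALSE  TRUE  TRUE  TRUE  TRUE
m_whole
# [1] FALSE FALSE  TRUE FALSE FALSE
m_dot
# [1] FALSE FALSE FALSE FALSE  TRUE
```

Without fixed = TRUE, the pattern "." would match any single character and therefore return TRUE for every non-empty string.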

We may now use this logical vector to subset our data frame conditionally:

data_new1 <- data[grepl("bar", data$x2), ]         # Subset data frame
data_new1                                          # Print data frame subset

 

Table 2: Data frame subset created with grepl

  x1  x2
3  3 bar
5  5 bar

 

As shown in Table 2, we have created a new data frame that keeps only the rows in which the variable x2 matches the character string “bar”.
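
The same kind of subset can be written in a few other ways in Base R. The code below is a sketch of some alternatives rather than a recommendation. The object names idx, data_match, data_subset, and data_other are chosen for illustration only:

```r
data <- data.frame(x1 = 1:5,                       # Recreate example data frame
                   x2 = c("foo", "foo", "bar", "foo", "bar"))

idx <- grep("bar", data$x2)                        # Row positions instead of logical vector
idx
# [1] 3 5

data_match  <- data[idx, ]                         # Subset by row positions
data_subset <- subset(data, grepl("bar", x2))      # Subset using the subset function

data_other <- data[!grepl("bar", data$x2), ]       # Keep rows that do NOT match
data_other
#   x1  x2
# 1  1 foo
# 2  2 foo
# 4  4 foo
```

For the inverse selection, negating the logical vector with ! is the safer choice: data[-grep("bar", data$x2), ] returns zero rows when no element matches, because the negative of an empty index is still empty.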

 

Example 2: Identify Character Pattern in Data Frame Column Using str_detect() Function of stringr Package

In Example 1, we used Base R to find a character pattern in a data frame column. Example 2, in contrast, shows how to use the stringr package for this task.

First, we have to install and load the stringr package:

install.packages("stringr")                        # Install stringr package
library("stringr")                                 # Load stringr

Now, we can apply the str_detect function to return a logical vector that identifies all matching column cells:

str_detect(data$x2, "bar")                         # Return logical vector using str_detect
# [1] FALSE FALSE  TRUE FALSE  TRUE

Similar to Example 1, we can use this logical vector to extract specific rows from our data:

data_new2 <- data[str_detect(data$x2, "bar"), ]    # Subset data frame
data_new2                                          # Print data frame subset

 

Table 3: Data frame subset created with str_detect

  x1  x2
3  3 bar
5  5 bar

 

Table 3 shows the output of the previous R code: The same data frame as in Example 1. This time, we have used the str_detect function of the stringr package instead of the Base R grepl function to test for character matches.
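
The two functions return the same result for our example data, but they treat missing values differently. The following sketch compares them on a hypothetical vector x_na (not part of our example data) and assumes that stringr version 1.4.0 or later is installed, since the negate argument was introduced in that release:

```r
library("stringr")                                 # Load stringr

x_na <- c("foo", "bar", NA)                        # Hypothetical vector containing NA

res_grepl <- grepl("bar", x_na)                    # grepl returns FALSE for NA
res_grepl
# [1] FALSE  TRUE FALSE

res_detect <- str_detect(x_na, "bar")              # str_detect returns NA for NA
res_detect
# [1] FALSE  TRUE    NA

res_negate <- str_detect(x_na, "bar", negate = TRUE)  # Invert the match
res_negate
# [1]  TRUE FALSE    NA

res_fixed <- str_detect(x_na, fixed("bar"))        # Match literal text, not a regex
```

Note that an NA in a logical index produces a row of NA values when subsetting a data frame, so it may be useful to wrap the str_detect result in which() or to handle missing values beforehand.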

 

Summary

In this tutorial, you have learned how to check and find all rows that match a certain character pattern in a data frame column in the R programming language. If you have any further questions, kindly let me know in the comments.

 
