Find Character Pattern in Data Frame Column in R (2 Examples)
In this R post you’ll learn how to identify all cells that match a certain character pattern in a data frame column.
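The content of the article is structured as follows:
- Creating Exemplifying Data
- Example 1: Identify Character Pattern in Data Frame Column Using grepl() Function
- Example 2: Identify Character Pattern in Data Frame Column Using str_detect() Function of stringr Package
- Video, Further Resources & Summary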
Let’s dive right in.
Creating Exemplifying Data
We’ll use the following data as a basis for this R programming tutorial:
data <- data.frame(x1 = 1:5,    # Create example data frame
                   x2 = c("foo", "foo", "bar", "foo", "bar"))
data                            # Print example data frame
Table 1 illustrates the output of the RStudio console and shows that our example data contains five observations and two columns. The variable x1 is an integer and the column x2 has the character class.
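The column classes described above can be checked directly. The following is a minimal sketch; the explicit stringsAsFactors = FALSE argument is an addition that is not part of the original code. It is included as an assumption because R versions before 4.0.0 would otherwise convert x2 to a factor.

```r
# Recreate the example data; stringsAsFactors = FALSE is an added assumption
# so that x2 stays a character column on R versions before 4.0.0
data <- data.frame(x1 = 1:5,
                   x2 = c("foo", "foo", "bar", "foo", "bar"),
                   stringsAsFactors = FALSE)

col_classes <- sapply(data, class)   # Class of each column
col_classes                          # x1 is "integer", x2 is "character"

dim(data)                            # Number of rows and columns
# [1] 5 2
```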
Example 1: Identify Character Pattern in Data Frame Column Using grepl() Function
Example 1 demonstrates how to search and find a character pattern in a data frame column using the grepl function provided by the basic installation of the R programming language.
We can use grepl to return a logical vector that identifies all rows that contain a match with a certain character pattern in a certain column:
grepl("bar", data$x2)    # Return logical vector using grepl
# [1] FALSE FALSE  TRUE FALSE  TRUE
As you can see, the third and fifth rows contain a match with the character string “bar” in the column x2.
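If the row positions are more convenient than a logical vector, they can be derived from the same call. The sketch below uses only Base R; the object names match_lgl and match_rows are chosen for illustration.

```r
data <- data.frame(x1 = 1:5,
                   x2 = c("foo", "foo", "bar", "foo", "bar"))

match_lgl <- grepl("bar", data$x2)   # Logical vector, one element per row

match_rows <- which(match_lgl)       # Positions of the TRUE elements
match_rows
# [1] 3 5

grep("bar", data$x2)                 # grep() returns the positions directly
# [1] 3 5

sum(match_lgl)                       # Number of matching rows
# [1] 2
```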
We may now use this logical vector to subset our data frame conditionally:
data_new1 <- data[grepl("bar", data$x2), ]    # Subset data frame
data_new1                                     # Print data frame subset
As shown in Table 2, we have created a new data frame that keeps only the rows where the variable x2 matches the character string “bar”. All other rows have been removed.
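Note that grepl treats its pattern as a regular expression and matches substrings, so a cell such as “barcode” would also count as a match. The following sketch illustrates this with a hypothetical data frame data_ext (not part of the example data above) and shows how anchors, the ignore.case argument, and negation change the result.

```r
# Hypothetical data with a value that merely contains "bar"
data_ext <- data.frame(x1 = 1:4,
                       x2 = c("foo", "bar", "barcode", "Bar"))

grepl("bar", data_ext$x2)                      # Substring match
# [1] FALSE  TRUE  TRUE FALSE

grepl("^bar$", data_ext$x2)                    # Anchored: whole cell must be "bar"
# [1] FALSE  TRUE FALSE FALSE

grepl("bar", data_ext$x2, ignore.case = TRUE)  # Case-insensitive match
# [1] FALSE  TRUE  TRUE  TRUE

data_ext[!grepl("bar", data_ext$x2), ]         # Keep the rows that do NOT match
```

Which variant is appropriate depends on whether partial matches should count. For an exact comparison without regular expressions, data_ext$x2 == "bar" is an alternative.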
Example 2: Identify Character Pattern in Data Frame Column Using str_detect() Function of stringr Package
In Example 1, we have used Base R to find a character pattern in a data frame column. Example 2, in contrast, shows how to use the stringr package for this task.
First, we have to install and load the stringr package:
install.packages("stringr")    # Install stringr package
library("stringr")             # Load stringr
Now, we can apply the str_detect function to return a logical vector that identifies all matching column cells:
str_detect(data$x2, "bar")    # Return logical vector using str_detect
# [1] FALSE FALSE  TRUE FALSE  TRUE
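As a side note, str_detect takes the character vector as its first argument and the pattern as its second, which is the reverse of grepl. The sketch below uses a hypothetical vector x to show a few pattern modifiers of the stringr package. It assumes stringr 1.4.0 or later, since the negate argument is not available in older versions.

```r
library(stringr)

# Hypothetical vector including mixed case and a regex metacharacter
x <- c("foo", "bar", "Bar", "b.r")

str_detect(x, "bar")                             # String first, pattern second
# [1] FALSE  TRUE FALSE FALSE

str_detect(x, regex("bar", ignore_case = TRUE))  # Case-insensitive regex
# [1] FALSE  TRUE  TRUE FALSE

str_detect(x, "b.r")                             # "." matches any character
# [1] FALSE  TRUE FALSE  TRUE

str_detect(x, fixed("b.r"))                      # Literal match, "." is a plain dot
# [1] FALSE FALSE FALSE  TRUE

str_detect(x, "bar", negate = TRUE)              # Invert the result
# [1]  TRUE FALSE  TRUE  TRUE
```

The logical vector returned by the basic call on data$x2 is the one used in the next step.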
Similar to Example 1, we can use this logical vector to extract specific rows from our data:
data_new2 <- data[str_detect(data$x2, "bar"), ]    # Subset data frame
data_new2                                          # Print data frame subset
Table 3 shows the output of the previous R code: The same data frame as in Example 1. This time, we have used the str_detect function of the stringr package instead of the Base R grepl function to test for character matches.
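The two functions give the same result for the example data, but they differ when the column contains missing values: grepl returns FALSE for NA, whereas str_detect returns NA. The following sketch uses a hypothetical data frame data_na to illustrate the difference and one possible way to handle it when subsetting.

```r
library(stringr)

# Hypothetical data with a missing value in x2
data_na <- data.frame(x1 = 1:4,
                      x2 = c("foo", "bar", NA, "bar"))

grepl_res <- grepl("bar", data_na$x2)        # NA is treated as "no match"
grepl_res
# [1] FALSE  TRUE FALSE  TRUE

detect_res <- str_detect(data_na$x2, "bar")  # NA is propagated
detect_res
# [1] FALSE  TRUE    NA  TRUE

# Wrapping the vector in which() drops the NA before subsetting
data_na[which(detect_res), ]
```

Subsetting with detect_res directly would add a row consisting of NA values, so wrapping the logical vector in which() is a common precaution.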
Video, Further Resources & Summary
I have recently published a video on my YouTube channel, which illustrates the R code of this article.
Furthermore, you may have a look at some other tutorials on my website. I have released several tutorials already:
- Convert Factor to Character Class in R
- Convert Data Frame Column to Numeric
- Find Missing Values (6 Examples for Data Frame, Column & Vector)
- Select Data Frame Column Using Character Vector
- R Programming Tutorials
In this tutorial, you have learned how to check and find all rows that match a certain character pattern in a data frame column in the R programming language. If you have any further questions, kindly let me know in the comments.