Select Rows with Partial String Match in R (2 Examples)

 

In this article, you’ll learn how to filter the rows of a data frame where a specific column partially matches a character string in the R programming language.

Let’s do this:

 

Creation of Exemplifying Data

First, we’ll have to load some data that we can use in the examples later on. In this tutorial, we are using the iris data set:

data(iris)                                         # Example data
head(iris)                                         # Head of example data
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

The previous output of the RStudio console shows that our example data has five columns, whereby the variable Species contains the species names (stored as a factor in the iris data set). In the examples of this tutorial, we assume that we want to select rows where the variable Species partially matches the character string “virg”. Let’s do this…
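Before filtering, it can help to check how the column is stored and which values would match. The following is a minimal base R sketch (no additional packages assumed); the object names matches and n_match are chosen for illustration only. Note that Species is stored as a factor in iris, which the functions used in the examples below handle without explicit conversion:

```r
data(iris)                                         # Load example data

class(iris$Species)                                # Column type (a factor in iris)
levels(iris$Species)                               # Distinct species names

matches <- grepl("virg", levels(iris$Species))     # TRUE for each level containing "virg"
levels(iris$Species)[matches]                      # Levels that would be selected

n_match <- sum(grepl("virg", iris$Species))        # Number of rows with a partial match
n_match
```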

 

Example 1: Detect Rows with Partial Match Using stringr Package

This example explains how to extract rows with a partial match using the stringr package. We first need to install and load the stringr package:

install.packages("stringr")                        # Install stringr package
library("stringr")                                 # Load stringr

Now we can subset our data with the str_detect function as shown below:

data1 <- iris[str_detect(iris$Species, "virg"), ]  # Extract matching rows with str_detect
head(data1)
#     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
# 101          6.3         3.3          6.0         2.5 virginica
# 102          5.8         2.7          5.1         1.9 virginica
# 103          7.1         3.0          5.9         2.1 virginica
# 104          6.3         2.9          5.6         1.8 virginica
# 105          6.5         3.0          5.8         2.2 virginica
# 106          7.6         3.0          6.6         2.1 virginica

As you can see, we have extracted only rows where the Species column partially matches the character string “virg”.
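By default, the pattern passed to str_detect is interpreted as a regular expression. As a hedged sketch of some variations, the code below uses the stringr helpers fixed() and regex() as well as the negate argument of str_detect (this assumes stringr 1.4.0 or later). The object names data1_fixed, data1_nocase, and data1_not are chosen for illustration only:

```r
library("stringr")                                 # Load stringr

# Literal match: the pattern is not treated as a regular expression
data1_fixed <- iris[str_detect(iris$Species, fixed("virg")), ]

# Case-insensitive match
data1_nocase <- iris[str_detect(iris$Species, regex("VIRG", ignore_case = TRUE)), ]

# Rows WITHOUT a partial match
data1_not <- iris[str_detect(iris$Species, "virg", negate = TRUE), ]

nrow(data1_fixed)                                  # Number of literal matches
nrow(data1_nocase)                                 # Number of case-insensitive matches
unique(as.character(data1_not$Species))            # Species remaining after exclusion
```

The negated version is useful whenever rows containing a certain substring should be excluded instead of selected.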

 

Example 2: Detect Rows with Partial Match Using data.table Package

In Example 2, I’ll show how to detect rows with a partial match using the data.table package. Again, we need to install and load the package first:

install.packages("data.table")                     # Install data.table package
library("data.table")                              # Load data.table

Now, we can use the %like% operator as follows:

data2 <- iris[iris$Species %like% "virg", ]        # Extract matching rows with %like%
head(data2)
#     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
# 101          6.3         3.3          6.0         2.5 virginica
# 102          5.8         2.7          5.1         1.9 virginica
# 103          7.1         3.0          5.9         2.1 virginica
# 104          6.3         2.9          5.6         1.8 virginica
# 105          6.5         3.0          5.8         2.2 virginica
# 106          7.6         3.0          6.6         2.1 virginica

Exactly the same result as in Example 1, but this time with completely different R code.
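In the code above, %like% is applied to a regular data frame. As a sketch of a more typical data.table workflow, assuming we first convert iris with as.data.table (the object names iris_dt, data2_dt, and data2_not are illustrative), the condition can be written directly inside the square brackets:

```r
library("data.table")                              # Load data.table

iris_dt <- as.data.table(iris)                     # Convert data frame to data.table

data2_dt <- iris_dt[Species %like% "virg"]         # Columns can be referenced by name inside [ ]
data2_not <- iris_dt[!(Species %like% "virg")]     # Negate the condition to exclude matches

nrow(data2_dt)                                     # Number of matching rows
nrow(data2_not)                                    # Number of non-matching rows
```

Note that a data.table does not keep the original row names, so the selected rows are numbered starting at 1 instead of 101.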

 

To summarize: This tutorial showed how to extract data frame rows based on a partial match of a character string in R. Let me know in the comments if you have any additional questions.

 


2 Comments

  • This is very helpful! Is it possible to select rows if they DON’T have a partial match to a string? For example, I have a list of schools, and I’d like to exclude all schools that have “Elementary” in the name.

    • Hey Danielle,

      Thank you for the kind comment, glad you like the tutorial!

      Please excuse the late response. I was on a long holiday so unfortunately I wasn’t able to reply sooner. Still need help with your code?

      Regards,
      Joachim
