Find & Count Exact Matches in Character String Vector in R (3 Examples)

 

This tutorial shows how to identify the index position and how to count exact matches in character strings in the R programming language.


Let’s do this!

 

Exemplifying Data

Consider the exemplifying data below:

vec <- c("xxx_a", "xxx_b", "xxx", "xxx_c", "xxx")  # Create example vector
vec                                                # Print example vector
# [1] "xxx_a" "xxx_b" "xxx"   "xxx_c" "xxx"

Have a look at the previous output of the RStudio console. It shows that our example data is a vector of character strings.

Note that each element of our vector contains the character pattern “xxx”. However, only some of the elements match this pattern exactly.

 

Example 1: Find Index Positions of Exact Matches in Character Vector

The following syntax shows how to identify the index positions of all exact matches using the which function and the == operator.

Have a look at the following R code:

which(vec == "xxx")                                # Apply which() function
# [1] 3 5

The RStudio console has returned the result we are looking for: The character pattern “xxx” has an exact match at the third and fifth index position of our example vector.
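To see why this works, it can help to look at the intermediate step. The following is a minimal sketch that reuses our example vector; the object names is_match and idx are my own and are only used for illustration. The == operator compares each element with “xxx” and returns a logical vector, and the which function returns the positions of the TRUE values:

```r
vec <- c("xxx_a", "xxx_b", "xxx", "xxx_c", "xxx")  # Recreate example vector
is_match <- vec == "xxx"                           # Element-wise comparison
is_match                                           # Print logical vector
# [1] FALSE FALSE  TRUE FALSE  TRUE
idx <- which(is_match)                             # Positions of TRUE values
idx                                                # Print index positions
# [1] 3 5
vec[idx]                                           # Extract matching elements
# [1] "xxx" "xxx"
```

The index positions can be used directly for subsetting, as shown in the last line.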

 

Example 2: Count Number of Exact Matches in Character Vector

In this section, I’ll illustrate how to apply the length and which functions to count the number of exact matches of a character pattern in a character vector.

For this, we simply have to wrap the length function around the R code that we have used in Example 1:

length(which(vec == "xxx"))                        # Apply which() & length() functions
# [1] 2

The RStudio console has returned the value 2, i.e. there exist two exact matches of the character pattern “xxx” in our vector.
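As an alternative sketch, the same count can be obtained with the sum function, because TRUE is counted as 1 and FALSE as 0. The vector vec_na below is a hypothetical extension of our example data with a missing value. I have added it only to show that sum needs na.rm = TRUE in that case, whereas the which function drops NA values by itself:

```r
vec <- c("xxx_a", "xxx_b", "xxx", "xxx_c", "xxx")  # Recreate example vector
n_sum <- sum(vec == "xxx")                         # TRUE is counted as 1
n_sum
# [1] 2
vec_na <- c(vec, NA)                               # Hypothetical vector with NA
sum(vec_na == "xxx")                               # NA propagates into the sum
# [1] NA
n_na_rm <- sum(vec_na == "xxx", na.rm = TRUE)      # Ignore NA when counting
n_na_rm
# [1] 2
n_which <- length(which(vec_na == "xxx"))          # which() skips NA values
n_which
# [1] 2
```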

 

Example 3: Exact Match Pattern Using grepl, gsub & gregexpr

The R programming language provides many functions for the handling of character strings and regular expressions. Examples include the grepl, gsub, and gregexpr functions.

In this example, I’ll explain how to handle exact matches in these types of functions.

For illustration, I’m using the grepl function in the following R code. However, we could apply the same logic when using similar functions such as sub or gsub.

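We can use the grepl function to return a logical vector indicating whether a vector element matches our character pattern as a whole word. For this, we have to write “\\b” (a word boundary) before and after the pattern we want to search for. Note that underscores count as word characters, which is why an element such as “xxx_a” is not matched. An element such as “xxx yyy” would still be matched, so the pattern should be anchored as “^xxx$” if the entire element has to be equal to the pattern: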

grepl("\\bxxx\\b", vec)                            # Apply grepl() function
# [1] FALSE FALSE  TRUE FALSE  TRUE

Have a look at the previous output of the RStudio console. It shows that the third and fifth positions of our vector have an exact match with the character pattern “xxx” (as we have already shown in Example 1).
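The difference between a word boundary and a match of the entire element does not show up in our example data, so here is a small sketch with a hypothetical vector vec2 (not part of the original example). It compares the “\\b” pattern with a pattern that is anchored by “^” and “$”:

```r
vec2 <- c("xxx", "xxx yyy", "xxx-a", "xxx_a")      # Hypothetical example vector
res_word <- grepl("\\bxxx\\b", vec2)               # Match "xxx" as whole word
res_word
# [1]  TRUE  TRUE  TRUE FALSE
res_full <- grepl("^xxx$", vec2)                   # Match entire element only
res_full
# [1]  TRUE FALSE FALSE FALSE
```

The word boundary version also accepts “xxx yyy” and “xxx-a”, because a space and a hyphen are not word characters. Only the anchored version returns the same result as vec2 == "xxx".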

We can also use the grepl function in combination with the sum function to count the number of exact occurrences of our character pattern:

sum(grepl("\\bxxx\\b", vec))                       # Apply grepl() & sum() functions
# [1] 2

Our character pattern appears twice in our example vector.
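The same pattern can be passed to related functions. The following sketch assumes that we want to return the index positions with grep and to replace the exact matches with gsub. The replacement string “yyy” and the object names pos and vec_new are only placeholders that I have chosen for this illustration:

```r
vec <- c("xxx_a", "xxx_b", "xxx", "xxx_c", "xxx")  # Recreate example vector
pos <- grep("\\bxxx\\b", vec)                      # Index positions of matches
pos
# [1] 3 5
vec_new <- gsub("\\bxxx\\b", "yyy", vec)           # Replace exact matches only
vec_new
# [1] "xxx_a" "xxx_b" "yyy"   "xxx_c" "yyy"
```

The elements “xxx_a”, “xxx_b”, and “xxx_c” are left unchanged, because the underscore prevents a word boundary after “xxx”.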

 

At this point you should have learned how to find and count exact matches of specific character patterns in the R programming language. In case you have additional questions, let me know in the comments.

 
