Find & Count Exact Matches in Character String Vector in R (3 Examples)
This tutorial shows how to identify the index position and how to count exact matches in character strings in the R programming language.
Let’s do this!
Consider the exemplifying data below:
vec <- c("xxx_a", "xxx_b", "xxx", "xxx_c", "xxx")    # Create example vector
vec                                                   # Print example vector
# [1] "xxx_a" "xxx_b" "xxx"   "xxx_c" "xxx"
Have a look at the previous output of the RStudio console. It shows that our example data is a vector of character strings.
Note that each element of our vector contains the character pattern “xxx”. However, only some of the elements are exactly equal to this pattern.
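To make this difference concrete, here is a minimal sketch that contrasts a partial match with an exact match on the same data. The object names partial and exact are only used for illustration and are not part of the original example.

```r
vec <- c("xxx_a", "xxx_b", "xxx", "xxx_c", "xxx")   # Same example vector as above

partial <- grepl("xxx", vec, fixed = TRUE)          # TRUE wherever "xxx" occurs anywhere in an element
exact   <- vec == "xxx"                             # TRUE only where the whole element equals "xxx"

partial                                             # Every element contains "xxx"
exact                                               # Only the third and fifth elements are exact matches
```

The partial check is TRUE for all five elements, whereas the exact comparison is TRUE only for the third and fifth elements.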
Example 1: Find Index Positions of Exact Matches in Character Vector
The following syntax explains how to identify the index positions of all exact matches using the which function and the == operator.
Have a look at the following R code:
which(vec == "xxx")                                   # Apply which() function
# [1] 3 5
The RStudio console has returned the result we are looking for: The character pattern “xxx” has an exact match at the third and fifth index positions of our example vector.
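If you want to find exact matches for several strings at once, the %in% operator can be used in the same way. The following is a small sketch under the assumption that we are also interested in “xxx_b”; the object names targets and idx are hypothetical.

```r
vec <- c("xxx_a", "xxx_b", "xxx", "xxx_c", "xxx")   # Same example vector as above
targets <- c("xxx", "xxx_b")                        # Hypothetical set of strings to match exactly

idx <- which(vec %in% targets)                      # Positions whose element equals any of the targets
idx                                                 # Print index positions
# [1] 2 3 5
```

Like the == operator, %in% compares entire elements, so “xxx_a” and “xxx_c” are not returned.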
Example 2: Count Number of Exact Matches in Character Vector
In this section, I’ll illustrate how to apply the length and which functions to count the number of exact matches of a character pattern in a character vector.
For this, we simply have to wrap the length function around the R code that we have used in Example 1:
length(which(vec == "xxx"))                           # Apply which() & length() functions
# [1] 2
The RStudio console has returned the value 2, i.e. there exist two exact matches of the character pattern “xxx” in our vector.
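As an alternative, we could sum the logical vector directly. The sketch below compares both approaches on a hypothetical variant of our data that contains a missing value (vec_na is my own addition for illustration). It shows one reason why length(which()) can be convenient: the which function drops NA values, whereas sum needs na.rm = TRUE.

```r
vec_na <- c("xxx_a", "xxx_b", "xxx", NA, "xxx")      # Hypothetical vector with a missing value

n_sum      <- sum(vec_na == "xxx")                   # NA, because the comparison with NA is NA
n_sum_narm <- sum(vec_na == "xxx", na.rm = TRUE)     # 2, missing value is ignored
n_which    <- length(which(vec_na == "xxx"))         # 2, which() drops NA automatically
```

Without missing values, sum(vec == "xxx") and length(which(vec == "xxx")) return the same count.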
Example 3: Exact Match Pattern Using grepl, gsub & gregexpr
In this example, I’ll explain how to handle exact matches in pattern-matching functions that are based on regular expressions.
For illustration, I’m using the grepl function in the following R code. However, we could apply the same logic when using similar functions such as sub or gsub.
We can use the grepl function to return a logical vector indicating whether a vector element contains our character pattern as a whole word. For this, we have to write “\\b” (the regular expression for a word boundary) before and after the pattern we want to search for. Since the underscore counts as a word character, elements such as “xxx_a” are not matched:
grepl("\\bxxx\\b", vec)                               # Apply grepl() function
# [1] FALSE FALSE  TRUE FALSE  TRUE
Have a look at the previous output of the RStudio console. It shows that the third and fifth positions of our vector have an exact match with the character pattern “xxx” (as we have already shown in Example 1).
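Please note that “\\b” marks a word boundary, not the start or end of the string. The following sketch uses a hypothetical vector (vec2 is not part of the original example) to show that elements with other separators, such as a hyphen or a space, are still matched by “\\b”. If the entire element has to be equal to the pattern, the anchors “^” and “$” are one possible alternative.

```r
vec2 <- c("xxx", "xxx-a", "xxx a", "xxx_a")          # Hypothetical vector with different separators

word_match <- grepl("\\bxxx\\b", vec2)               # Whole-word match: hyphen and space are boundaries
full_match <- grepl("^xxx$", vec2)                   # Full-element match: only "xxx" itself

word_match                                           # TRUE for "xxx", "xxx-a" and "xxx a"
full_match                                           # TRUE for "xxx" only
```

For our example vector both patterns lead to the same result, because the underscore is treated as part of the word.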
We can also use the grepl function in combination with the sum function to count the number of exact occurrences of our character pattern:
sum(grepl("\\bxxx\\b", vec))                          # Apply grepl() & sum() functions
# [1] 2
Our character pattern appears twice in our example vector.
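The same logic can be applied to gsub, as mentioned at the beginning of this example. Here is a minimal sketch that replaces only the whole-word occurrences of our pattern; the replacement string "yyy" and the object name vec_new are arbitrary choices for illustration.

```r
vec <- c("xxx_a", "xxx_b", "xxx", "xxx_c", "xxx")   # Same example vector as above

vec_new <- gsub("\\bxxx\\b", "yyy", vec)            # Replace whole-word matches only
vec_new                                             # Print updated vector
# [1] "xxx_a" "xxx_b" "yyy"   "xxx_c" "yyy"
```

Only the third and fifth elements have been changed. Without the “\\b” markers, gsub would have replaced “xxx” in every element.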
Video & Further Resources
Do you need more information on the R codes of this tutorial? Then I recommend having a look at the following video of my YouTube channel. In the video, I’m explaining the examples of this article:
Furthermore, you might want to have a look at some of the other articles of this website:
- Concatenate Vector of Character Strings in R
- Count Number of Occurrences of Certain Character in String
- eval Function in R
- Count Number of Words in Character String
- R Programming Overview
At this point you should have learned how to find and count exact matches of specific character patterns in the R programming language. In case you have additional questions, let me know in the comments.