match Function in R (4 Example Codes)

 

This tutorial shows how to search for matches between two data objects in the R programming language.

The article is mainly based on the match() R function. The basic R syntax and the definition of match are as follows:

 

Basic R Syntax of match:

match(value, table)

 

Definition of match:

The match R function returns, for each element of its first argument, the position of the first matching element in its second argument (and NA if there is no match).

 

In the following R tutorial, I’ll explain in four examples how to use the match function in R.

Let’s move on to the examples!

 

Example 1: Basic Application of the match Function in R

Before we can start with the examples, we need to create some example data, in which we want to search for matches. Consider the following numeric data vector:

tab <- c(2, 5, 7, 5, 1)          # Create example vector

Let’s assume that we want to search for a match of the value 5 within our example vector. Then we can use the match R function as follows:

match(5, tab)                    # Apply match function in R
# 2

The match function returns the value 2, i.e. the value 5 was found at the second position of our example vector.

Note: The match command returned only the first match, even though the value 5 also matches the fourth element of our example vector.
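If all positions of a value are needed rather than only the first one, a logical comparison combined with which() is a common alternative. The following is a minimal sketch that reuses the example vector tab; the object name all_pos is only chosen for illustration:

```r
tab <- c(2, 5, 7, 5, 1)          # Example vector from above

all_pos <- which(tab == 5)       # Positions of all elements equal to 5
all_pos
# 2 4
```

In contrast to match, this approach returns every matching position, so the result can have any length (including zero).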

 

Example 2: Match Two Vectors

In Example 1, we searched only for matches of one input value (i.e. 5). However, we can also search for matches of an entire input vector. Consider the following input vector:

vec <- c(4, 5, 1, 3, 7)          # Create input vector

Now, we can identify matches for this whole vector as follows:

match(vec, tab)                  # Apply match with input vector
# NA  2  5 NA  3

As you can see, the match function returned a vector with two NA values and three positions. The match function returns NA when no match is found.

There was no match for the first element of our input vector (i.e. 4); the second element (i.e. 5) was found at position 2; the third element (i.e. 1) was found at position 5; the fourth element (i.e. 3) had no match; and the fifth element (i.e. 7) was found at position 3.
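In case the NA values are inconvenient, the nomatch argument of match can be used to return a different value, and the %in% operator returns a logical vector instead of positions. The following sketch reuses the two example vectors; the object names pos_zero and is_in are only chosen for illustration:

```r
tab <- c(2, 5, 7, 5, 1)          # Example vector from Example 1
vec <- c(4, 5, 1, 3, 7)          # Input vector from Example 2

pos_zero <- match(vec, tab, nomatch = 0)  # Return 0 instead of NA
pos_zero
# 0 2 5 0 3

is_in <- vec %in% tab            # TRUE for each element with a match
is_in
# FALSE  TRUE  TRUE FALSE  TRUE
```

Which variant is preferable depends on the use case: positions are needed for indexing, whereas the logical vector is often sufficient for filtering.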

 

Example 3: Apply match Function to String

So far, we applied the match function only to numeric values. However, we can also use match for character strings. Consider the following example character vector:

tab2 <- c("ab", "a", "aa", "c")  # Create character vector

We can now apply the match function to our example vector:

match("a", tab2)                 # Apply match to character string
# 2

The first match of “a” is at the second position.

Note: Even though the first element of our character vector contains the letter “a”, it is not considered a match. Only exact matches are taken into account.
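If the goal is to find elements that merely contain a certain string, match is not the right tool. One possible alternative is the grepl function, sketched below with the example vector tab2; the object name has_a is only chosen for illustration, and fixed = TRUE is set so that the pattern is treated as a literal string instead of a regular expression:

```r
tab2 <- c("ab", "a", "aa", "c")  # Character vector from Example 3

has_a <- grepl("a", tab2, fixed = TRUE)  # Does each element contain "a"?
has_a
# TRUE  TRUE  TRUE FALSE

which(has_a)                     # Positions of these elements
# 1 2 3
```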

 

Example 4: Similar Functions to match

The R programming language provides several functions similar to match(). Two of the most common alternatives are pmatch and charmatch.

Depending on your specific situation, you might prefer one of these functions over match.

One main difference between pmatch and charmatch is that pmatch, by default, uses each element of the table only once. Let’s assume that we have an input vector with two sevens:

pmatch(c(7, 7), tab)             # Apply pmatch function in R
# 3 NA

As you can see, pmatch returns a match at the third position for the first seven, but NA for the second seven (i.e. no match). This is different for charmatch. The charmatch function can return the same position for several input elements:

charmatch(c(7, 7), tab)          # Apply charmatch function in R
# 3 3
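The use-each-match-only-once behavior of pmatch is a default setting and can be changed with the duplicates.ok argument. The following sketch reuses the example vector tab; the object name pm_dup is only chosen for illustration:

```r
tab <- c(2, 5, 7, 5, 1)          # Example vector from Example 1

pm_dup <- pmatch(c(7, 7), tab, duplicates.ok = TRUE)  # Allow reuse of matches
pm_dup
# 3 3
```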

This difference is also documented in the R help documentation of pmatch:

 


Figure 1: Excerpt of the R Help Documentation of pmatch.

 

However, this was only one difference between the two functions. If you want to know more details, you might have a look at the entire R help documentation of the two functions (simply type ?pmatch or ?charmatch into your RStudio console).
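Another difference concerns partial matching, which match does not support at all. The sketch below compares the three functions; the vector tab3 is a hypothetical example that is not part of the data used above:

```r
tab3 <- c("mean", "median", "mode")  # Hypothetical example vector

match("med", tab3)               # Exact matching only
# NA
pmatch("med", tab3)              # Unique partial match
# 2
charmatch("med", tab3)           # Unique partial match
# 2

pmatch("m", tab3)                # Ambiguous partial match
# NA
charmatch("m", tab3)             # Ambiguous partial match
# 0
```

Note that charmatch distinguishes between no match (NA) and an ambiguous partial match (0), whereas pmatch returns NA in both cases.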

 

Further Resources

Would you like to see some more examples of the match function in R? No problem! In this case, you could have a look at the following YouTube video by Sarveshwar Inani. In the video, he shows some live examples of the match function in R:

 


 

This tutorial illustrated how to use match, pmatch, and charmatch in R. If you have comments or questions, let me know in the comments section below.

 



2 Comments

  • Hello,

    I watched the above and a previous video where you did some extractions for homicides by shooting in Baltimore. I’ve experimented with trying to come up with a solution using a table of regular expression wildcards, to include ‘or(|)’, ‘and(%)’ & ‘not(!)’ so that when there’s a match, the second column has a standardized term to be used. Ideally, when I run this, a new column will be joined to my source data frame with the correct term.

    The below is what I posted to Stack overflow in hopes of finding a solution.
    TIA
    Andy
    ++++++++++++++++++++++++++++++++++++++++++++

    I need to do a pattern match based on a 2 column list. This works fine when I’m using exact match, for matching various locations near a distribution hub.

    lfp <- "D:/Libraries/Documents/Survey/Location Cleanse List.csv"
    ltbl <- readtext(lfp) # 2 vectors (Location_Pattern,Cleansed_Hub)

    cfp <- 'url to source database'
    # Customer_Location is a concatenation of 'Customer_City' and 'Customer_State' in a prior procedure
    ctbl <- 'database extract' # 3 columns (doc_id, Customer_ID, Customer_Location)
    # This adds a new column with the hub location to ctbl, which will later be joined to another matrix for additional analysis based on regional hubs.
    ctbl$Customer_Hub <- ltbl$Cleansed_Hub[match(ctbl$Customer_Location, ltbl$Location_Pattern)]

    However, I'm finding a lot of "dirty" locations, so rather than exact match, I need to add regular expression wildcards to the Customer_Location to compensate for typos and abbreviations entered by customers when they entered "Customer_City" but the "Customer_State" will always be good due to dropdown selection by customers.

    I've done additional processing on ctbl to change all Customer_Location values to lowercase and even re-classed the vector as Expression, but nothing seems to work…

    ltbl$Location_Pattern <- tolower(ltbl$Location_Pattern)
    ctbl$Customer_Location <- tolower(ctbl$Customer_Location)
    class(ltbl$Location_Pattern) <- "Expression"

    This is a sample of ltbl:

    Location_Pattern, Cleansed_Hub
    "^dal*tx|dfw*tx", "DFW, TX"
    "^f*w*tx", "DFW, TX"
    "^hurst*tx|^eul*s*tx|^be*ord*tx|^h?e?b*tx", "DFW, TX"
    "^kil*en*tx", "Temple, TX"
    "^nol*lle*tx", "Temple, TX"
    "^west*lia*tx|^W*phalia*TX", "Temple, TX"

    This would match locations like

    "Dallas, TX" or "Dalas, TX" or "DFW, TX"

    "Fort Worth, TX", "Ft. Worth, TX", "Ft Worth, TX"

    "Hurst, TX" or "Eules, TX" or "Euless, TX" or "Bedford, TX" or "H-E-B, TX" or "HEB, TX"

    "Kilen, TX" or "Kileen, TX" or "Killen, TX" or "Killeen, TX"

    "Nolanville, TX" or "Nolanvile, TX" or "Nollanvile, TX" or "Nolan ville, TX"

    "Westfalia, TX" or "Westphalia, TX" or "W-phalia, TX"

    • Hey Andy,

      I’m sorry for the late response, I just came back from holidays and did not have the chance to read your message earlier.

      Are you still looking for a solution to this problem?

      Regards

      Joachim
