Extract Substring Before or After Pattern in R (2 Examples)

 

In this article, you’ll learn how to return the characters of a string before or after a certain pattern in the R programming language.


Let’s dive right in:

 

Creation of Example Data

Let’s first create a character string in R that we can use in the examples later on:

x <- "hello xxx other stuff"         # Example character string
x                                    # Print example string
# "hello xxx other stuff"

Our example string consists of the word “hello” and the words “other stuff”, with the pattern “xxx” in between. A single space separates the pattern from the text on each side.

 

Example 1: Extract Characters Before Pattern in R

Let’s assume that we want to extract all characters of our character string before the pattern “xxx”. Then, we can use the sub function as follows:

sub(" xxx.*", "", x)                 # Extract characters before pattern
# "hello"

As you can see based on the output of the RStudio console, the previous R code returned only the substring “hello”, i.e. the characters before the pattern “xxx”.

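Note that we had to append the symbols “.*” to the pattern “xxx” within the sub function to get this result. In a regular expression, “.” matches any single character and “*” repeats it zero or more times, so “ xxx.*” matches the pattern together with everything that follows it. The sub function replaces this match with an empty string, which leaves only “hello”. Because the space in front of “xxx” is part of the pattern, the result has no trailing space.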
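If you need this more than once, the code of Example 1 can be wrapped in a small function. The sketch below uses a helper called before_pattern(), which is a hypothetical name chosen for illustration and not a base R function. It also shows two edge cases. When the pattern occurs several times, you get the text before the first occurrence, because the regex engine matches at the leftmost possible position. When the pattern is missing, sub returns the string unchanged.

```r
# Hypothetical helper (not part of base R): keep everything before the
# first match of the regular expression `pattern`, using sub() as in Example 1
before_pattern <- function(x, pattern) {
  sub(paste0(pattern, ".*"), "", x)
}

x <- "hello xxx other stuff"
before_pattern(x, " xxx")
# "hello"

# Several occurrences: the leftmost match wins, so we get the text
# before the FIRST occurrence of the pattern
before_pattern("a xxx b xxx c", " xxx")
# "a"

# No occurrence: sub() leaves the string as it is
before_pattern("no match here", " xxx")
# "no match here"
```

Since sub is vectorized, the same helper also works on a whole character vector, e.g. a column of a data frame.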

 

Example 2: Extract Characters After Pattern in R

In this example, I’ll show you how to return the characters after a particular pattern. As in Example 1, we use the sub function and the symbols “.*”. However, this time we have to put these symbols in front of our pattern “xxx”:

sub(".*xxx ", "", x)                 # Extract characters after pattern
# "other stuff"

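This time, the sub function returns the words to the right of our pattern, i.e. “other stuff”. The regular expression “.*xxx ” matches everything from the start of the string up to and including “xxx ” (with its trailing space), and sub replaces this part with an empty string.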
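One detail worth knowing: “.*” is greedy, so it matches as many characters as possible. If the pattern appears more than once, the code of Example 2 therefore returns the text after the last occurrence. The sketch below uses a made-up string y to show this and, as an alternative, a lazy “.*?” with perl = TRUE (PCRE syntax), which stops at the first occurrence instead.

```r
y <- "a xxx b xxx c"   # example string with two occurrences of "xxx"

# Greedy ".*" (as in Example 2): everything up to the LAST "xxx " is removed
sub(".*xxx ", "", y)
# "c"

# Lazy ".*?" with Perl-compatible regex: only the text up to the FIRST
# "xxx " is removed
sub(".*?xxx ", "", y, perl = TRUE)
# "b xxx c"
```

Which variant is right depends on whether you want the text after the first or after the last occurrence of the pattern.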

 

Video, Further Resources & Summary

If you need further explanations of the R code in this post, I recommend watching the accompanying video on my YouTube channel, in which I illustrate the R code of this article.

 


 


 

Summary: This article illustrated how to extract the substring before or after a specified pattern in the R programming language. If you have any further comments and/or questions, don’t hesitate to let me know in the comments below.

 



4 Comments

  • Hello, I found this information quite interesting. I was wondering if there is a way to apply this in a data frame context. I have a data frame, and I need to get the values from various columns that come after a column with a specific value.

    • Hey Carolina,

      Thank you for the kind words!

      To clarify your question: You want to check for a certain value in each column of your data frame and then you want to extract all columns after the column containing this value?

      Regards,

      Joachim

  • Hi,

    This was super helpful. Is there any way to extract just the word directly in front of a string? For example, if you had “hello i am”, could you extract just the ‘i’ in front of ‘am’?

    • Hey Austin,

      Thanks a lot for the nice feedback!

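      You can do that by using the following R code:

      sub(".* ([^ ]+) am.*", "\\1", "hello i am")
      # "i"

      Explanation: Please compare that code with Example 1. In Example 1, we were looking for the pattern “ xxx”. At this position, you can specify any pattern you want, so in this case we are using the pattern “ am”. The part “([^ ]+)” captures the word (a run of non-space characters) directly in front of “ am”, and “\\1” replaces the whole match with only that captured word.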

      I hope that helps!

      Joachim

