Extract Substring Before or After Pattern in R (2 Examples)


In this article, you’ll learn how to return the characters of a string that appear before or after a certain pattern in the R programming language.

Creation of Example Data

Let’s first create a character string in R that we can use in the examples later on:

x <- "hello xxx other stuff"         # Example character string
x                                    # Print example string
# "hello xxx other stuff"

Our example string consists of the words “hello” and “other stuff”, with the pattern “xxx” in between.


Example 1: Extract Characters Before Pattern in R

Let’s assume that we want to extract all characters of our character string before the pattern “xxx”. Then, we can use the sub function as follows:

sub(" xxx.*", "", x)                 # Extract characters before pattern
# "hello"

As you can see based on the output of the RStudio console, the previous R code returned only the substring “hello”, i.e. the characters before the pattern “xxx”.

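Note that we had to specify the symbols “.*” after the pattern “xxx” within the sub function in order to get this result. In a regular expression, “.” matches any single character and “*” repeats it zero or more times, so “ xxx.*” matches the pattern (including the space in front of it) together with everything that follows. The sub function then replaces this match with an empty string.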
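
It can also help to see how this call behaves on other inputs. The following is a minimal sketch, assuming a second string y that does not contain the pattern (y and the vector name before are made up for illustration and are not part of the original example):

```r
x <- "hello xxx other stuff"         # Example string from above
y <- "no marker here"                # Hypothetical string without the pattern

sub(" xxx.*", "", y)                 # No match, so the input is returned unchanged
# "no marker here"

before <- sub(" xxx.*", "", c(x, y)) # sub is vectorized over its input
before
# "hello"          "no marker here"
```

In other words, sub does not return a missing value when the pattern is absent, so it may be worth checking for the pattern first (e.g. with grepl) if that case matters in your data.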


Example 2: Extract Characters After Pattern in R

In this example, I’ll show you how to return the characters after a particular pattern. As in Example 1, we have to use the sub function and the symbols “.*”. However, this time we have to put these symbols in front of our pattern “xxx”, and the space moves to the end of the pattern so that it is removed as well:

sub(".*xxx ", "", x)                 # Extract characters after pattern
# "other stuff"

This time, the sub function returns the words to the right of our pattern, i.e. “other stuff”.
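
One detail to keep in mind is that “.*” is greedy, i.e. it matches as many characters as possible. The sketch below assumes a hypothetical string z in which the pattern occurs twice (z, after_last, and after_first are illustrative names only); it shows the default behavior next to a lazy variant that uses “.*?” together with perl = TRUE:

```r
z <- "a xxx b xxx c"                 # Hypothetical string with two occurrences

after_last <- sub(".*xxx ", "", z)   # Greedy: removes up to the last occurrence
after_last
# "c"

after_first <- sub(".*?xxx ", "", z, perl = TRUE)  # Lazy: stops at the first occurrence
after_first
# "b xxx c"
```

Which of the two variants is appropriate depends on whether you want the text after the first or after the last occurrence of the pattern.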


Summary: This article illustrated how to extract the substring before or after a specified pattern in the R programming language. If you have any further comments and/or questions, don’t hesitate to let me know in the comments below.


