strsplit Function in R (3 Examples) | How to Split a Character String

 

In this tutorial you’ll learn how to split character strings using the strsplit() function in the R programming language.

Let’s just jump right in…

Definition & Basic R Syntax of strsplit Function

 

Definition: The strsplit R function splits the elements of a character vector into substrings at each match of a specified splitting pattern.

 

Basic R Syntax: Please find the basic R programming syntax of the strsplit function below.

strsplit(any_string, split_pattern)          # Basic R syntax of strsplit function
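As a minimal sketch of what this syntax returns, the following code applies strsplit to a vector with more than one element. The vector two_strings is a made-up input for illustration and is not part of the tutorial data.

```r
# Hypothetical input: a character vector with two elements
two_strings <- c("a-b-c", "d-e")

# Split every element at each "-"
pieces <- strsplit(two_strings, split = "-")

class(pieces)      # "list"
length(pieces)     # 2, i.e. one list element per input string
pieces[[2]]        # "d" "e"
```

The output is always a list, even when the input is a single character string. This is why the examples below show a [[1]] in front of the split values.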

In the following, I’ll show three examples for the application of the strsplit function in R programming.

 

Creation of Exemplifying Data

As a first step, we have to construct some data that we can use in the following examples:

my_string <- "aaa bbb ccc dxxexxfxxg"        # Create example character string
my_string                                    # Print character string
# "aaa bbb ccc dxxexxfxxg"

Have a look at the previous output of the RStudio console. It shows that our example data is a character string containing a sequence of letters and blanks.

 

Example 1: Splitting Character String with strsplit() Function in R

The following syntax explains how to separate our character string at each blank position. For this, we have to specify the split argument to be equal to " ".

strsplit(my_string, split = " ")             # Apply strsplit function
# [[1]]
# [1] "aaa"        "bbb"        "ccc"        "dxxexxfxxg"

The previous RStudio console output shows the result of the strsplit function: a list object that contains one list element. This list element is a character vector with four elements, each of which is a substring extracted from our example string.
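Because the result is a list, the split values have to be extracted before they can be used like an ordinary vector. The following lines are a small sketch of one way to do this with double square brackets. The object names split_result and words are chosen for illustration only.

```r
my_string <- "aaa bbb ccc dxxexxfxxg"        # Example string from above

split_result <- strsplit(my_string, split = " ")
words <- split_result[[1]]                   # Extract the character vector from the list

length(words)                                # 4
words[4]                                     # "dxxexxfxxg"
```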

 

Example 2: Using Character Pattern to Split a Character String

This section shows how to use a different character pattern to split a character string in R by specifying the split argument of the strsplit function. In this example, I’m using the character pattern “xx” to split our character string:

strsplit(my_string, split = "xx")            # Specify splitting pattern
# [[1]]
# [1] "aaa bbb ccc d" "e"             "f"             "g"

Again, a list was returned. However, this time our character string was split at different points.
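It is worth noting that strsplit interprets the split argument as a regular expression by default. The sketch below shows two consequences of this: several patterns can be combined with "|", and special characters such as "." are only treated literally when fixed = TRUE is specified. The string version_string is a hypothetical input that is not part of the example data.

```r
my_string <- "aaa bbb ccc dxxexxfxxg"        # Example string from above

# "|" means "or" in a regular expression: split at a blank or at "xx"
regex_split <- strsplit(my_string, split = " |xx")[[1]]
regex_split
# [1] "aaa" "bbb" "ccc" "d"   "e"   "f"   "g"

# A dot matches any character in a regular expression,
# so fixed = TRUE is needed to split at a literal "."
version_string <- "1.2.3"                    # Hypothetical input
literal_split <- strsplit(version_string, split = ".", fixed = TRUE)[[1]]
literal_split
# [1] "1" "2" "3"
```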

 

Example 3: Converting Output of strsplit Function to Vector Object

The following code illustrates how to convert the list output provided by the strsplit function to a vector. For this, we are using the strsplit function in combination with the unlist function:

unlist(strsplit(my_string, split = "xx"))    # Convert strsplit output to vector
# [1] "aaa bbb ccc d" "e"             "f"             "g"

As you can see, the RStudio console returned a vector object instead of a list.
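Note that unlist merges all list elements into a single vector, which can be unexpected when more than one string is split at once. The following sketch, based on the made-up vector key_values, compares unlist with vapply, which is one possible way to keep only the first piece of each string.

```r
# Hypothetical input: several strings in "key=value" form
key_values <- c("alpha=1", "beta=2", "gamma=3")

split_list <- strsplit(key_values, split = "=", fixed = TRUE)

# unlist flattens all pieces of all strings into one vector
all_pieces <- unlist(split_list)
all_pieces
# [1] "alpha" "1"     "beta"  "2"     "gamma" "3"

# Keep only the first piece of each list element
keys <- vapply(split_list, `[`, character(1), 1)
keys
# [1] "alpha" "beta"  "gamma"
```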

 

Summary

To summarize: In this R tutorial you learned how to apply the strsplit() function. In case you have further questions, let me know in the comments section. Furthermore, don’t forget to subscribe to my email newsletter in order to get updates on new posts.

 


2 Comments

  • As usual, your expositions on a given topic are ‘at the head of the class’ as regards good organization and clarity. Missing here are two points.

    You can get into trouble when you do not use the “fixed=T” specification as illustrated below. Second, I think that the reader is helped by knowing that the function automatically produces a vector array. Here is my example.

    x <- "107.33M" # a coded number that I am in the process of decoding
    VolArray <- unlist( strsplit(x, split = ".", fixed=T))

    If the coded number is 107.33M and you do not use “fixed=T”,
    VolArray==[ “” “” “” “” “” “” “”], which is a seven-element array of null characters.
    However when you use “fixed=T”, VolArray==[ 107 33M], which is what I was seeking.

    My purpose here is not to criticize your contribution; rather it is just to make the exposition more complete on behalf of your reader. There may be a mistake in my details, and please fix it. (In fact, I was in learning mode when I studied your text; because only in this process did I learn how R does this work.)

    • Hi Carl,

      First of all, thank you very much for the very kind words regarding my website. It’s great to hear that you find my tutorials useful! 🙂

      Also, many thanks for your contribution of additional R syntax. I think it’s great to have this piece of content in the comments, in case somebody is looking for this specific solution!

      Regards,
      Joachim
