strsplit Function in R (3 Examples) | How to Split a Character String
In this tutorial you’ll learn how to split character strings using the strsplit() function in the R programming language.
Let’s just jump right in…
Definition & Basic R Syntax of strsplit Function
Definition: The strsplit R function splits the elements of a character vector into substrings at the matches of a split pattern.
Basic R Syntax: Please find the basic R programming syntax of the strsplit function below.
strsplit(any_string, split_pattern) # Basic R syntax of strsplit function
In the following, I’ll show three examples for the application of the strsplit function in R programming.
Creation of Exemplifying Data
As a first step, we have to construct some data that we can use in the following examples:
my_string <- "aaa bbb ccc dxxexxfxxg"    # Create example character string
my_string                                # Print character string
# "aaa bbb ccc dxxexxfxxg"
Have a look at the previous output of the RStudio console. It shows that our example data is a character string containing a sequence of letters and blanks.
Example 1: Splitting Character String with strsplit() Function in R
The following syntax explains how to separate our character string at each blank position. For this, we have to specify the split argument to be equal to " ".
strsplit(my_string, split = " ")         # Apply strsplit function
# [[1]]
# [1] "aaa" "bbb" "ccc" "dxxexxfxxg"
The previous RStudio console output shows the result of the strsplit function: a list object that contains one list element. This list element contains a vector with four vector elements. Each of these vector elements contains a substring extracted from our example string.
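The list structure becomes easier to understand once strsplit is applied to more than one string. The following is a minimal sketch, assuming a hypothetical input vector my_vec that is not part of the example data of this tutorial. It illustrates that strsplit returns one list element per input string:

```r
# Hypothetical input vector with two strings (not part of the tutorial data)
my_vec <- c("aaa bbb", "ccc ddd eee")

res <- strsplit(my_vec, split = " ")     # Split each string at blanks

length(res)                              # Number of list elements: 2
res[[1]]                                 # "aaa" "bbb"
res[[2]]                                 # "ccc" "ddd" "eee"
lengths(res)                             # Pieces per input string: 2 3
```

Because each input string can yield a different number of pieces, a list is the natural container for the result.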
Example 2: Using Character Pattern to Split a Character String
This section shows how to use different character patterns to split a character string in R by specifying the split argument of the strsplit function. In this example, I’m using the character pattern “xx” to split our character string:
strsplit(my_string, split = "xx")        # Specify splitting pattern
# [[1]]
# [1] "aaa bbb ccc d" "e" "f" "g"
Again, a list was returned. However, this time our character string was split at each occurrence of “xx” instead of at each blank.
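One thing to keep in mind: by default, strsplit interprets the split argument as a regular expression. For a pattern such as "xx" this makes no difference, but special characters such as "." behave differently. The following sketch uses a hypothetical string dot_string (an assumption for illustration) to compare the default behavior with fixed = TRUE and with an escaped pattern:

```r
dot_string <- "a.b.c"                    # Hypothetical example string

# Default: "." is a regular expression that matches any character,
# so every character is treated as a separator
regex_res <- strsplit(dot_string, split = ".")
regex_res
# [[1]]
# [1] "" "" "" "" ""

# fixed = TRUE treats the split pattern as a literal string
fixed_res <- strsplit(dot_string, split = ".", fixed = TRUE)
fixed_res
# [[1]]
# [1] "a" "b" "c"

# Alternatively, escape the dot in the regular expression
escaped_res <- strsplit(dot_string, split = "\\.")
escaped_res
# [[1]]
# [1] "a" "b" "c"
```

Whenever the separator is meant literally, fixed = TRUE is usually the simplest choice.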
Example 3: Converting Output of strsplit Function to Vector Object
The following code illustrates how to convert the list output provided by the strsplit function to a vector. For this, we are using the strsplit function in combination with the unlist function:
unlist(strsplit(my_string, split = "xx"))    # Convert strsplit output to vector
# [1] "aaa bbb ccc d" "e" "f" "g"
As you can see, the RStudio console returned a vector object instead of a list.
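As a side note, unlist is not the only way to get a vector. The sketch below shows that, for a single input string, extracting the first list element with [[1]] gives the same result. The vector two_strings is a hypothetical addition used only to illustrate how unlist behaves with several input strings:

```r
my_string <- "aaa bbb ccc dxxexxfxxg"                      # Example string of this tutorial

vec_unlist <- unlist(strsplit(my_string, split = "xx"))    # Flatten the list
vec_index  <- strsplit(my_string, split = "xx")[[1]]       # Extract first list element

identical(vec_unlist, vec_index)                           # TRUE

# Hypothetical vector with two strings: unlist merges all pieces into one vector
two_strings <- c("dxxe", "fxxg")
merged <- unlist(strsplit(two_strings, split = "xx"))
merged                                                     # "d" "e" "f" "g"
```

With several input strings, unlist loses the information about which piece came from which string, so keeping the list may be preferable in that case.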
Video, Further Resources & Summary
Have a look at the following video of my YouTube channel. I show the R programming code of this article in the video.
Furthermore, you might have a look at some of the other articles on my website.
- Find Position of Character in String
- Split Data Frame Variable into Multiple Columns
- Replace Last Comma in Character with &-Sign
- R unlist Function
- Replace Specific Characters in String
- R Functions List (+ Examples)
- The R Programming Language
To summarize: In this R tutorial you learned how to apply the strsplit() function. In case you have further questions, let me know in the comments section. Furthermore, don’t forget to subscribe to my email newsletter in order to get updates on new posts.
2 Comments
As usual, your expositions on a given topic are ‘at the head of the class’ as regards good organization and clarity. Missing here are two points.
First, you can get into trouble when you do not use the “fixed = TRUE” specification, as illustrated below. Second, I think that the reader is helped by knowing that the function, wrapped in unlist(), produces a plain vector. Here is my example.
x <- "107.33M" # a coded number that I am in the process of decoding
VolArray <- unlist(strsplit(x, split = ".", fixed = TRUE))
If the coded number is "107.33M" and you do not use “fixed = TRUE”, the "." is treated as a regular expression that matches any character, so
VolArray is "" "" "" "" "" "" "", which is a seven-element vector of empty strings.
However, when you use “fixed = TRUE”, VolArray is "107" "33M", which is what I was seeking.
My purpose here is not to criticize your contribution; rather it is just to make the exposition more complete on behalf of your reader. There may be a mistake in my details, and please fix it. (In fact, I was in learning mode when I studied your text; because only in this process did I learn how R does this work.)
Hi Carl,
First of all, thank you very much for the very kind words regarding my website. It’s great to hear that you find my tutorials useful! 🙂
Also, many thanks for your contribution of additional R syntax. I think it’s great to have this piece of content in the comments, in case somebody is looking for this specific solution!
Regards,
Joachim