Split Character String into Chunks in R (2 Examples)
In this tutorial you’ll learn how to cut a character string into multiple pieces in the R programming language.
Let’s get started!
Creation of Example Data
The first step is to create some example data:
my_string <- "AAABBBCCC"    # Create example character string
my_string                   # Print example character string
# [1] "AAABBBCCC"
As the RStudio console output shows, our example data is a character string of nine uppercase letters.
Next, we also have to specify the size of each chunk of characters. In the examples of this tutorial, we’ll chop our data into chunks with an equal size of two.
n <- 2 # Define length of character chunks
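Before splitting, it can help to check how many chunks to expect. The following is a minimal sketch; the object names n_chars and n_chunks are introduced here for illustration only and are not part of the tutorial code:

```r
my_string <- "AAABBBCCC"            # Example string from above
n <- 2                              # Chunk length from above

n_chars <- nchar(my_string)         # Number of characters in the string
n_chunks <- ceiling(n_chars / n)    # Number of chunks, counting a partial last chunk
n_chars %% n                        # Characters left over for the last chunk
```

Because nine is not divisible by two, the last chunk cannot be complete. The two examples below differ in how they handle that leftover character.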
Now we are set up and can move on to the examples.
Example 1: Divide Character String into Chunks Using substring() Function
In Example 1, I’ll illustrate how to use the substring, seq, and nchar functions to split our character string object into different parts.
Have a look at the following R syntax:
substring(my_string,                    # Apply substring function
          seq(1, nchar(my_string), n),
          seq(n, nchar(my_string), n))
# [1] "AA" "AB" "BB" "CC" ""
As you can see in the RStudio console, we have created a character vector containing five chunks of our input data.
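Note that the last element of this vector is empty. Our input character string has an odd number of characters, so the vector of end positions, seq(n, nchar(my_string), n), has only four elements, while the vector of start positions has five. The substring function recycles the shorter vector, so the fifth chunk gets start position 9 and end position 2, which returns an empty string.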
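If you prefer to keep the leftover character when using substring(), one possible variation is to compute the end positions from the start positions and cap them at the string length. This is a sketch rather than part of the original example; the object names starts, ends, and chunks are assumptions made for illustration:

```r
my_string <- "AAABBBCCC"                          # Example string from above
n <- 2                                            # Chunk length from above

starts <- seq(1, nchar(my_string), by = n)        # Start position of each chunk
ends <- pmin(starts + n - 1, nchar(my_string))    # End positions, capped at string length

chunks <- substring(my_string, starts, ends)      # Extract all chunks at once
chunks
# [1] "AA" "AB" "BB" "CC" "C"
```

Since both position vectors now have the same length, no recycling takes place and the final chunk contains the remaining character.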
Example 2: Divide Character String into Chunks Using strsplit() Function
In this example, I’ll explain how to use the strsplit, gsub, and paste0 functions to chop our character string into pieces:
strsplit(gsub(paste0("([[:alnum:]]{",    # Apply strsplit function
                     n,
                     "})"),
              "\\1 ",
              my_string),
         " ")[[1]]
# [1] "AA" "AB" "BB" "CC" "C"
The RStudio console output shows how this example differs from the previous one: the last element of our output vector consists of only one character, i.e. the character that remains due to the odd length of our input character string.
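The code of Example 2 matches only alphanumeric characters and uses a space as a temporary separator, so input strings that contain spaces or punctuation may be split differently than expected. A possible alternative, sketched here with the base R functions gregexpr() and regmatches(), matches up to n arbitrary characters at a time. The helper name split_chunks is hypothetical and not part of the original tutorial:

```r
# Hypothetical helper: split a single string x into chunks of at most n characters
split_chunks <- function(x, n) {
  pattern <- paste0(".{1,", n, "}")          # Match between 1 and n characters
  regmatches(x, gregexpr(pattern, x))[[1]]   # Extract all matches for the string
}

split_chunks("AAABBBCCC", 2)
# [1] "AA" "AB" "BB" "CC" "C"

split_chunks("AA BB-CC", 3)                  # Spaces and punctuation are kept
# [1] "AA " "BB-" "CC"
```

Because the pattern is greedy, every chunk except possibly the last has exactly n characters.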
Video, Further Resources & Summary
Would you like to learn more about character string manipulation? Then I recommend watching the following video of my YouTube channel.
In the video, I explain how to create fixed-width character elements from an input character string using the R code of this tutorial in RStudio.
In addition, you may have a look at the other posts on www.statisticsglobe.com. I have published numerous articles on related topics such as variables and vectors:
- strsplit Function in R
- Split Vector into Chunks in R
- Split Data Frame Variable into Multiple Columns
- R Programming Overview
Summary: In this tutorial, you have learned how to divide a character string into chunks of a given length in the R programming language. Let me know in the comments if you have further questions.