Split Character String into Chunks in R (2 Examples)

 

In this tutorial, you’ll learn how to cut a character string into multiple pieces of a fixed length in the R programming language.

Let’s get started!

 

Creation of Example Data

The first step is to create some example data:

my_string <- "AAABBBCCC"                 # Create example character string
my_string                                # Print example character string
# [1] "AAABBBCCC"

The previous output of the RStudio console shows that our example data is a single character string consisting of nine uppercase letters.

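Next, we also have to specify the size of each chunk of characters. In the examples of this tutorial, we’ll chop our data into chunks of two characters each. Because our string has nine characters, one character is left over at the end, and the two examples handle this leftover differently.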

n <- 2                                   # Define length of character chunks
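
As a quick sanity check on what to expect, the number of chunks and the length of the leftover piece follow from simple arithmetic. The following lines are a small sketch that is not part of the original examples; the names n_chunks and last_len are introduced here only for illustration.

```r
my_string <- "AAABBBCCC"                 # Same example string as above
n <- 2                                   # Same chunk length as above

n_chunks <- ceiling(nchar(my_string) / n)           # Chunks needed to cover all characters
n_chunks
# [1] 5

last_len <- nchar(my_string) - (n_chunks - 1) * n   # Characters left for the final chunk
last_len
# [1] 1
```

So a complete split of our example string should return five pieces, the last of which holds a single character.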

Now we are set up and can move on to the examples.

 

Example 1: Divide Character String into Chunks Using substring() Function

In Example 1, I’ll illustrate how to use the substring, seq, and nchar functions to split our character string object into different parts.

Have a look at the following R syntax:

substring(my_string,                     # Apply substring function
          seq(1, nchar(my_string), n),   # Start positions: 1, 3, 5, 7, 9
          seq(n, nchar(my_string), n))   # End positions: 2, 4, 6, 8
# [1] "AA" "AB" "BB" "CC" ""

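As you can see in the RStudio console, we have created a character vector with five elements: four chunks of our input data and one empty string.

Note that the last element of this vector is empty, so the final "C" of our input is lost. Our input character string consists of an odd number of characters, which gives five start positions (1, 3, 5, 7, 9) but only four end positions (2, 4, 6, 8). The substring function recycles the shorter vector, so the start position 9 is paired with the end position 2, and a start position that lies after the end position returns "".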
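
If the leftover character should be kept when using substring, one possible adjustment is to compute the end positions from the start positions instead of building them with a second seq call. This is a minimal sketch rather than part of the original example; the object names starts, ends, and chunks_sub are introduced here for illustration.

```r
my_string <- "AAABBBCCC"                 # Same example string as above
n <- 2                                   # Same chunk length as above

starts <- seq(1, nchar(my_string), by = n)         # Start positions: 1 3 5 7 9
ends <- pmin(starts + n - 1, nchar(my_string))     # End positions capped at string length: 2 4 6 8 9

chunks_sub <- substring(my_string, starts, ends)   # One end position per start position
chunks_sub
# [1] "AA" "AB" "BB" "CC" "C"
```

Because starts and ends now have the same length, no recycling takes place and the final chunk simply contains whatever characters remain.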

 

Example 2: Divide Character String into Chunks Using strsplit() Function

In this example, I’ll explain how to use the strsplit, gsub, and paste0 functions to chop our character string into pieces:

strsplit(gsub(paste0("([[:alnum:]]{",    # Apply strsplit function
                     n,
                     "})"),
              "\\1 ",
              my_string),
         " ")[[1]]
# [1] "AA" "AB" "BB" "CC" "C"

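The RStudio console output shows the difference between this example and the previous one: the last element of our output vector consists of only one character, i.e. the character left over because of the odd length of our input character string. Note that the pattern [[:alnum:]] matches only letters and digits and that the result is split at spaces, so this code assumes an input string without spaces or punctuation.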
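
For input strings that may contain spaces or punctuation, a different base R technique can be used: extracting the matches of a pattern directly with gregexpr and regmatches instead of inserting a separator and splitting on it. The following sketch is an alternative to the code above, not a rewrite of it; the names pattern, chunks_rx, and other_chunks are introduced here for illustration.

```r
my_string <- "AAABBBCCC"                 # Same example string as above
n <- 2                                   # Same chunk length as above

pattern <- paste0(".{1,", n, "}")        # Between 1 and n characters of any kind
chunks_rx <- regmatches(my_string,       # Extract all non-overlapping matches
                        gregexpr(pattern, my_string))[[1]]
chunks_rx
# [1] "AA" "AB" "BB" "CC" "C"

other_chunks <- regmatches("ab, cd!",    # String with a space and punctuation
                           gregexpr(".{1,3}", "ab, cd!"))[[1]]
other_chunks
# [1] "ab," " cd" "!"
```

The quantifier {1,n} is greedy, so every chunk has n characters as long as enough characters remain, and only the final chunk can be shorter.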

 

Summary


In this tutorial, you have learned how to divide a character string into chunks of a fixed length in the R programming language, using either the substring function or the strsplit function.

 
