Split Comma-Separated Character Strings in Column into Separate Rows in R (2 Examples)
In this article, I’ll demonstrate how to split comma-separated character strings in columns into separate rows in R programming.
The post consists of two examples for splitting comma-separated character strings in columns into separate rows. It first creates some example data and loads the required package, then walks through both examples, and closes with further resources.
It’s time to dive into the examples:
Example Data & Software Packages
First of all, we’ll have to create some example data:
data <- data.frame(var1 = 1:5,                                      # Creating the data
                   var2 = c("a,b,c", "a,c", "abc", "c", "k,p,h,t"),
                   var3 = c(":,-,+", ":,:.", "**:", "????", "!,!,?,:"))
data                                                                # Print example data
Table 1 illustrates the output of the RStudio console returned by the previous R code and shows that our example data contains five rows and three columns. The column var1 is an integer and the variables var2 and var3 have the character class.
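To confirm the column classes described above, a quick check could look like the following sketch. It rebuilds the example data so that it runs on its own. The stringsAsFactors = FALSE argument is an addition that is not part of the original code; it only matters in R versions before 4.0, where character strings would otherwise be converted to factors.

```r
# Rebuild the example data (stringsAsFactors = FALSE is an added safeguard for R < 4.0)
data <- data.frame(var1 = 1:5,
                   var2 = c("a,b,c", "a,c", "abc", "c", "k,p,h,t"),
                   var3 = c(":,-,+", ":,:.", "**:", "????", "!,!,?,:"),
                   stringsAsFactors = FALSE)

col_classes <- sapply(data, class)   # Class of each column
col_classes

dim(data)                            # Number of rows and columns
```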
For the examples of this tutorial, we also need to install and load the tidyr package:
install.packages("tidyr")   # Install tidyr package
library("tidyr")            # Load tidyr
Example 1: Divide Comma-Separated Character String Using separate_rows() Function
In this example, I’ll illustrate how to split character strings containing commas into multiple rows. For this task, we can use the separate_rows() function of the tidyr package as shown below:
new_data1 <- data %>%              # Application of separate_rows() function
  separate_rows(var2, sep = ",")
new_data1                          # Print updated data
Table 2 shows that we have created a new tibble by executing the previous R code. This tibble contains a separate row for each comma-separated element in the character strings of the column var2. The values of the other columns are repeated for every new row.
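One way to sanity-check the result is to count the comma-separated elements of var2 in base R and compare the total with the number of rows of the new tibble. The following is a minimal sketch that rebuilds the example data so it can run on its own; the object name n_elements is only chosen for illustration.

```r
library("tidyr")

# Rebuild the example data of this tutorial
data <- data.frame(var1 = 1:5,
                   var2 = c("a,b,c", "a,c", "abc", "c", "k,p,h,t"),
                   var3 = c(":,-,+", ":,:.", "**:", "????", "!,!,?,:"),
                   stringsAsFactors = FALSE)

# Number of comma-separated elements in each row of var2
n_elements <- lengths(strsplit(data$var2, ",", fixed = TRUE))
n_elements

# Split var2 into separate rows
new_data1 <- separate_rows(data, var2, sep = ",")

# The row count of the result should equal the total number of elements
nrow(new_data1) == sum(n_elements)
```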
Example 2: Divide Multiple Columns with Comma-Separated Character String Using separate_rows() Function
In Example 1, I have shown how to split the character strings in only one data frame column. In this example, I’ll show how to split the character strings in multiple columns of our data set.
To achieve this, we can simply add further variable names within the separate_rows function. However, please make sure that the number of commas in each row of the different string variables is equal.
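To see why the number of commas matters, the following sketch applies separate_rows() to a small, hypothetical data frame (bad_data, not part of the original example) in which one row has two elements in the first column but three in the second. The call is wrapped in tryCatch() so that the script keeps running; the exact error message depends on the installed tidyr version, so it is only printed and not assumed here.

```r
library("tidyr")

# Hypothetical data: row 2 has two elements in x but three elements in y
bad_data <- data.frame(id = 1:2,
                       x = c("a,b", "c,d"),
                       y = c("1,2", "3,4,5"),
                       stringsAsFactors = FALSE)

# Try to split both columns at once and capture a possible error
check_result <- tryCatch({
  separate_rows(bad_data, x, y, sep = ",")
  "ok"
}, error = function(e) {
  paste("failed:", conditionMessage(e))
})

check_result   # Reports whether the split worked
```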
Consider the R syntax below:
new_data2 <- data %>%                    # Apply separate_rows to multiple variables
  separate_rows(var2, var3, sep = ",")
new_data2                                # Print updated data
As shown in Table 3, the previously executed code has created another tibble, where the strings in the columns var2 and var3 have been split at every comma.
Note that the examples of this tutorial have created tibble outputs. In case you would prefer to continue working with data frames, you can convert a tibble back using the as.data.frame() function.
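The conversion back to a plain data frame could look like the sketch below. It repeats the split of Example 1 so that it runs on its own; the object name new_data1_df is only chosen for illustration.

```r
library("tidyr")

# Rebuild the example data of this tutorial
data <- data.frame(var1 = 1:5,
                   var2 = c("a,b,c", "a,c", "abc", "c", "k,p,h,t"),
                   var3 = c(":,-,+", ":,:.", "**:", "????", "!,!,?,:"),
                   stringsAsFactors = FALSE)

new_data1 <- separate_rows(data, var2, sep = ",")   # Split var2 into rows
class(new_data1)                                    # Classes of the tidyr output

new_data1_df <- as.data.frame(new_data1)            # Convert to a plain data frame
class(new_data1_df)
```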
Further Resources & Summary
Please note: There are alternative ways to split comma-separated character strings into new rows, for example using the strsplit() and unnest() functions, or using the data.table package. Please have a look at this thread on Stack Overflow for more details.
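As a rough illustration of the strsplit() approach, the following base R sketch splits var2 without any add-on packages. It is only one possible way to do this and not necessarily the solution discussed in the thread; the object names parts and new_data_base are chosen for illustration.

```r
# Rebuild the example data of this tutorial
data <- data.frame(var1 = 1:5,
                   var2 = c("a,b,c", "a,c", "abc", "c", "k,p,h,t"),
                   var3 = c(":,-,+", ":,:.", "**:", "????", "!,!,?,:"),
                   stringsAsFactors = FALSE)

# Split each string of var2 at the commas (returns a list of character vectors)
parts <- strsplit(data$var2, ",", fixed = TRUE)

# Repeat the other columns once per element and flatten the list
new_data_base <- data.frame(var1 = rep(data$var1, lengths(parts)),
                            var2 = unlist(parts),
                            var3 = rep(data$var3, lengths(parts)),
                            stringsAsFactors = FALSE)
new_data_base   # Print the result as a plain data frame
```

In contrast to separate_rows(), this version returns a plain data frame and requires the remaining columns to be repeated manually.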
In addition, you might have a look at the related tutorials on my website:
- Split Character String into Letters & Numbers
- Split Character String into Chunks
- Split Data Frame into List of Data Frames Based On ID Column
- How to Split a Date-Time Column into Separate Variables
- Split Data Frame Variable into Multiple Columns
- Introduction to R
In this R article, you have learned how to divide comma-separated character strings in data frame variables into separate rows. If you have additional questions, let me know in the comments section.
This page was created in collaboration with Cansu Kebabci. Have a look at Cansu’s author page to get more information about her professional background, a list of all her tutorials, as well as an overview of her other tasks on Statistics Globe.