Split Data Frame in R (3 Examples) | Divide (Randomly) by Row & Column
In this R tutorial you’ll learn how to separate a data frame into two different parts.
Here’s how to do it:
Creation of Example Data
As a first step, let’s create some example data:
data <- data.frame(x1 = 1:10,            # Creating example data
                   x2 = letters[1:10],
                   x3 = 20:11)
data                                     # Show example data in console
#    x1 x2 x3
# 1   1  a 20
# 2   2  b 19
# 3   3  c 18
# 4   4  d 17
# 5   5  e 16
# 6   6  f 15
# 7   7  g 14
# 8   8  h 13
# 9   9  i 12
# 10 10  j 11
The previously shown RStudio console output reveals that our example data has ten rows and three columns. Let’s split these data!
Example 1: Splitting Data Frame by Row Using Index Positions
In Example 1, I’ll explain how to divide a data frame into two parts by the positions of its rows. The first part contains the first five rows of our example data…
data_1a <- data[1:5, ]                   # Extract first five rows
data_1a                                  # Print top part of data frame
#   x1 x2 x3
# 1  1  a 20
# 2  2  b 19
# 3  3  c 18
# 4  4  d 17
# 5  5  e 16
…and the second data frame contains the bottom five rows of our input data:
data_1b <- data[6:10, ]                  # Extract last five rows
data_1b                                  # Print bottom part of data frame
#    x1 x2 x3
# 6   6  f 15
# 7   7  g 14
# 8   8  h 13
# 9   9  i 12
# 10 10  j 11
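The hard-coded positions 1:5 and 6:10 only fit a data frame with exactly ten rows. As a minimal sketch, the split point can instead be derived from nrow() and passed to head() and tail(). The names split_at, data_top and data_bottom are illustrative and not part of the original example:

```r
# Recreate the example data so that this block runs on its own
data <- data.frame(x1 = 1:10,
                   x2 = letters[1:10],
                   x3 = 20:11)

split_at <- floor(nrow(data) / 2)                    # Number of rows in the first part (illustrative name)

data_top <- head(data, split_at)                     # First split_at rows
data_bottom <- tail(data, nrow(data) - split_at)     # All remaining rows

data_top                                             # Print top part
data_bottom                                          # Print bottom part
```

For a data frame with an odd number of rows, floor() puts the extra row into the bottom part.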
Example 2: Splitting Data Frame by Row Using Random Sampling
Example 1 has explained how to split a data frame by index positions. The following R programming code, in contrast, shows how to divide data frames randomly.
First, we have to create a random dummy indicator that assigns each row to one of two parts (each row receives the value 0 or 1 with a probability of 0.5):
set.seed(37645)                          # Set seed for reproducibility
dummy_sep <- rbinom(nrow(data), 1, 0.5)  # Create dummy indicator
Now, we can subset our original data based on this dummy indicator. First, we are creating one data frame…
data_2a <- data[dummy_sep == 0, ]        # Extract data where dummy == 0
data_2a                                  # Print data
#    x1 x2 x3
# 1   1  a 20
# 2   2  b 19
# 3   3  c 18
# 6   6  f 15
# 7   7  g 14
# 10 10  j 11
…and then we are creating the other data frame:
data_2b <- data[dummy_sep == 1, ]        # Extract data where dummy == 1
data_2b                                  # Print data
#   x1 x2 x3
# 4  4  d 17
# 5  5  e 16
# 8  8  h 13
# 9  9  i 12
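Because rbinom() draws the group of each row independently, the two parts usually differ in size (six and four rows above). If fixed sizes are needed, one possible alternative, which is not used in the original example, is to draw row positions with sample(). The 70/30 proportion and the names n_first, idx_first, in_first, data_first and data_second are assumptions made for illustration:

```r
# Recreate the example data so that this block runs on its own
data <- data.frame(x1 = 1:10,
                   x2 = letters[1:10],
                   x3 = 20:11)

set.seed(37645)                                            # Set seed for reproducibility

n_first <- round(0.7 * nrow(data))                         # Rows in the first part (assumed 70%)
idx_first <- sample(seq_len(nrow(data)), size = n_first)   # Random row positions, without replacement
in_first <- seq_len(nrow(data)) %in% idx_first             # Logical indicator for each row

data_first <- data[in_first, ]                             # Randomly selected rows, in original order
data_second <- data[!in_first, ]                           # All remaining rows
```

Using a logical indicator instead of negative indices keeps the original row order and also behaves sensibly if one of the parts happens to be empty.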
Example 3: Splitting Data Frame by Column Names
In Example 3, I’ll illustrate how to separate data sets by column. More precisely, we are using the variable names of our data frame to split the data.
We are assigning the variables x1 and x3 to the first data frame…
data_3a <- data[ , c("x1", "x3")]        # Select specific data frame columns
data_3a                                  # Print data
#    x1 x3
# 1   1 20
# 2   2 19
# 3   3 18
# 4   4 17
# 5   5 16
# 6   6 15
# 7   7 14
# 8   8 13
# 9   9 12
# 10 10 11
…and the variable x2 to the second chunk of data:
data_3b <- data[ , "x2"]                 # Select remaining column
data_3b                                  # Print data
# [1] a b c d e f g h i j
# Levels: a b c d e f g h i j
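Note that the second part of the data was returned as a vector rather than a data frame, because single-bracket indexing drops the data frame structure when only one column is selected. The Levels line in the output appears because x2 is stored as a factor, which was the default of data.frame() before R 4.0.0; in newer R versions x2 is a character vector and is printed with quotes instead.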
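If the second part should remain a data frame, the drop argument of the [ operator can be set to FALSE. The sketch below also derives the remaining columns with setdiff() instead of typing them by hand. The names cols_a, cols_b, data_a and data_b are illustrative:

```r
# Recreate the example data so that this block runs on its own
data <- data.frame(x1 = 1:10,
                   x2 = letters[1:10],
                   x3 = 20:11)

cols_a <- c("x1", "x3")                              # Columns for the first part
cols_b <- setdiff(names(data), cols_a)               # All remaining columns

data_a <- data[ , cols_a, drop = FALSE]              # First part
data_b <- data[ , cols_b, drop = FALSE]              # Stays a data frame, even with one column

class(data_b)                                        # Check the class of the second part
```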
Video & Further Resources
If you need more explanations on the examples of this tutorial, you may watch the following video on my YouTube channel. I show the R code of this article in the video:
Furthermore, you may have a look at the other tutorials on Statistics Globe.
- Split Data Frame into List of Data Frames Based On ID Column
- Split Data Frame Variable into Multiple Columns
- Convert Data Frame Rows to List
- split & unsplit Functions in R
- Remove Data Frame Columns by Name
- The R Programming Language
In this R tutorial you learned how to split a data frame into multiple subsets. Let me know in the comments below if you have additional questions. Furthermore, please subscribe to my email newsletter for regular updates on the newest articles.
2 Comments
Hi! Thank you so much, this was very helpful. Is there a way to do this more efficiently when you have a lot more columns that you want to split? Say, maybe using a dash and saying split from column “x1” – “x10”?
Hey Chiara,
Thanks for the kind comment, glad you found it helpful!
You may specify a larger number of variables using the paste0 function. For example:
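A minimal sketch of this idea, assuming a hypothetical data frame data_wide with the columns x1 to x12 (the data and the names cols_first, data_wide_a and data_wide_b are illustrative and not taken from the original reply):

```r
# Hypothetical wide data frame with columns x1 to x12 (not from the original post)
data_wide <- as.data.frame(matrix(1:36, nrow = 3, ncol = 12))
names(data_wide) <- paste0("x", 1:12)

cols_first <- paste0("x", 1:10)                      # Creates "x1", "x2", ..., "x10"

data_wide_a <- data_wide[ , cols_first, drop = FALSE]                              # Columns x1 to x10
data_wide_b <- data_wide[ , setdiff(names(data_wide), cols_first), drop = FALSE]   # Remaining columns

names(data_wide_a)                                   # Column names of the first part
names(data_wide_b)                                   # Column names of the second part
```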
Regards,
Joachim