Split Data Frame in R (3 Examples) | Divide (Randomly) by Row & Column

In this R tutorial you’ll learn how to separate a data frame into two different parts.

The content of the tutorial is structured as follows:

1) Creation of Example Data

2) Example 1: Splitting Data Frame by Row Using Index Positions

3) Example 2: Splitting Data Frame by Row Using Random Sampling

4) Example 3: Splitting Data Frame by Column Names

5) Video & Further Resources

Here’s how to do it:

Creation of Example Data

As a first step, let’s create some example data:

data <- data.frame(x1 = 1:10,              # Creating example data
                   x2 = letters[1:10],
                   x3 = 20:11)
data                                       # Show example data in console
#    x1 x2 x3
# 1   1  a 20
# 2   2  b 19
# 3   3  c 18
# 4   4  d 17
# 5   5  e 16
# 6   6  f 15
# 7   7  g 14
# 8   8  h 13
# 9   9  i 12
# 10 10  j 11

The previously shown RStudio console output reveals that our example data has ten rows and three columns. Let’s split these data!
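If you would rather check the structure programmatically than read it off the console, a quick sketch such as the following may help. It recreates the example data so that it runs on its own. The type of x2 depends on your R version: it is a character column in R 4.0.0 or later and a factor in older versions.

```r
# Recreate the example data so this snippet is self-contained
data <- data.frame(x1 = 1:10,
                   x2 = letters[1:10],
                   x3 = 20:11)

dim(data)                                  # Rows and columns: 10 3
names(data)                                # Column names: "x1" "x2" "x3"
str(data)                                  # Compact overview of column types
```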

Example 1: Splitting Data Frame by Row Using Index Positions

In Example 1, I’ll explain how to divide a data frame into two different parts by the positions of its rows. The first part contains the first five rows of our example data…

data_1a <- data[1:5, ]                     # Extract first five rows
data_1a                                    # Print top part of data frame
#   x1 x2 x3
# 1  1  a 20
# 2  2  b 19
# 3  3  c 18
# 4  4  d 17
# 5  5  e 16

…and the second data frame contains the bottom five rows of our input data:

data_1b <- data[6:10, ]                    # Extract last five rows
data_1b                                    # Print bottom part of data frame
#    x1 x2 x3
# 6   6  f 15
# 7   7  g 14
# 8   8  h 13
# 9   9  i 12
# 10 10  j 11
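The hard-coded positions 1:5 and 6:10 only fit data with exactly ten rows. As a minimal sketch, assuming you want to cut a data frame at its midpoint regardless of its size, the split position can be computed from nrow(). The object names n_top, data_top, and data_bottom are illustrative and not part of the example above.

```r
data <- data.frame(x1 = 1:10, x2 = letters[1:10], x3 = 20:11)

n_top <- floor(nrow(data) / 2)             # Number of rows in the top part

data_top <- head(data, n_top)              # First n_top rows
data_bottom <- tail(data, nrow(data) - n_top)  # All remaining rows

nrow(data_top)                             # 5
nrow(data_bottom)                          # 5
```

With an odd number of rows, floor() puts the extra row into the bottom part; use ceiling() instead if the top part should receive it.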

Example 2: Splitting Data Frame by Row Using Random Sampling

Example 1 explained how to split a data frame by index positions. The following R programming code, in contrast, shows how to divide data frames randomly.

First, we have to create a random dummy indicator that assigns each row to one of the two parts:

set.seed(37645)                            # Set seed for reproducibility
dummy_sep <- rbinom(nrow(data), 1, 0.5)    # Create dummy indicator

Now, we can subset our original data based on this dummy indicator. First, we are creating one data frame…

data_2a <- data[dummy_sep == 0, ]          # Extract data where dummy == 0
data_2a                                    # Print data
#    x1 x2 x3
# 1   1  a 20
# 2   2  b 19
# 3   3  c 18
# 6   6  f 15
# 7   7  g 14
# 10 10  j 11

…and then we are creating the other data frame:

data_2b <- data[dummy_sep == 1, ]          # Extract data where dummy == 1
data_2b                                    # Print data
#   x1 x2 x3
# 4  4  d 17
# 5  5  e 16
# 8  8  h 13
# 9  9  i 12
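Because rbinom() draws the indicator for each row independently, the two parts generally differ in size (six and four rows in the output above). If you need fixed group sizes, for example a 70/30 split, one common alternative is to sample row positions with sample(). The following is a sketch of that alternative, not the method used in this example. The seed, the 70% share, and the names n_a, idx_a, data_a, and data_b are assumptions chosen for illustration.

```r
data <- data.frame(x1 = 1:10, x2 = letters[1:10], x3 = 20:11)

set.seed(123)                              # Arbitrary seed for reproducibility
n_a <- round(0.7 * nrow(data))             # Desired size of the first part
idx_a <- sample(nrow(data), size = n_a)    # Random row positions, no replacement

data_a <- data[idx_a, ]                    # Randomly selected rows
data_b <- data[-idx_a, ]                   # All rows not selected above

nrow(data_a)                               # 7
nrow(data_b)                               # 3
```

The rows of data_a appear in sampled order. Wrapping idx_a in sort() keeps the original row order if that matters for your analysis.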

Example 3: Splitting Data Frame by Column Names

In Example 3, I’ll illustrate how to separate data sets by column. More precisely, we are using the variable names of our data frame to split the data.

We are assigning the variables x1 and x3 to the first data frame…

data_3a <- data[ , c("x1", "x3")]          # Select specific data frame columns
data_3a                                    # Print data
#    x1 x3
# 1   1 20
# 2   2 19
# 3   3 18
# 4   4 17
# 5   5 16
# 6   6 15
# 7   7 14
# 8   8 13
# 9   9 12
# 10 10 11

…and the variable x2 to the second chunk of data:

data_3b <- data[ , "x2"]                   # Select remaining column
data_3b                                    # Print data
#  [1] a b c d e f g h i j
# Levels: a b c d e f g h i j

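Note that the second part of the data was converted to a vector, since we kept only a single variable in this part. The factor levels in the output above reflect R versions before 4.0.0, where data.frame() converted strings to factors by default; in R 4.0.0 or later, x2 is returned as a character vector instead.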
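If the second part should remain a data frame, the drop = FALSE argument of base R's bracket indexing prevents this conversion. The sketch below also derives the remaining columns with setdiff() instead of typing them out. The names cols_a, cols_b, data_a, and data_b are illustrative.

```r
data <- data.frame(x1 = 1:10, x2 = letters[1:10], x3 = 20:11)

cols_a <- c("x1", "x3")                    # Columns for the first part
cols_b <- setdiff(names(data), cols_a)     # All remaining columns ("x2")

data_a <- data[ , cols_a, drop = FALSE]    # First part
data_b <- data[ , cols_b, drop = FALSE]    # Second part stays a data frame

is.data.frame(data_b)                      # TRUE
dim(data_b)                                # 10 1
```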

Video & Further Resources

If you need more explanations on the examples of this tutorial, you may watch the following video on my YouTube channel, in which I show the R code of this article:

Furthermore, you may have a look at the other tutorials on Statistics Globe.

In this R tutorial you learned how to split a data frame into multiple subsets. Let me know in the comments below if you have additional questions. Furthermore, please subscribe to my email newsletter for regular updates on the newest articles.

2 Comments

chiara
May 18, 2022 7:33 am

Hi! Thank you so much, this was very helpful. Is there a way to do this more efficiently when you have a lot more columns that you want to split? Say, maybe using a dash to select from column “x1” – “x10”?

- Joachim
  May 18, 2022 10:15 am
  Hey Chiara,
  
  Thanks for the kind comment, glad you found it helpful!
  
  You may specify a larger number of variables using the paste0 function. For example:
  data[ , paste0("x", 1:10)]
  Regards,
  Joachim
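As a follow-up to the question above: when the wanted columns sit next to each other in the data frame, base R can also select the whole range between two column names. This is a minimal sketch, assuming the columns x1 to x3 of the tutorial's example data stand in for x1 to x10. The helper objects first and last are illustrative.

```r
data <- data.frame(x1 = 1:10, x2 = letters[1:10], x3 = 20:11)

# Option 1: subset() accepts a range of column names in its select argument
data_range_1 <- subset(data, select = x1:x3)

# Option 2: look up the positions of the first and last column by name
first <- match("x1", names(data))
last <- match("x3", names(data))
data_range_2 <- data[ , first:last, drop = FALSE]

names(data_range_1)                        # "x1" "x2" "x3"
names(data_range_2)                        # "x1" "x2" "x3"
```

Unlike paste0(), both options depend on the column order in the data frame rather than on a shared naming pattern, so they pick up every column located between the two names.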