Sort Data Frame by Multiple Columns in R (3 Examples)

 

This tutorial illustrates how to order a data frame by multiple columns in the R programming language.

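I will show you three examples for the sorting of data frames. More precisely, the tutorial covers the creation of the example data, followed by sorting with Base R (order function), with the dplyr package (arrange function), and with the data.table package (setorder function).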

Let’s get started…

 

Creation of Example Data

In this tutorial, we’ll use the following example data frame:

data <- data.frame(x1 = 1:5,                         # Create example data
                   x2 = c("A", "D", "A", "B", "d"),
                   x3 = c(10, 5, 1, 20, 5))
data                                                 # Print example data

 

#   x1 x2 x3
# 1  1  A 10
# 2  2  D  5
# 3  3  A  1
# 4  4  B 20
# 5  5  d  5

Table 1: Example Data Frame.

 

Our data contains three columns (i.e. x1, x2 & x3) and each of these column vectors contains five values.

Our data frame variables have the classes integer, character, and numeric (in R versions before 4.0.0, x2 would be converted to a factor by default). However, please note that we could use the following R code to sort other data types such as factor and date columns as well.
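As a small illustration of sorting other data types, here is a minimal sketch that applies the order() function (introduced in Example 1 below) to a factor and a Date column. The data frame data_types and its columns id, grp, and day are hypothetical and not part of the example data of this tutorial:

```r
# Hypothetical example data with a factor and a Date column
data_types <- data.frame(id = 1:4,
                         grp = factor(c("b", "a", "b", "a"), levels = c("b", "a")),
                         day = as.Date(c("2021-03-01", "2021-01-15",
                                         "2021-01-01", "2020-12-31")))

# Factors are sorted by the order of their levels (here "b" before "a"),
# Date columns are sorted chronologically
data_types_sorted <- data_types[with(data_types, order(grp, day)), ]
data_types_sorted$id                                 # 3 1 4 2
```

Note that the factor is sorted by its level order and not alphabetically. If you want an alphabetical sort, you could convert the column with as.character() first.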

In the following examples, we will use the variables x2 and x3 as sorting variables.

 

Example 1: Sort Data Frame by Multiple Columns with Base R (order Function)

In the first example, we’ll sort our data frame using the order() and the with() functions. Both functions are already available in Base R:

data[with(data, order(x2, x3)), ]                    # Order data with Base R

 


Table 2: Ordered Data Frame.

 

Table 2 illustrates the resulting data frame that is printed to the RStudio console. As you can see, the data frame is sorted according to the second and the third column (i.e. x2 and x3).

The sorting column x2 is hierarchically more important than the column x3, since we specified it first within the order function. In other words, the data is sorted according to the values of x2 first, and then it is sorted within each group of x2 according to the column x3.

You can see this based on the first two rows of Table 2. Both the first and the second row contain the value A in column x2. For that reason, these rows are ordered according to x3, i.e. the row with x3 = 1 is placed before the row with x3 = 10.
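The hierarchy of the sorting columns can also be checked directly on the index vector returned by order(). The following sketch recreates the example data and uses method = "radix", which is my own addition: this method always sorts character values in the C locale (upper-case letters before lower-case letters), so the positions of "D" and "d" do not depend on the locale of your R session. The object names idx, idx_swapped, and idx_desc are hypothetical:

```r
data <- data.frame(x1 = 1:5,                         # Recreate example data
                   x2 = c("A", "D", "A", "B", "d"),
                   x3 = c(10, 5, 1, 20, 5))

# x2 first, x3 as tie-breaker (data$ notation instead of with())
idx <- order(data$x2, data$x3, method = "radix")
idx                                                  # 3 1 4 2 5

# Swapped priority: x3 first, x2 as tie-breaker
idx_swapped <- order(data$x3, data$x2, method = "radix")
idx_swapped                                          # 3 2 5 1 4

# x2 ascending, numeric column x3 descending within each group of x2
idx_desc <- order(data$x2, -data$x3, method = "radix")
idx_desc                                             # 1 3 4 2 5

data[idx_desc, ]                                     # Apply index to data frame
```

The minus sign only works for numeric columns. Without method = "radix", order() sorts character values according to your locale, so the two rows with "D" and "d" might switch positions.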

That’s basically how to sort a data frame by multiple columns in R. However, depending on your personal preferences, you might prefer to rearrange your data with different R code. I’m therefore going to show you two programming alternatives for the ordering of your data sets in the following examples.

 

Example 2: Sort Data Frame by Multiple Columns with dplyr Package (arrange Function)

Probably the most popular alternative to Base R is the dplyr package of the tidyverse. Let’s install and load the dplyr package in RStudio:

install.packages("dplyr")                            # Install dplyr package
library("dplyr")                                     # Load dplyr package

We can now use the arrange command of the dplyr package to order our data:

arrange(data, x2, x3)                                # Order data with dplyr

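The output of this code matches Example 1 for the rows with the values A and B in x2. Note, however, that the rows with "D" and "d" may appear in a different order: order() sorts character values according to the locale of your R session, whereas arrange() uses the C locale by default since dplyr 1.1.0 (i.e. upper-case letters before lower-case letters).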

Note that we specified the columns x2 and x3 as sorting columns. However, you could specify as many columns within the arrange function as you want.
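As a minimal sketch of this, the following code sorts by three columns and wraps x3 into the desc() function of the dplyr package to sort this column in descending order. The object name data_desc is hypothetical. Since the position of the rows with "D" and "d" depends on the dplyr version and the locale, only the first three rows are inspected:

```r
library("dplyr")                                     # Load dplyr package

data <- data.frame(x1 = 1:5,                         # Recreate example data
                   x2 = c("A", "D", "A", "B", "d"),
                   x3 = c(10, 5, 1, 20, 5))

# x2 ascending, x3 descending within x2, x1 as final tie-breaker
data_desc <- arrange(data, x2, desc(x3), x1)
head(data_desc, 3)                                   # Rows with x1 = 1, 3, 4
```

Within the group A, the row with x3 = 10 is now placed before the row with x3 = 1.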

 

Example 3: Sort Data Frame by Multiple Columns with data.table Package (setorder Function)

The next alternative I’m going to show you is the setorder function of the data.table package:

install.packages("data.table")                       # Install data.table package
library("data.table")                                # Load data.table package

After installing and loading the package, we can apply the setorder function as follows:

data_ordered <- copy(data)                           # Copy example data (setorder sorts by reference)
setorder(data_ordered, x2, x3)                       # Order data with data.table
data_ordered                                         # Print ordered data

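Again, the output matches Example 1 for the rows with the values A and B in x2. Please note that data.table always sorts character columns in the C locale, so the upper-case "D" is placed before the lower-case "d", which may differ from the locale-dependent result of order().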
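The setorder function reorders its input by reference, i.e. without creating a new object. The following sketch illustrates this under the assumption that we work with a data.table instead of a data.frame. The object names dt and dt_sorted are hypothetical. The minus sign in front of x3 requests a descending sort of this column:

```r
library("data.table")                                # Load data.table package

dt <- data.table(x1 = 1:5,                           # Example data as data.table
                 x2 = c("A", "D", "A", "B", "d"),
                 x3 = c(10, 5, 1, 20, 5))

dt_sorted <- copy(dt)                                # Deep copy, so that dt stays unchanged
setorder(dt_sorted, x2, -x3)                         # x2 ascending, x3 descending

dt_sorted$x1                                         # 1 3 4 2 5
dt$x1                                                # 1 2 3 4 5 (original order)
```

Without copy(), both names would refer to the same object, and the original data would be reordered as well.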

 


At this point of the tutorial, you should know how to rearrange the rows of a data frame based on multiple columns in the R programming language.

 
