Summarize Multiple Columns of data.table by Group in R (Example)
In this R tutorial you’ll learn how to summarize multiple columns of a data.table by group.
Here’s how to do it…
Example Data & Packages
Before we can construct some example data with the data.table class, we have to install and load the data.table package so that its functions are available:
install.packages("data.table")    # Install data.table package
library("data.table")             # Load data.table
Next, we can create a data.table using the data.table() function as shown below:
data <- data.table(x1 = 1:12,     # Create example data.table
                   x2 = 11:22,
                   group = rep(letters[1:3], each = 4))
data                              # Print example data.table
Table 1 shows that the exemplifying data.table contains twelve rows and three columns.
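If you want to confirm the structure of the example data yourself, a quick check along the following lines may help. This is only a minimal sketch: it rebuilds the same example data and uses dim(), is.data.table(), and the .N symbol of data.table. The object name group_sizes is chosen here purely for illustration.

```r
library(data.table)

# Rebuild the example data from this tutorial
data <- data.table(x1 = 1:12,
                   x2 = 11:22,
                   group = rep(letters[1:3], each = 4))

dim(data)                                  # Number of rows and columns
is.data.table(data)                        # Check for the data.table class

group_sizes <- data[, .N, by = group]      # Count the rows in each group
group_sizes
```

Each of the three groups should contain four rows, which adds up to the twelve rows of the table.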
Example: Group Multiple Variables Using data.table Package
In this example, I’ll explain how to summarize multiple columns of a data.table by group to create descriptive statistics of our data.
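For this, we can use lapply together with .SD as shown below. Within a grouped data.table call, .SD holds the rows of the current group for all columns except the grouping column, so lapply applies the function to each of these columns separately.
Note that we are computing the mean of each column within each group with the following R syntax. However, we could replace the mean function with other functions such as sum or median as well. Functions that return more than one value, such as quantile, can also be used, but they produce several rows per group.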
data_group <- data[, lapply(.SD, mean), by = group]    # Summarize by group
data_group                                             # Print summarized data.table
As shown in Table 2, the previous code has created a data.table showing the mean of each variable within each group.
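The same pattern should work with other summary functions. The following sketch swaps mean for sum and median. It also uses the .SDcols argument of data.table to state explicitly which columns .SD should contain, which is not part of the example above. The object names data_sum and data_median are chosen here for illustration only.

```r
library(data.table)

# Rebuild the example data from this tutorial
data <- data.table(x1 = 1:12,
                   x2 = 11:22,
                   group = rep(letters[1:3], each = 4))

# Sum of every non-grouping column by group
data_sum <- data[, lapply(.SD, sum), by = group]
data_sum

# Median by group, restricted to the columns named in .SDcols
data_median <- data[, lapply(.SD, median),
                    by = group,
                    .SDcols = c("x1", "x2")]
data_median
```

Listing the columns in .SDcols can be useful when a data.table also contains non-numeric columns that should not be passed to the summary function.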
Video, Further Resources & Summary
Would you like to know more about data.tables in R? Then you may want to have a look at the following video on my YouTube channel, in which I show the examples of this tutorial.
Furthermore, you may have a look at the other tutorials on my website. You can find some other tutorials about topics such as variables and dplyr below.
- Sort Data Frame by Multiple Columns in R
- Split Data Frame Variable into Multiple Columns
- Drop Multiple Columns from Data Frame Using dplyr Package
- R Programming Language
To summarize: This page has illustrated how to summarize the variables of a data.table by groups in the R programming language. In case you have any additional questions, let me know in the comments section below.