Summary Statistics for data.table in R (4 Examples)

 

On this page, you’ll learn how to apply summary statistics like the mean or median to the columns of a data.table in R.

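The post will consist of these topics:

1) Example Data & Packages
2) Example 1: Calculate Mean Values for Groups
3) Example 2: Create New Column with Summary Statistic (Mean Values)
4) Example 3: Show Several Statistics
5) Example 4: Frequency Tables
6) Video & Further Resources

If you want to know more about these content blocks, keep reading!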

 

Example Data & Packages

To use the functions of the data.table package (see our introduction to data.table), we first have to install and load it:

install.packages("data.table")                             # Install data.table package
library("data.table")                                      # Load data.table package

I also have to create some example data:

set.seed(5)                                                # Set seed
dt_example <- data.table(V1 = sample(month.name[1:12], 100, replace = TRUE),  
                         V2 = sample(c(TRUE, FALSE),   100, replace = TRUE),
                         V3 = rnorm(100))                  # Create data.table
head(dt_example)                                           # Print head of data

 

Table 1: First six rows of the example data.table.

 

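As you can see based on Table 1, our example data is a data.table with 100 rows and three columns: V1 contains month names, V2 contains logical values (TRUE/FALSE), and V3 contains normally distributed random numbers.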
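If you want to double-check this structure, you can query the dimensions and column classes directly. The following sketch re-creates the example data so that it runs on its own; it passes month.name instead of month.name[1:12], which is equivalent because month.name already contains exactly 12 elements.

```r
library("data.table")                                      # Load data.table package

set.seed(5)                                                # Same seed as above
dt_example <- data.table(V1 = sample(month.name, 100, replace = TRUE),
                         V2 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                         V3 = rnorm(100))                  # Re-create example data

dim(dt_example)                                            # 100 rows, 3 columns
sapply(dt_example, class)                                  # character, logical, numeric
```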

 

Example 1: Calculate Mean Values for Groups

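In this example, I’ll illustrate how to calculate the average values of a column, both overall and for groups.

First, we calculate the mean of variable V3 across all rows. Then, we use the by argument to calculate the mean of V3 separately for each value of variable V2: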

dt_example[ , mean(V3)]                                    # Mean of V3
# [1] 0.05539609
dt_example[ , mean(V3), by = V2]                           # Mean of V3, by V2

 

Table 2: Mean of V3 for each value of V2.

 

By running the previous R code, we have created Table 2, showing the mean value of variable V3 for each unique value of variable V2.
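Because the result of the grouped call is unnamed, data.table automatically calls the result column V1, which is easy to confuse with the data's own V1 column. A slightly extended sketch names the result by wrapping j in .() (data.table's shorthand for list()) and adds the number of rows per group via the special symbol .N. Using keyby instead of by additionally sorts the output by V2. The column names mean_V3 and n are arbitrary choices for illustration.

```r
library("data.table")

set.seed(5)
dt_example <- data.table(V1 = sample(month.name, 100, replace = TRUE),
                         V2 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                         V3 = rnorm(100))

# Named mean of V3 plus group size; keyby sorts the groups (FALSE before TRUE)
dt_means <- dt_example[ , .(mean_V3 = mean(V3), n = .N), keyby = V2]
dt_means
```

The group sizes in column n are useful for judging how reliable each group mean is.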

 

Example 2: Create New Column with Summary Statistic (Mean Values)

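In this example, I’ll demonstrate how to use a summary statistic to add a new column to a data.table. Note that the := operator modifies a data.table by reference, i.e. it changes the original object in place. We therefore first create an independent copy with copy(), so that dt_example itself stays unchanged:

dt_example_2 <- copy(dt_example)                           # Copy data (:= works by reference)
dt_example_2[ , Mean := mean(V3), by = V2]                 # Create new column "Mean"
head(dt_example_2)                                         # Print head of data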

 

Table 3: First six rows of the data.table with the new column Mean.

 

Table 3 shows that we have added a new column called Mean, which contains the average of variable V3 within each group of variable V2. Every row with the same value of V2 therefore has the same value in Mean.
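A group statistic created with := can also be combined with the original values in the same step. As an illustrative sketch (the column name V3_centered is hypothetical), the following code subtracts the group mean of V3 from each value, so the new column averages to approximately zero within each group of V2. copy() is used so that dt_example itself is not modified, since := adds columns by reference.

```r
library("data.table")

set.seed(5)
dt_example <- data.table(V1 = sample(month.name, 100, replace = TRUE),
                         V2 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                         V3 = rnorm(100))

dt_centered <- copy(dt_example)                            # Work on a copy of the data
dt_centered[ , V3_centered := V3 - mean(V3), by = V2]      # Deviation from the group mean

# Check: the centered values average to (numerically) zero within each group
dt_centered[ , .(mean_centered = mean(V3_centered)), by = V2]
```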

 

Example 3: Show Several Statistics

The following R programming syntax illustrates how to display several summary statistics at once in data.table.

dt_example[, list("mean"        = mean(V3),                # Calculate summary statistics
                  "var"         = var(V3),
                  "median"      = median(V3),
                  "min"         = min(V3),
                  "max"         = max(V3),
                  "quantile_95" = quantile(V3, 0.95))]

 

Table 4: Several summary statistics of V3.

 

Table 4 shows the output of the previous R code: several summary statistics of variable V3 in a single row. Some of these descriptive statistics are also returned by the base R function summary(), which can be applied within data.table as shown below:

dt_example[ , summary(V3)]                                 # Summary statistics of V3
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
# -2.62134 -0.51192  0.06732  0.05540  0.75049  2.24625
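The list of statistics from Table 4 can also be combined with the by argument to get one row of summary statistics per group. The sketch below uses .() as shorthand for list() and a subset of the statistics shown above (plus the standard deviation); the output column names are arbitrary choices for illustration.

```r
library("data.table")

set.seed(5)
dt_example <- data.table(V1 = sample(month.name, 100, replace = TRUE),
                         V2 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                         V3 = rnorm(100))

# One row of summary statistics of V3 for each value of V2
dt_stats <- dt_example[ , .(mean        = mean(V3),
                            median      = median(V3),
                            sd          = sd(V3),
                            quantile_95 = quantile(V3, 0.95)),
                        by = V2]
dt_stats
```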

 

Example 4: Frequency Tables

Within data.table, we can also create frequency tables. The following R programming syntax illustrates how to create a two-way frequency table of the variables V1 and V2 with the table() function.

dt_example[, table(V1, V2)]
#          V2
# V1        FALSE TRUE
# April         5    4
# August        3    6
# December      4    6
# February      2    5
# January       5    3
# July          1    3
# June          6    4
# March         1    6
# May           3    3
# November      5    7
# October       1    8
# September     2    7
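Instead of table(), the same counts can be computed with data.table's own grouping syntax. The special symbol .N holds the number of rows in each group, so the sketch below returns one row per observed combination of V1 and V2 in long format. Unlike table(), combinations that never occur are simply absent rather than shown as 0. The relative-frequency column share is an illustrative extra and its name is arbitrary.

```r
library("data.table")

set.seed(5)
dt_example <- data.table(V1 = sample(month.name, 100, replace = TRUE),
                         V2 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                         V3 = rnorm(100))

dt_freq <- dt_example[ , .N, keyby = .(V1, V2)]            # Count rows per V1/V2 combination
dt_freq[ , share := N / sum(N)]                            # Relative frequency of each combination
head(dt_freq)
```

The long format is convenient if you want to filter, sort, or join the counts afterwards.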

 

Video & Further Resources

Would you like to learn more about the calculation of descriptive statistics of data.table columns? Then I recommend having a look at the following video on my YouTube channel. In the video, I’m illustrating the content of this page in RStudio:

 

The YouTube video will be added soon.

 

Furthermore, you could read the other tutorials on this homepage:

 

Summary: In this tutorial, I have demonstrated how to use summary functions inside data.table in the R programming language. If you have any further questions, don’t hesitate to let me know in the comments below.

 

Anna-Lena Wölwer Survey Statistician & R Programmer

This page was created in collaboration with Anna-Lena Wölwer. Have a look at Anna-Lena’s author page to get additional information about her academic background and the other articles she has written for Statistics Globe.

 
