Example Data & Packages

If we want to use the functions and commands of the data.table package (see our introduction here), we first have to install and load data.table:

install.packages("data.table")                             # Install data.table package
library("data.table")                                      # Load data.table package

I also have to create some example data:

set.seed(5)                                                # Set seed
dt_example <- data.table(V1 = sample(month.name,       100, replace = TRUE),
                         V2 = sample(c(TRUE, FALSE),   100, replace = TRUE),
                         V3 = rnorm(100))                  # Create data.table
head(dt_example)                                           # Print head of data


Table 1: First rows of the example data.table, as printed by head(dt_example).


As you can see based on Table 1, our example data is a data.table composed of three columns.


Example 1: Calculate Mean Values for Groups

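In this example, I’ll illustrate how to calculate the average values of certain columns.

First, we calculate the overall mean of variable V3. Then, we calculate the mean of V3 separately for each value of variable V2 using the by argument.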

dt_example[ , mean(V3)]                                    # Mean of V3
# [1] 0.05539609
dt_example[ , mean(V3), by = V2]                           # Mean of V3, by V2


Table 2: Mean of variable V3 for each unique value of variable V2.


By running the previous R code, we have created Table 2, showing the mean value of variable V3 for each unique value of variable V2.
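When j is left unnamed, data.table gives the result column the default name V1, which can be confusing here because our data already has a column called V1. The following is a minimal sketch of how the result could be named explicitly, and how several columns could be averaged at once with .SD and .SDcols. The data dt_toy and its columns group, x, and y are hypothetical and only used for illustration.

```r
library(data.table)

# Hypothetical toy data (not the tutorial's dt_example)
dt_toy <- data.table(group = c("a", "a", "b", "b"),
                     x     = c(1, 3, 10, 20),
                     y     = c(2, 4, 6, 8))

# Name the aggregated column explicitly instead of relying on the default name
res_named <- dt_toy[, .(mean_x = mean(x)), by = group]

# Mean of several columns at once via .SD and .SDcols
res_multi <- dt_toy[, lapply(.SD, mean), by = group, .SDcols = c("x", "y")]

res_named
res_multi
```

Wrapping the expression in .() (an alias for list()) is what allows us to assign the column name mean_x.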


Example 2: Create New Column with Summary Statistic: Mean Values

In this example, I’ll demonstrate how to use summary statistics to generate a new column in data.table.

dt_example_2 <- dt_example[, "Mean" := mean(V3), by = V2]  # Create new column "Mean"


Table 3: The example data.table with the new column Mean.


Table 3 shows that we have created a new column called Mean, which contains the average of variable V3 for each unique value of variable V2.
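Note that the := operator adds the column by reference. In the code above, dt_example itself therefore also receives the column Mean, and dt_example_2 refers to the same object. If the original data should stay untouched, one possible approach is to work on a copy(). The sketch below uses hypothetical toy data (dt_orig with columns g and x) to show the idea.

```r
library(data.table)

# Hypothetical toy data (not the tutorial's dt_example)
dt_orig <- data.table(g = c("a", "a", "b"),
                      x = c(1, 3, 5))

# copy() creates an independent data.table, so := only changes the copy
dt_new <- copy(dt_orig)[, Mean := mean(x), by = g]

names(dt_orig)                                             # Original keeps its two columns
names(dt_new)                                              # Copy has the extra column Mean
```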


Example 3: Show Several Statistics

The following R programming syntax illustrates how to display several summary statistics at once in data.table.

dt_example[, list("mean"        = mean(V3),                # Calculate summary statistics
                  "var"         = var(V3),
                  "median"      = median(V3),
                  "min"         = min(V3),
                  "max"         = max(V3),
                  "quantile_95" = quantile(V3, 0.95))]


Table 4: Several summary statistics of variable V3 (mean, var, median, min, max, and quantile_95).


The output of the previous R code is shown in Table 4. It contains multiple statistics of variable V3. Some basic descriptive statistics are also returned by the summary() function in R, which can be used as shown in the code below.

dt_example[ , summary(V3)]                                 # Summary statistics of V3
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
# -2.62134 -0.51192  0.06732  0.05540  0.75049  2.24625
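The list of statistics from this example can also be combined with the by argument from Example 1 to get one row of statistics per group. The following is a small sketch on hypothetical toy data (dt_toy with columns g and x). The special symbol .N is data.table's built-in row count per group.

```r
library(data.table)

# Hypothetical toy data (not the tutorial's dt_example)
dt_toy <- data.table(g = c("a", "a", "a", "b", "b"),
                     x = c(1, 2, 6, 10, 20))

# One row of summary statistics per group of g
stats_by_g <- dt_toy[, .(mean   = mean(x),
                         median = median(x),
                         min    = min(x),
                         max    = max(x),
                         n      = .N),
                     by = g]

stats_by_g
```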


Example 4: Frequency Tables

Within data.table, we can also create frequency tables. The following R programming syntax illustrates how to calculate the frequency table of the two variables V1 and V2.

dt_example[, table(V1, V2)]
#          V2
# V1        FALSE TRUE
# April         5    4
# August        3    6
# December      4    6
# February      2    5
# January       5    3
# July          1    3
# June          6    4
# March         1    6
# May           3    3
# November      5    7
# October       1    8
# September     2    7
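The table() call above returns a base R table object. If we would rather keep the frequencies as a data.table, one alternative is to count rows per group with .N. The sketch below uses hypothetical toy data (dt_toy with columns month and flag). Using keyby instead of by sorts the result by the grouping columns.

```r
library(data.table)

# Hypothetical toy data (not the tutorial's dt_example)
dt_toy <- data.table(month = c("May", "May", "June", "May", "June"),
                     flag  = c(TRUE, FALSE, TRUE, TRUE, TRUE))

# Long-format frequency table: one row per observed combination
freq_long <- dt_toy[, .N, keyby = .(month, flag)]

freq_long
```

Unlike table(), this approach only lists combinations that actually occur in the data, so empty cells do not appear as zeros.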


Summary: In this tutorial, I have demonstrated how to use summary functions inside data.table in the R programming language. If you have any further questions, please let me know in the comments below.


Anna-Lena Wölwer Survey Statistician & R Programmer

This page was created in collaboration with Anna-Lena Wölwer. Have a look at Anna-Lena’s author page to get additional information about her academic background and the other articles she has written for Statistics Globe.


