Summary Statistics of Data Frame in R (4 Examples)

 

This tutorial explains how to calculate summary statistics for the columns of a data frame in the R programming language.

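The content of the article is structured as follows:

1) Creating Exemplifying Data
2) Example 1: Calculate Descriptive Statistics for Single Column of Data Frame
3) Example 2: Calculate Descriptive Statistics for All Columns of Data Frame
4) Example 3: Calculate Descriptive Statistics Table for All Columns of Data Frame
5) Example 4: Calculate Descriptive Statistics by Group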

Let’s get started…

 

Creating Exemplifying Data

As a first step, let’s construct some example data in R:

set.seed(926436)                 # Create example data frame
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = LETTERS[1:4])
head(data)                       # Print head of example data frame

 

Table 1: First six rows of the example data frame (columns x1, x2, and x3).

 

Table 1 shows that our example data has three variables. The columns x1 and x2 contain numeric data, and the column x3 contains characters.
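The column types can also be checked directly in code. The following is a minimal sketch, assuming R version 4.0 or later (in older versions, data.frame converts character vectors to factors by default). The object name col_classes is only used for illustration.

```r
set.seed(926436)                 # Recreate example data frame
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = LETTERS[1:4])

col_classes <- sapply(data, class)   # Class of each column
col_classes                      # x1 and x2: "numeric"; x3: "character" in R >= 4.0

str(data)                        # Compact overview of rows, columns & types
```

The str function prints the number of rows and columns as well as the first few values of each variable, which makes it a handy first check after creating or importing data.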

 

Example 1: Calculate Descriptive Statistics for Single Column of Data Frame

The syntax below shows how to compute certain descriptive statistics for one specific column of a data frame.

For instance, we can use the mean function to calculate the average of the variable x1…

mean(data$x1)                    # Calculate mean of one column
# [1] -0.05862404

…the max function to get the maximum value…

max(data$x1)                     # Calculate maximum of one column
# [1] 2.326887

…and the sum function to get the sum of all values in our data frame column:

sum(data$x1)                     # Calculate sum of one column
# [1] -5.862404

The R programming language provides functions for many other statistical metrics as well, for example min, median, sd, var, and quantile. Each of them can be applied to a single column in the same way as shown above.
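As a short sketch of this pattern, the following code applies a few other base R functions to the column x1. The selection of functions is only an example, and the object names x1_sd and x1_var are illustrative.

```r
set.seed(926436)                 # Recreate example data frame
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = LETTERS[1:4])

min(data$x1)                     # Minimum of one column
median(data$x1)                  # Median of one column

x1_sd <- sd(data$x1)             # Standard deviation of one column
x1_var <- var(data$x1)           # Variance of one column (squared sd)

quantile(data$x1,                # 1st and 3rd quartile of one column
         probs = c(0.25, 0.75))
```

All of these functions have an na.rm argument, which can be set to TRUE in case a column contains missing values.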

In the next section, however, I want to demonstrate how to calculate summary statistics for all columns of a data frame.

Let’s move on!

 

Example 2: Calculate Descriptive Statistics for All Columns of Data Frame

Example 2 explains how to get a certain descriptive statistic for all the variables in a data set.

For this task, we can use the apply function in combination with the function that corresponds to the metric we want to measure.

The following R code applies the max function to each column of our example data frame:

apply(data, 2, max)              # Calculate maxima of all columns
#             x1             x2             x3 
# "-3.236715045"  "0.970136215"            "D"

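Note that the apply function converts our data frame to a character matrix, because the column x3 contains characters. The max function therefore compares all values as text rather than as numbers. For x3, this gives the alphabetical maximum. For x1, however, the returned value -3.236715045 is actually the numeric minimum of the column (compare the Min. value in Example 3), not the maximum 2.326887 that we have calculated in Example 1.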
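Since apply turns a data frame with a character column into a character matrix, one way to keep the comparison numeric is to restrict the calculation to the numeric columns. The following is a minimal sketch of this idea; the object names num_cols and num_max are illustrative, and other approaches are possible.

```r
set.seed(926436)                 # Recreate example data frame
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = LETTERS[1:4])

num_cols <- sapply(data, is.numeric)     # TRUE for x1 and x2, FALSE for x3
num_max <- sapply(data[num_cols], max)   # Numeric maxima of numeric columns
num_max                          # Named numeric vector with elements x1 & x2
```

The value for x1 should now match the result of max(data$x1) from Example 1, because sapply passes each column to max as a numeric vector instead of converting the whole data frame to text first.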

 

Example 3: Calculate Descriptive Statistics Table for All Columns of Data Frame

So far, we have always calculated a single summary statistic such as the mean, the max, or the sum.

In Example 3, I’ll demonstrate how to return multiple summary statistics for all the columns in a data frame with only one line of code.

The magical function we are looking for is the summary function.

We can simply apply this function to an entire data frame as shown in the following R code:

summary(data)                    # Calculate summary statistics table
#        x1                 x2                x3           
#  Min.   :-3.23671   Min.   :0.003054   Length:100        
#  1st Qu.:-0.67880   1st Qu.:0.257628   Class :character  
#  Median : 0.01180   Median :0.504780   Mode  :character  
#  Mean   :-0.05862   Mean   :0.501471                     
#  3rd Qu.: 0.65865   3rd Qu.:0.766102                     
#  Max.   : 2.32689   Max.   :0.970136

Have a look at the previous output of the RStudio console. It shows the minimum, 1st quartile, median, mean, 3rd quartile, and the maximum value for each of the numeric columns in our data frame. For the character column, it shows the length (i.e. the number of cases), the class, and the mode.

The summary function is very useful when you want to get a quick overview of the structure of your data.
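For a character column such as x3, the summary output does not tell us how often each value occurs. As a small sketch, the table function can be used to count the cases per category. The object name group_counts is illustrative.

```r
set.seed(926436)                 # Recreate example data frame
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = LETTERS[1:4])

group_counts <- table(data$x3)   # Count cases per category of x3
group_counts                     # A, B, C & D each occur 25 times

summary(factor(data$x3))         # Same counts via summary of a factor
```

Each letter occurs 25 times, because the vector LETTERS[1:4] is recycled over the 100 rows of the data frame.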

 

Example 4: Calculate Descriptive Statistics by Group

In the previous examples, we have calculated certain summary statistics for entire data frame columns.

However, it is often necessary to evaluate particular groups in a data frame.

For such a situation, we can use the aggregate function.

Within the aggregate function, we have to specify a formula with the variable that we want to evaluate on the left side (here x1) and the grouping variable on the right side (here x3), the name of the data frame (here data), as well as the function for the metric we would like to compute (here the var function for the variance).

aggregate(x1 ~ x3, data, var)    # Calculate variance by group

 

Table 2: Variance of x1 for each group of x3.

 

As shown in Table 2, we have created a data frame showing the variance of each x1 subgroup in our input data frame.
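The same formula syntax can be extended to several variables at once, and the group variances can be cross-checked with the tapply function. The following is a minimal sketch; the object names var_by_group, var_tapply, and mean_by_group are illustrative.

```r
set.seed(926436)                 # Recreate example data frame
data <- data.frame(x1 = rnorm(100),
                   x2 = runif(100),
                   x3 = LETTERS[1:4])

var_by_group <- aggregate(x1 ~ x3, data, var)   # Variance of x1 by group
var_tapply <- tapply(data$x1, data$x3, var)     # Same values as named vector

mean_by_group <- aggregate(cbind(x1, x2) ~ x3,  # Mean of x1 & x2 by group
                           data,
                           mean)
mean_by_group                    # Data frame with columns x3, x1 & x2
```

The aggregate function returns a data frame with one row per group, whereas tapply returns a named vector. Which of the two is more convenient depends on how the result is used afterwards.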

 


In this tutorial, I have shown how to get descriptive statistics for the columns of a data frame in R: single statistics for one column, a statistic for all columns, a summary table, and statistics by group.

 
