Include All Two-Way Interactions into Model in R (2 Examples)

 

This tutorial demonstrates how to add all two-way interaction effects to a linear model in R programming.

The focus of this tutorial is linear models, like linear regression models, generalized linear regression models, linear mixed models, etc.

Non-linear models are out of the scope of the tutorial since the interaction terms may not be simple multiplicative terms like in linear models. Instead, they can be more complex expressions involving variables and model parameters.

The table of content looks as follows:

Let’s start right away!

Example Data

The first step is to generate the data to be used in the following examples. I created three independent variables, x1, x2, and x3, and two dependent variables, y_linear, and y_binary. Then I stored them in a dataset called my_data, as shown below.

set.seed(42)
n <- 100                                                                  # Generate sample data
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y_linear <- 1 + 2 * x1 + 3 * x2 + x1 * x2 + rnorm(n, sd = 1)
y_binary <- ifelse(1 + 2 * x1 - 3 * x2 + 0.5 * x3 + x1 * x2 - x1 * x3 + rnorm(n, sd = 1) > 0, 1, 0)
 
my_data <- data.frame(x1, x2, x3, y_linear, y_binary)                     # Create my_data          
head(my_data)

Please be aware that the set.seed() was used to generate the same values at each run of the number generation. For further details, see Why & How to Set a Random Seed in R.

table 1 data frame include all two-way interactions model r

In Table 1, you can see that all variables have been successfully included in my_data.

 

Example 1: Set All Two-Way Interactions using ^2

One can add all two-way interactions using the multiplication operator * as usual, like x1*x2. However, it might be time- and attention-consuming if there are many variables to consider. In Example 1, I’ll illustrate how to use the exponent operator ^ as a shortcut to include all two-way interactions instead of adding all pairwise multiplications one by one. See below how it is implemented.

lm_model <- lm(y_linear ~ (x1 + x2 + x3)^2, data = my_data)               # Fit lm model
summary(lm_model)                                                         # Summary of results
 
# Call:
# lm(formula = y_linear ~ (x1 + x2 + x3)^2, data = my_data)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -2.1722 -0.5290 -0.1009  0.6757  2.1412 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  1.28250    0.09634  13.312   <2e-16 ***
# x1           2.03018    0.09388  21.625   <2e-16 ***
# x2           3.08226    0.10683  28.853   <2e-16 ***
# x3           0.04513    0.11361   0.397    0.692    
# x1:x2        0.95330    0.09507  10.027   <2e-16 ***
# x1:x3       -0.11813    0.10891  -1.085    0.281    
# x2:x3        0.07627    0.11754   0.649    0.518    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.9504 on 93 degrees of freedom
# Multiple R-squared:  0.9555,	Adjusted R-squared:  0.9527 
# F-statistic: 333.1 on 6 and 93 DF,  p-value: < 2.2e-16

As seen, taking the square of (x1 + x2 + x3) helps the user include all two-way interactions x1:x2, x1:x3, and x2:x3 in addition to the main effects.

 

Example 2: Set All Two-Way Interactions using .^2

In this example, I’ll show a further shortcut to include all two-way interactions in a linear model. Let’s pick another type of model this time. I opted for logistic regression, which is computed by the generalized linear model function glm(). Relatedly, the dependent variable will be y_binary instead of y_linear.

Different from Example 1, I will use . instead of the formula of (x1 + x2 + x3). Let’s see if the command includes all two-way interactions successfully!

glm_model <- glm(y_binary ~ .^2 , data = my_data, family = binomial)      # Fit glm model
summary(glm_model)                                                        # Summary of results
 
# Call:
# glm(formula = y_binary ~ .^2, family = binomial, data = my_data)
# 
# Deviance Residuals: 
#      Min        1Q    Median        3Q       Max  
# -2.48645  -0.00848   0.00513   0.12757   2.18136  
# 
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)   
# (Intercept)  0.69074    0.76331   0.905  0.36550   
# x1           3.97820    1.60293   2.482  0.01307 * 
# x2          -7.39078    2.52667  -2.925  0.00344 **
# x3           1.37089    0.82534   1.661  0.09671 . 
# y_linear     0.06587    0.57745   0.114  0.90918   
# x1:x2        1.20219    4.54634   0.264  0.79145   
# x1:x3       -2.84938    1.69833  -1.678  0.09340 . 
# x1:y_linear  0.17037    0.70335   0.242  0.80861   
# x2:x3       -1.61301    3.61481  -0.446  0.65544   
# x2:y_linear  0.30464    0.44466   0.685  0.49327   
# x3:y_linear  0.33274    0.66416   0.501  0.61638   
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 128.207  on 99  degrees of freedom
# Residual deviance:  30.737  on 89  degrees of freedom
# AIC: 52.737
# 
# Number of Fisher Scoring iterations: 9

As seen, the output contains some undesired two-way interactions. The reason is that when . is used to refer to all independent variables, all variables in the dataset are treated as independent variables except for the dependent variable. So to say, using . is occasionally useful.

For the sake of demonstration, we can subset our dataset and rerun the code.

my_data_new<-subset(my_data, select=c(-y_linear))                         # Subset data
 
glm_model <- glm(y_binary ~ .^2 , data = my_data_new, family = binomial)  # Fit glm model
summary(glm_model)                                                        # Summary of results
 
# Call:
# glm(formula = y_binary ~ .^2, family = binomial, data = my_data_new)
# 
# Deviance Residuals: 
#      Min        1Q    Median        3Q       Max  
# -2.00030  -0.04383   0.00134   0.10373   2.70980  
# 
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)   
# (Intercept)  1.88089    0.87514   2.149  0.03162 * 
# x1           5.02280    1.66940   3.009  0.00262 **
# x2          -7.69973    2.36412  -3.257  0.00113 **
# x3           0.03014    0.68016   0.044  0.96466   
# x1:x2        1.98879    1.03212   1.927  0.05399 . 
# x1:x3       -2.57986    1.05850  -2.437  0.01480 * 
# x2:x3        2.57605    1.61511   1.595  0.11072   
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 131.791  on 99  degrees of freedom
# Residual deviance:  27.009  on 93  degrees of freedom
# AIC: 41.009
# 
# Number of Fisher Scoring iterations: 9

As seen, all two-way interactions have been successfully appended to the model. You can prefer the method most convenient for your case.

 

Video, Further Resources & Summary

Do you need more information on the examples in this article? Then I recommend watching the following video on my YouTube channel. In the video tutorial, I’m explaining the R code of this tutorial in RStudio.

 

The YouTube video will be added soon.

 

In addition, you might want to have a look at the related tutorials on this homepage:

 

Summary: At this point of the tutorial, you should know how to compute all the two-way interaction effects in a linear model in R programming. In the comments section below, let me know if you have any additional questions.

 

Cansu Kebabci R Programmer & Data Scientist

This page was created in collaboration with Cansu Kebabci. Look at Cansu’s author page to get more information about her professional background, a list of all his tutorials, as well as an overview of her other tasks on Statistics Globe.

 

Subscribe to the Statistics Globe Newsletter

Get regular updates on the latest tutorials, offers & news at Statistics Globe.
I hate spam & you may opt out anytime: Privacy Policy.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Top