Run Multiple Regression Models in for-Loop in R (Example)

 

In this article, I’ll show how to estimate multiple regression models in a for-loop in the R programming language.

Table of contents:

1) Introducing Example Data
2) Example: Running Multiple Linear Regression Models in for-Loop
3) Video, Further Resources & Summary

If you want to know more about these topics, keep reading…

 

Introducing Example Data

The following data is used as the basis for this R programming tutorial:

set.seed(98274)                          # Creating example data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.2 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.1 * y
x3 <- rnorm(1000) - 0.1 * x1 + 0.3 * x2 - 0.3 * y
data <- data.frame(y, x1, x2, x3)
head(data)                               # Head of data
#            y         x1         x2          x3
# 1  0.5587036 -0.3779533 -0.5320515 -0.92069263
# 2  0.8422515 -1.3835572  1.2782521  0.87967960
# 3 -0.5395343 -0.9729798 -0.1515273 -0.05973894
# 4 -0.3522260  1.2977564 -0.3512013 -0.77239810
# 5  1.5848675 -1.3152806 -2.3644414 -1.14651812
# 6  0.2207957  1.8860636  0.1967851 -0.04963894

As the previous RStudio console output shows, our example data consists of four numeric columns: the first variable, y, is our regression outcome, and the other three variables are our predictors.

 

Example: Running Multiple Linear Regression Models in for-Loop

In this example, I’ll show how to run three regression models within a for-loop in R. In each iteration of the loop, we increase the complexity of our model by adding another predictor variable.
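Written out by hand, these are the three nested models the loop will estimate (shown here only for illustration; the loop below builds them automatically):

lm(y ~ x1, data)                         # Model 1: one predictor
lm(y ~ x1 + x2, data)                    # Model 2: two predictors
lm(y ~ x1 + x2 + x3, data)               # Model 3: all three predictors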

First, we have to create a list in which we will store the outputs of our for-loop iterations:

mod_summaries <- list()                  # Create empty list

Now, we can write a for-loop that runs multiple linear regression models as shown below:

for(i in 2:ncol(data)) {                 # Head of for-loop
 
  predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
  mod_summaries[[i - 1]] <- summary(     # Store regression model summary in list
    lm(y ~ ., data[ , c("y", predictors_i)]))
 
}

Let’s have a look at the output of our previously executed for-loop:

mod_summaries                            # Return summaries of all models

 

Figure 1: Multiple Linear Regressions Executed in a for-Loop in R

 

As you can see in Figure 1, we have created a list containing the model summaries of three different linear regressions.
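Each list element is an ordinary summary.lm object, so you can index the list and query the stored results as usual. As a small follow-up sketch based on the mod_summaries list created above, the following lines print the summary of the third model and collect the adjusted R-squared values of all three models into a vector:

mod_summaries[[3]]                       # Print summary of third model (y ~ x1 + x2 + x3)
sapply(mod_summaries, function(s) s$adj.r.squared)  # Adjusted R-squared of each model
# [1] 0.03510536 0.04940452 0.09595714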

 

Video, Further Resources & Summary

If you need further explanations on the content of this tutorial, I can recommend having a look at the following video that I have published on my YouTube channel.

In the video, I explain how to loop over and repeat the estimation of multiple regression models using the R programming syntax of this tutorial in RStudio.

 

The YouTube video will be added soon.

 

In addition, you may want to read the other R programming tutorials on my homepage.

 

Summary: At this point you should know how to write a for-loop that executes several linear regressions in the R programming language. Please let me know in the comments below in case you have further questions. Furthermore, please subscribe to my email newsletter to receive updates on new articles.

 



32 Comments

  • Fernando Ramires
    March 1, 2021 1:46 pm

    the procedure can follow the stepwise line….

  • Farieda Hassanen
    July 24, 2021 5:22 pm

    Very good

  • Thanks for the excellent explanation. If you can provide steps on how to use PCA (principal component analysis) on series of images, that would be great!

  • How can I modify this to have models for each variable by itself? For example, I want the second model to just have x2 and the third model to just have x3, and so on, rather than them building onto each other. Also, could I change this to do an odds ratio with each predictor?

  • Hi Mr Joachim
    I need your help because till now no one answered my question and I hope to find the solution with you,

    How can I find the range (minimum, maximum) of values for the intercept, slope and R-squared in R programming?

    Best Regards
    Hussein Ali

  • Simon von Zahn
    October 28, 2021 7:31 pm

    Hello Joachim,
    will this example let me know which of the tested models to use for a maximized adj. R-squared?
    And which packages are required to run it?
    Best Regards
    Simon

  • Hi Joachim,

    I’m trying to use this method to run regressions on “n” independent variables, so up to every combination available from my columns. Is there a way I can do this with this code? Thanks for your help!

  • Hi Joachim,
    thank you for sharing this guide. I ran several linear regression models with it. How can I extract all p-values and estimates from the list of models I created? I would like to store them in one data frame.

  • How can we extract the predicted values from the linear models that are stored in the list?

  • Hi Joachim,

    Thanks for the nice example. How can I expand this code to extract the coefficients, SD and p-values and put them in a separate list?

  • Hey Joachim,
    Can I modify the code and use it for multiple ‘target variables’ instead of multiple ‘predictors’? If so, could you exemplify the code?
    Thanks

    • Hey Sal,

      Yes, this is possible. Please have a look at the modified example code below:

      set.seed(98274)                          # Creating example data
      y <- rnorm(1000)
      x1 <- rnorm(1000) + 0.2 * y
      x2 <- rnorm(1000) + 0.2 * x1 + 0.1 * y
      x3 <- rnorm(1000) - 0.1 * x1 + 0.3 * x2 - 0.3 * y
      data <- data.frame(y1 = y,
                         y2 = y + rnorm(1000),
                         y3 = y + 10 * rnorm(1000),
                         x1, x2, x3)
      head(data)                               # Head of data
      #           y1          y2         y3         x1         x2          x3
      # 1  0.5587036  1.53788165  0.1669674 -0.3779533 -0.5320515 -0.92069263
      # 2  0.8422515  0.08528937  0.6847646 -1.3835572  1.2782521  0.87967960
      # 3 -0.5395343 -0.98086394 -1.2222470 -0.9729798 -0.1515273 -0.05973894
      # 4 -0.3522260  0.30012623 -1.7663313  1.2977564 -0.3512013 -0.77239810
      # 5  1.5848675 -0.48901240 -1.3411279 -1.3152806 -2.3644414 -1.14651812
      # 6  0.2207957  0.11100091 -0.2485206  1.8860636  0.1967851 -0.04963894
       
      mod_summaries <- list()                  # Create empty list
       
      out_names <- c("y1", "y2", "y3")         # Names of all target variables
       
      for(i in 1:length(out_names)) {          # Estimate models
       
        mod_summaries[[i]] <- summary(lm(get(out_names[i]) ~ x1 + x2 + x3, data))
      }
       
      mod_summaries                            # Return summaries of all models
      # [[1]]
      # 
      # Call:
      # lm(formula = get(out_names[i]) ~ x1 + x2 + x3, data = data)
      # 
      # Residuals:
      #      Min       1Q   Median       3Q      Max 
      # -2.67110 -0.65829 -0.03136  0.66088  3.15418 
      # 
      # Coefficients:
      #              Estimate Std. Error t value Pr(>|t|)    
      # (Intercept)  0.003017   0.030502   0.099    0.921    
      # x1           0.131145   0.031817   4.122 4.07e-05 ***
      # x2           0.173592   0.031297   5.547 3.73e-08 ***
      # x3          -0.211426   0.029224  -7.235 9.31e-13 ***
      # ---
      # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      # 
      # Residual standard error: 0.9631 on 996 degrees of freedom
      # Multiple R-squared:  0.09867,	Adjusted R-squared:  0.09596 
      # F-statistic: 36.35 on 3 and 996 DF,  p-value: < 2.2e-16
      # 
      # 
      # [[2]]
      # 
      # Call:
      # lm(formula = get(out_names[i]) ~ x1 + x2 + x3, data = data)
      # 
      # Residuals:
      #     Min      1Q  Median      3Q     Max 
      # -4.3833 -0.9343  0.0150  0.8776  5.3319 
      # 
      # Coefficients:
      #             Estimate Std. Error t value Pr(>|t|)    
      # (Intercept) -0.03054    0.04400  -0.694  0.48784    
      # x1           0.11453    0.04590   2.495  0.01274 *  
      # x2           0.13429    0.04515   2.974  0.00301 ** 
      # x3          -0.21587    0.04216  -5.121 3.66e-07 ***
      # ---
      # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      # 
      # Residual standard error: 1.389 on 996 degrees of freedom
      # Multiple R-squared:  0.04237,	Adjusted R-squared:  0.03948 
      # F-statistic: 14.69 on 3 and 996 DF,  p-value: 2.301e-09
      # 
      # 
      # [[3]]
      # 
      # Call:
      # lm(formula = get(out_names[i]) ~ x1 + x2 + x3, data = data)
      # 
      # Residuals:
      #     Min      1Q  Median      3Q     Max 
      # -34.264  -6.389  -0.109   6.519  34.821 
      # 
      # Coefficients:
      #             Estimate Std. Error t value Pr(>|t|)   
      # (Intercept) -0.44531    0.31494  -1.414  0.15769   
      # x1           0.33912    0.32852   1.032  0.30220   
      # x2           0.85905    0.32315   2.658  0.00798 **
      # x3          -0.09715    0.30175  -0.322  0.74755   
      # ---
      # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      # 
      # Residual standard error: 9.944 on 996 degrees of freedom
      # Multiple R-squared:  0.0105,	Adjusted R-squared:  0.007517 
      # F-statistic: 3.522 on 3 and 996 DF,  p-value: 0.01465

      I hope this helps!

      Joachim

  • The last example is really useful for running multiple regressions in a for-loop. What if I wanted to extract the regression coefficients for each i in out_names (Y1, Y2, Y3) and then transform those? For example, for each target variable (Y1, Y2, Y3), suppose I am interested in the fraction X1*x/Yi (such that X1*x/Yi + X2*x/Yi + X3*x/Yi = 1). Could I use a loop to extract the regression coefficients and store them so I can use them in a new function?

  • Hi there, I adapted your code (thank you) for 1115 variables. Now I have a massive list with all of the summaries. Do you have recommendations on how I could easily sort through or extract only p-values or adjusted R squared values? Thanks!

    • Hello Nate,

      Glad that we helped. To extract only the adjusted R-squared values and the p-values, you can use the following code.

      For extracting the adjusted R squares:

      mod_summaries_adj_rsqr <- list()                  # Create empty list
       
      for(i in 2:ncol(data)) {                 # Head of for-loop
       
        predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
        mod_summaries_adj_rsqr[[i - 1]] <- summary(      # Store adjusted R-squared in list
          lm(y ~ ., data[ , c("y", predictors_i)]))$adj.r.squared
       
      }
       
      mod_summaries_adj_rsqr                           # Return adj_rsqr summaries of all models
       
      # [[1]]
      # [1] 0.03510536
      # 
      # [[2]]
      # [1] 0.04940452
      # 
      # [[3]]
      # [1] 0.09595714

      For extracting the p-values:

      mod_summaries_pvalues <- list()                  # Create empty list
       
      for(i in 2:ncol(data)) {                 # Head of for-loop
       
        predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
        mod_summaries_pvalues[[i - 1]] <- summary(     # Store regression model p-values summary in list
          lm(y ~ ., data[ , c("y", predictors_i)]))$coefficients[,4]
       
      }
       
      mod_summaries_pvalues                            # Return  p-values summaries of all models
       
      # [[1]]
      # (Intercept)           x1 
      # 8.008000e-01 1.416432e-09 
      # 
      # [[2]]
      # (Intercept)           x1           x2 
      # 7.899250e-01 1.364254e-06 6.759303e-05 
      # 
      # [[3]]
      # (Intercept)           x1           x2           x3 
      # 9.212350e-01 4.071976e-05 3.732345e-08 9.307473e-13

      Hope it helps!

      Regards,
      Cansu

  • Fatme Ghaddar
    July 20, 2023 7:39 am

    Hello Joachim,

    Could you please help me to solve this coding problem:
    If I have 7 samples and I want to fit a linear mixed model (using the lmer function) to get the variation of the antibody logarithm across 4 different runs.
    The general code should be as follows:
    Model<- lmer(log10(values)~sample + (1|Run) , data = Elisa, REML = F)

    Concerning the random effect formula (1|Run): I want to get the variances for the Run variable and also the variance of the residuals for each sample (=7). How can I do this following your explanation above?
    Should I separate the sample variable into dummy variables? Or keep the sample variable as a single variable and use the loop function?
    I've tried adding the Sample variable to the random effects formula like this: (Sample|Run), but it only gives me the variances of the run without the variances of the residuals for each sample.

    Thank you in advance for your help and cooperation.
    And if you can write me the adjustable code, I'd be grateful.

    Have a nice day.

