# Extract Beta Coefficients from Linear Regression Model in R (Example)

In this tutorial, I’ll illustrate how to get standardized regression coefficients (also called beta coefficients or beta weights) from a linear model in R.


Let’s get started…

## Introduction of Example Data

Consider the following example data:

```
set.seed(2344637)                   # Create example data
x1 <- rnorm(100)
x2 <- rnorm(100) + 0.25 * x1
x3 <- rnorm(100) + 0.5 * x1 - 0.3 * x2
y <- rnorm(100) + 0.15 * x1 + 0.4 * x2 - 0.1 * x3
data <- data.frame(x1, x2, x3, y)
```

Table 1 shows that our example data contains four columns. The variables x1-x3 will be used as predictors and the variable y as the target variable.

Next, we can estimate a linear regression model based on our data using the lm function:

```
my_mod <- lm(y ~ ., data)           # Estimate linear regression model
summary(my_mod)                     # Print summary statistics
# Call:
# lm(formula = y ~ ., data = data)
#
# Residuals:
#      Min       1Q   Median       3Q      Max
# -2.52046 -0.72756  0.04412  0.73519  2.82633
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.27485    0.10599   2.593   0.0110 *
# x1           0.13312    0.12173   1.094   0.2769
# x2           0.22596    0.10180   2.220   0.0288 *
# x3          -0.08231    0.11531  -0.714   0.4771
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.049 on 96 degrees of freedom
# Multiple R-squared:  0.06315,	Adjusted R-squared:  0.03388
# F-statistic: 2.157 on 3 and 96 DF,  p-value: 0.09811
```

The previous output shows the summary statistics for our regression model.

However, this output does not show the beta coefficients. This is what we are going to compute next!

Note: The following two examples might produce slightly different results if your data contains NA values. Please have a look at the comments of Thomas J. below this tutorial to read more on this topic.

## Example 1: Extract Standardized Coefficients from Linear Regression Model Using Base R

In this example, I’ll explain how to calculate beta weights based on a linear regression model using the basic installation of the R programming language.

More precisely, we are using the lm, data.frame, and scale functions.

Consider the following R code and its output below:

```
lm(data.frame(scale(my_mod$model))) # Get standardized regression coefficients
# Call:
# lm(formula = data.frame(scale(my_mod$model)))
#
# Coefficients:
# (Intercept)           x1           x2           x3
#   5.070e-17    1.441e-01    2.200e-01   -9.387e-02
```

As you can see, we have returned the beta coefficients corresponding to our linear regression model.

Note that the code of this example was provided by Dr. R. H. Red Owl – thanks a lot to him! Please have a look at his comments below this tutorial to get more info on this code.
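As a plausibility check, the scale()-based trick can be compared against the textbook formula for beta weights, i.e. each unstandardized slope multiplied by sd(x_j) / sd(y). The following sketch recreates the example data from above so it runs on its own; with complete data (no NA values), both approaches agree:

```
set.seed(2344637)                               # Recreate the example data
x1 <- rnorm(100)
x2 <- rnorm(100) + 0.25 * x1
x3 <- rnorm(100) + 0.5 * x1 - 0.3 * x2
y <- rnorm(100) + 0.15 * x1 + 0.4 * x2 - 0.1 * x3
data <- data.frame(x1, x2, x3, y)
my_mod <- lm(y ~ ., data)                       # Estimate linear regression model

b <- coef(my_mod)[-1]                           # Unstandardized slopes (drop intercept)
sds_x <- sapply(data[c("x1", "x2", "x3")], sd)  # Standard deviations of the predictors
beta_manual <- b * sds_x / sd(data$y)           # Textbook formula for beta weights
beta_manual
```

This also makes explicit why the two examples below can disagree under missing data: the manual formula lets you control which observations enter each standard deviation.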

## Example 2: Extract Standardized Coefficients from Linear Regression Model Using lm.beta Package

Alternatively to the functions of Base R (as explained in Example 1), we can also use the lm.beta package to get the beta coefficients.

In order to use the functions of the lm.beta package, we first have to install and load lm.beta to R:

```
install.packages("lm.beta")         # Install lm.beta package
library("lm.beta")                  # Load lm.beta package
```

In the next step, we can apply the lm.beta function to the lm model object that we have created before:

```
lm.beta(my_mod)                     # Get standardized regression coefficients
# Call:
# lm(formula = y ~ ., data = data)
#
# Standardized Coefficients::
# (Intercept)          x1          x2          x3
#   0.0000000   0.1441121   0.2199585  -0.0938736
```

The previous output shows our standardized regression coefficients.
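To reuse the betas programmatically, e.g. assign them to a variable as asked in the comments below, you can extract them from the object returned by lm.beta, which stores them in its standardized.coefficients component. A minimal sketch that recreates the example data from above, assuming the lm.beta package is installed:

```
library("lm.beta")                              # Load lm.beta package

set.seed(2344637)                               # Recreate the example data
x1 <- rnorm(100)
x2 <- rnorm(100) + 0.25 * x1
x3 <- rnorm(100) + 0.5 * x1 - 0.3 * x2
y <- rnorm(100) + 0.15 * x1 + 0.4 * x2 - 0.1 * x3
data <- data.frame(x1, x2, x3, y)
my_mod <- lm(y ~ ., data)                       # Estimate linear regression model

my_betas <- lm.beta(my_mod)$standardized.coefficients
my_betas["x2"]                                  # Beta weight of a single predictor
```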

## Video, Further Resources & Summary

Do you need further info on the R syntax of this article? Then I recommend watching the video instruction on my YouTube channel, where I explain the R syntax of this article in more detail.

Additionally, you could have a look at some of the other articles on my website.

Summary: You have learned in this article how to calculate beta coefficients from a linear model in the R programming language. In case you have additional questions, please let me know in the comments section.


• Dr. R. H. Red Owl
June 8, 2022 2:24 am

As you probably already know, R stores a pointer to the raw data in the regression model. That allows us to get the beta coefficients with a single line of Base R code after running a linear model.

```
# Your data and model
set.seed(2344637)
x1 <- rnorm(100)
x2 <- rnorm(100) + 0.25 * x1
x3 <- rnorm(100) + 0.5 * x1 - 0.3 * x2
y <- rnorm(100) + 0.15 * x1 + 0.4 * x2 - 0.1 * x3
mydata <- data.frame(x1, x2, x3, y)
my_mod <- lm(y ~ ., mydata)

# This produces the beta coefficients using only Base R code
lm(data.frame(scale(my_mod$model)))
```
• June 8, 2022 6:55 am

Hey,

Thanks a lot for this very nice alternative code! I have added another example (see Example 1), which uses your code to produce the beta coefficients.

Regards,
Joachim

• Dr. R. H. Red Owl
June 8, 2022 3:11 am
```
# A slightly longer line of Base R code produces both the beta
# coefficients and the 95% CIs for the betas.
( fit2 <- lm(data.frame(scale(my_mod$model))) ) ; confint(fit2)
```
• June 8, 2022 6:55 am

Thanks again for the additional syntax, this is great! 🙂

Regards,
Joachim

• Christian
August 7, 2022 12:48 pm

Hi, Joachim! What if I want to assign the beta and r value from an exponential regression model to a variable? How do I do it? Thanks!

• August 8, 2022 9:12 am

Hey Christian,

I assume this should be possible with the code of the present tutorial. Have you already tried this code?

Regards,
Joachim

• Christian Koko
September 21, 2022 7:29 pm

Hi Joachim,
How can use the estimated coefficient of regression to predict another dataset?
Thanks.

• September 22, 2022 8:12 am

Hi Christian,

Regards,
Joachim

• Christian Koko
September 22, 2022 11:24 am

Hey Joachim,
Thank you very much for your help. I appreciate it.

Christian

• September 29, 2022 4:05 pm

You are very welcome, Christian.

Regards,
Joachim

• Thomas J.
November 12, 2022 12:01 am

Hi Joachim,

first of all many thanks for the great posts here and on Youtube. That always helps me a lot.

I noticed something with my own data set using the solutions shown here. The two examples work fine when the dataset is complete. However, if values are missing by chance, then you get slight differences.

In order to see the effect, one can first randomly remove a few values in e.g. column x1 and x3.
```
set.seed(16)
data[sample(1:nrow(data), 5), "x1"] <- NA
data[sample(1:nrow(data), 5), "x3"] <- NA
```

The beta coefficients can also be obtained via the following regression:
```
my_mod_standardized <- lm(scale(y) ~ scale(x1) + scale(x2) + scale(x3), data = data)
summary(my_mod_standardized)
```

If you now run the following code, you will notice slight differences:
```
my_mod <- lm(y ~ ., data)
lm(data.frame(scale(my_mod$model)))
lm.beta(my_mod)
```

You can also calculate the beta coefficients manually (see for the formula e.g. https://en.wikipedia.org/wiki/Standardized_coefficient). These correspond to the values of the regression.
```
my_mod$coefficients["x1"] * sd(data$x1, na.rm = TRUE) / sd(data$y, na.rm = TRUE) # beta coefficient of x1
my_mod$coefficients["x2"] * sd(data$x2, na.rm = TRUE) / sd(data$y, na.rm = TRUE) # beta coefficient of x2
my_mod$coefficients["x3"] * sd(data$x3, na.rm = TRUE) / sd(data$y, na.rm = TRUE) # beta coefficient of x3
```

Please let me know if I have missed anything.

Best regards,
Thomas

• November 14, 2022 1:14 pm

Hi Thomas,

Thanks a lot for the very kind words regarding my tutorials, glad you find them useful!

Regarding your question, I’ve just executed your code, and I’m also receiving different outputs. To be honest, I don’t know why this is happening. I assume that one of the functions must handle NA values differently than the other, but you would have to dive deeper into the help documentation of the different functions to read more about their handling of missing data.

Please let me know in case you find out why this happens, I’m also curious now.

Regards,
Joachim

• Thomas J.
November 14, 2022 3:43 pm

Hi Joachim,

Now that I have also been gripped by curiosity, I have taken a closer look at the lm.beta() function. For the first possibility “Example 1: Extract Standardized Coefficients from Linear Regression Model Using Base R” I suspect the same reason that I think I have found now:

In your original data set, I had randomly removed 5 values in x1 and 5 values in x3. So there were a total of 10 rows with NA values.
With the lm.beta() function (https://rdrr.io/cran/lm.beta/src/R/lm.beta.R), all rows that contain an NA value in any variable are completely removed. This means that a data set with 90 rows is used for all calculations here. But this rather means that even for the standardisation of variables that actually had no NA value at all (x2 and y), only 90 observations are used.
In contrast, I calculated the standardisation according to the textbook formula (see e.g. Wooldridge, 2020, Introductory Econometrics, A Modern Approach, p. 184 f.), i.e. the standard deviations of x2 and y were calculated on the basis of the 100 existing observations.
This corresponds to the possibility using the regression method: lm(scale(y) ~ scale(x1) + scale(x2) + scale(x3), data = data).

As a sidenote, I noticed that the lm.beta() function does not standardise via the standard deviation, but via the root of the sum of the squared deviations (https://rdrr.io/cran/lm.beta/f/inst/doc/implementation.pdf). However, this is only generally valid if there are the same number of non-NA values in the respective variables.

I hope it helps you and all users here a little bit.

Best regards,
Thomas

• November 15, 2022 10:25 am