Estimate Linear Model with Fixed Intercept in R (2 Examples)

 

In this post, you’ll learn how to set a fixed intercept when estimating a linear regression model in the R programming language.

It’s time to dive into the programming part.

 

Creation of Example Data

The data below will be used as the basis for this R tutorial:

set.seed(653897)                                   # Create example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x

Our example data consists of two normally distributed numeric vectors that are correlated with each other.
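
As a quick sanity check (not part of the original example), the following sketch recreates the data and inspects the sample means and the correlation. By construction, x has a mean of 3 and y is equal to x plus noise with a mean of 2, so a regression of y on x should recover an intercept close to 2 and a slope close to 1:

```r
set.seed(653897)                                   # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x

mean(x)                                            # Close to 3
mean(y)                                            # Close to 5 (i.e. 3 + 2)
cor(x, y)                                          # Positive correlation of x and y
```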

Let’s estimate a linear regression model without specifying the intercept manually (i.e. the default specification of the lm function):

mod_default <- lm(y ~ x)                           # Estimate linear model
summary(mod_default)                               # Summary statistics
# Call:
# lm(formula = y ~ x)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.3152 -0.6598  0.0209  0.6563  3.4294 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  2.05729    0.09966   20.64   <2e-16 ***
# x            0.98086    0.03156   31.08   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.9891 on 998 degrees of freedom
# Multiple R-squared:  0.4919,	Adjusted R-squared:  0.4914 
# F-statistic: 966.1 on 1 and 998 DF,  p-value: < 2.2e-16

The previous output of the RStudio console shows the summary statistics of our linear regression model. As you can see, we have estimated an intercept of 2.05729 and a regression coefficient for x of 0.98086.
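
If you want to work with these estimates in your code instead of reading them off the console, the coef and confint functions return them as R objects. A minimal sketch that recreates the data and the model from above (the object name est is only used for illustration):

```r
set.seed(653897)                                   # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x
mod_default <- lm(y ~ x)                           # Model with estimated intercept

est <- coef(mod_default)                           # Named vector of estimates
est["(Intercept)"]                                 # Estimated intercept
est["x"]                                           # Estimated slope of x
confint(mod_default)                               # 95% confidence intervals
```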

Let’s estimate another model with fixed intercept…

 

Example 1: Estimate Linear Model with Fixed Intercept Using I() Function

Example 1 illustrates how to estimate a linear model with a known intercept.

For this, we first have to specify our fixed intercept:

intercept <- 3                                     # Define fixed intercept

Next, we can estimate our linear model using the I() function as shown below:

mod_intercept_1 <- lm(I(y - intercept) ~ 0 + x)    # Model with fixed intercept

Finally, we can apply the summary function to return the summary statistics of this model:

summary(mod_intercept_1)                           # Summary statistics
# Call:
# lm(formula = I(y - intercept) ~ 0 + x)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.0314 -0.7734 -0.0577  0.6222  3.1767 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# x  0.69743    0.01033   67.49   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 1.032 on 999 degrees of freedom
# Multiple R-squared:  0.8201,	Adjusted R-squared:   0.82 
# F-statistic:  4555 on 1 and 999 DF,  p-value: < 2.2e-16

As you can see, the previously estimated model did not return an intercept value, since we have manually specified this intercept beforehand.

You can also see that the x estimate has changed to 0.69743.
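
The change in the slope follows from the least-squares formula for a regression through the origin: with the intercept fixed at a value a, the slope estimate is sum(x * (y - a)) / sum(x^2). The following sketch reproduces the lm estimate by hand. The object names a, slope_manual, and slope_lm are only used for illustration:

```r
set.seed(653897)                                   # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x
a <- 3                                             # Fixed intercept (same value as above)

slope_manual <- sum(x * (y - a)) / sum(x^2)        # Closed-form slope
slope_lm <- coef(lm(I(y - a) ~ 0 + x))[["x"]]      # Slope estimated by lm
all.equal(slope_manual, slope_lm)                  # Check that both slopes coincide
```

Because the fixed intercept of 3 is larger than the freely estimated intercept of about 2.06, the regression line has to become flatter to fit the same data.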

 

Important notes on models with fixed intercept:

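The summary output of models with a fixed intercept has to be interpreted carefully. Metrics such as the R-squared, the t-value, and the F-statistic are much larger than in the model with an estimated intercept. One reason is that, for a formula without an intercept term, R computes the R-squared and the F-statistic relative to zero instead of relative to the mean of the response, so these values are not directly comparable between the two models.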
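
To illustrate where the larger R-squared comes from, the sketch below recomputes it by hand for the model of Example 1. For a model without an intercept term, summary.lm uses the uncentered total sum of squares, i.e. sum(response^2), as the reference. The centered version is shown for comparison only, and the object names are illustrative:

```r
set.seed(653897)                                   # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x
intercept <- 3                                     # Fixed intercept

mod <- lm(I(y - intercept) ~ 0 + x)                # Model of Example 1
resp <- y - intercept                              # Response used by the model
rss <- sum(residuals(mod)^2)                       # Residual sum of squares

r2_uncentered <- 1 - rss / sum(resp^2)             # Reference: zero
r2_centered <- 1 - rss / sum((resp - mean(resp))^2)  # Reference: mean of response

r2_uncentered                                      # Matches summary(mod)$r.squared
r2_centered                                        # Never larger than the uncentered value
```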

Furthermore, it is often not advisable to specify a fixed intercept from a theoretical & methodological viewpoint. You may find a detailed discussion on this topic in this thread on Cross Validated.

 

Example 2: Estimate Linear Model with Fixed Intercept Using offset() & rep() Functions

This example shows an alternative to the syntax of the previous example.

In this example, we’ll use the offset and rep functions to estimate our linear model with a known intercept:

mod_intercept_2 <- lm(y ~ x + 0 +                  # Model with fixed intercept
                        offset(rep(intercept, 1000)))

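The coefficient estimate, its standard error, and the residuals are exactly the same as in Example 1, even though we have used a different R syntax. Note, however, that the R-squared and the F-statistic in the output below differ from Example 1, which reflects that an offset is treated differently from a transformed response when these metrics are computed: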

summary(mod_intercept_2)                           # Summary statistics
# Call:
# lm(formula = y ~ x + 0 + offset(rep(intercept, 1000)))
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.0314 -0.7734 -0.0577  0.6222  3.1767 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# x  0.69743    0.01033   67.49   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 1.032 on 999 degrees of freedom
# Multiple R-squared:  0.9613,	Adjusted R-squared:  0.9612 
# F-statistic: 2.479e+04 on 1 and 999 DF,  p-value: < 2.2e-16
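
The hard-coded 1000 in rep() has to match the number of observations. As a sketch of a slightly more general variant, length(y) can be used instead, and the offset can alternatively be passed via the offset argument of lm. The object names below are illustrative. The checks compare the coefficient and the residuals with those of the I() approach:

```r
set.seed(653897)                                   # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x
intercept <- 3                                     # Fixed intercept

mod_i <- lm(I(y - intercept) ~ 0 + x)              # Approach of Example 1
mod_off <- lm(y ~ 0 + x +                          # offset() within the formula
                offset(rep(intercept, length(y))))
mod_arg <- lm(y ~ 0 + x,                           # offset argument of lm
              offset = rep(intercept, length(y)))

all.equal(coef(mod_i), coef(mod_off))              # Compare slope estimates
all.equal(coef(mod_i), coef(mod_arg))              # Compare slope estimates
all.equal(residuals(mod_i), residuals(mod_off))    # Compare residuals
```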

 

Video, Further Resources & Summary

In case you need further explanations on the content of this article, you may have a look at the following video on my YouTube channel. I’m illustrating the R code of this article in the video:

 

 

In addition, you could read the other tutorials of my website.

 

In summary: At this point you should know how to define a known constant in a linear regression model in R programming. Let me know in the comments section if you have further questions or comments on regression models, constants, or any other related topics.
