Estimate Linear Model with Fixed Intercept in R (2 Examples)
In this post you’ll learn how to set a fixed intercept when estimating a linear regression model in the R programming language.
It’s time to dive into the programming part.
Creation of Example Data
The data below will be used as the basis for this R tutorial:
set.seed(653897)                           # Create example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x
Our example data consists of two normally distributed numeric vectors that are positively correlated with each other.
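As a quick check of this claim, the following sketch recreates the two vectors and computes their Pearson correlation. The object name cor_xy is introduced only for this illustration. Since y is constructed as x plus independent noise with a standard deviation of 1, a correlation of roughly 0.7 would be expected.

```r
set.seed(653897)                           # Same seed as in the tutorial
x <- rnorm(1000, 3)                        # Predictor with mean 3 and sd 1
y <- rnorm(1000, 2) + x                    # Outcome: x plus noise with mean 2

cor_xy <- cor(x, y)                        # Pearson correlation (hypothetical name)
cor_xy                                     # Print correlation

head(data.frame(x, y))                     # First rows of the example data
```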
Let’s estimate a linear regression model without specifying the intercept manually (i.e. the default specification of the lm function):
mod_default <- lm(y ~ x)                   # Estimate linear model
summary(mod_default)                       # Summary statistics
# Call:
# lm(formula = y ~ x)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -3.3152 -0.6598  0.0209  0.6563  3.4294
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  2.05729    0.09966   20.64   <2e-16 ***
# x            0.98086    0.03156   31.08   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.9891 on 998 degrees of freedom
# Multiple R-squared:  0.4919, Adjusted R-squared:  0.4914
# F-statistic: 966.1 on 1 and 998 DF,  p-value: < 2.2e-16
The previous output of the RStudio console shows the descriptive summary statistics of our linear regression model. As you can see, we have estimated an intercept of 2.05729 and a regression coefficient for x of 0.98086.
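If you only need the estimates rather than the full printed summary, they can be extracted directly from the model object. The snippet below is a minimal sketch that refits the default model; the names coef_default and coef_table are chosen just for this example.

```r
set.seed(653897)                           # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x

mod_default <- lm(y ~ x)                   # Default model with estimated intercept

coef_default <- coef(mod_default)          # Named vector: (Intercept) and x
coef_default

coef_table <- coef(summary(mod_default))   # Matrix with estimates, SEs, t- and p-values
coef_table
```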
Let’s estimate another model with a fixed intercept…
Example 1: Estimate Linear Model with Fixed Intercept Using I() Function
Example 1 illustrates how to estimate a linear model with a known intercept.
For this, we first have to specify our fixed intercept:
intercept <- 3 # Define fixed intercept
Next, we can estimate our linear model using the I() function as shown below:
mod_intercept_1 <- lm(I(y - intercept) ~ 0 + x) # Model with fixed intercept
Finally, we can apply the summary function to return our descriptive statistics:
summary(mod_intercept_1)                   # Summary statistics
# Call:
# lm(formula = I(y - intercept) ~ 0 + x)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -3.0314 -0.7734 -0.0577  0.6222  3.1767
#
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)
# x  0.69743    0.01033   67.49   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.032 on 999 degrees of freedom
# Multiple R-squared:  0.8201, Adjusted R-squared:   0.82
# F-statistic:  4555 on 1 and 999 DF,  p-value: < 2.2e-16
As you can see, the previously estimated model did not return an intercept value, since we have manually specified this intercept in advance.
You can also see that the x estimate has changed to 0.69743.
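To see where this slope comes from, note that subtracting the intercept from y and dropping the constant term turns the problem into a regression through the origin. The sketch below compares the lm() estimate with the closed-form least squares solution sum(x * (y - intercept)) / sum(x^2); the name slope_manual is only used for this illustration.

```r
set.seed(653897)                           # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x

intercept <- 3                             # Fixed intercept as in Example 1

mod_intercept_1 <- lm(I(y - intercept) ~ 0 + x)  # Model with fixed intercept

# Closed-form slope of a regression through the origin
slope_manual <- sum(x * (y - intercept)) / sum(x^2)

all.equal(unname(coef(mod_intercept_1)), slope_manual)  # Compare both estimates
```

Predictions on the original scale of y are then obtained as intercept + slope_manual * x.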
Important notes on models with fixed intercept:
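The summary output of models with a fixed intercept has to be interpreted carefully. Metrics such as the R-squared, the t-value, and the F-statistic are much larger than in the model without fixed intercept. One reason is that, for a formula without an intercept term, the summary function measures the R-squared and the F-statistic against a baseline of zero instead of the mean of the response, so these values are not directly comparable with those of the default model.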
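The following sketch illustrates why the R-squared is hard to compare across these models. For a model without an intercept term, the reported R-squared relates the residual sum of squares to the sum of squared response values (here y - intercept) rather than to the variation around the mean. The names y_star, r2_reported, r2_origin, and r2_mean are assumptions made for this example, and r2_mean is just one possible way to put the fit on the same footing as the default model.

```r
set.seed(653897)                           # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x

intercept <- 3                             # Fixed intercept
mod_intercept_1 <- lm(I(y - intercept) ~ 0 + x)

y_star <- y - intercept                    # Response actually used by the model
rss <- sum(residuals(mod_intercept_1)^2)   # Residual sum of squares

r2_reported <- summary(mod_intercept_1)$r.squared  # Value printed by summary()
r2_origin <- 1 - rss / sum(y_star^2)               # Baseline: zero
r2_mean <- 1 - rss / sum((y - mean(y))^2)          # Baseline: mean of y

c(reported = r2_reported, origin = r2_origin, mean = r2_mean)
```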
Furthermore, it is often not advisable to specify a fixed intercept from a theoretical & methodological viewpoint. You may find a detailed discussion on this topic in this thread on Cross Validated.
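One simple way to judge whether a particular fixed intercept is compatible with the data is to look at the confidence interval of the intercept in the default model. The sketch below is only an illustration of that idea, not a full methodological check; the names ci_intercept and intercept_in_ci are hypothetical. Based on the estimate and standard error printed earlier, the 95% interval is roughly 2.06 ± 1.96 × 0.10, i.e. about 1.86 to 2.25.

```r
set.seed(653897)                           # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x

intercept <- 3                             # Candidate fixed intercept

mod_default <- lm(y ~ x)                   # Model with estimated intercept

# 95% confidence interval of the estimated intercept
ci_intercept <- unname(confint(mod_default)["(Intercept)", ])
ci_intercept

# Does the candidate value fall inside the interval?
intercept_in_ci <- intercept >= ci_intercept[1] && intercept <= ci_intercept[2]
intercept_in_ci
```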
Example 2: Estimate Linear Model with Fixed Intercept Using offset() & rep() Functions
This example shows an alternative to the syntax of the previous example.
In this example we’ll use the offset and rep functions to estimate our linear model with known intercept:
mod_intercept_2 <- lm(y ~ x + 0 +          # Model with fixed intercept
                        offset(rep(intercept, 1000)))
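The residuals, the coefficient estimate, and its standard error are exactly the same as in Example 1, even though we have used a different R syntax. The R-squared and the F-statistic in the output below are larger than in Example 1, though: the response is now y instead of y - intercept, and in this output the offset is counted as part of the fitted values.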
summary(mod_intercept_2)                   # Summary statistics
# Call:
# lm(formula = y ~ x + 0 + offset(rep(intercept, 1000)))
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -3.0314 -0.7734 -0.0577  0.6222  3.1767
#
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)
# x  0.69743    0.01033   67.49   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.032 on 999 degrees of freedom
# Multiple R-squared:  0.9613, Adjusted R-squared:  0.9612
# F-statistic: 2.479e+04 on 1 and 999 DF,  p-value: < 2.2e-16
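To compare the two approaches programmatically, the sketch below refits both models and checks which parts of the fit agree. It uses length(y) instead of the hard-coded 1000, and it adds a third variant based on the offset argument of lm(); the object name mod_intercept_3 is an assumption for this example. The R-squared values are only printed, not compared, since they depend on how the summary function treats the offset.

```r
set.seed(653897)                           # Recreate example data
x <- rnorm(1000, 3)
y <- rnorm(1000, 2) + x

intercept <- 3                             # Fixed intercept

mod_intercept_1 <- lm(I(y - intercept) ~ 0 + x)            # Example 1
mod_intercept_2 <- lm(y ~ x + 0 +                          # Example 2
                        offset(rep(intercept, length(y))))
mod_intercept_3 <- lm(y ~ 0 + x,                           # offset argument
                      offset = rep(intercept, length(y)))

# Slope estimates agree across all three variants
all.equal(coef(mod_intercept_1), coef(mod_intercept_2))
all.equal(coef(mod_intercept_2), coef(mod_intercept_3))

# Residuals agree as well
all.equal(unname(residuals(mod_intercept_1)),
          unname(residuals(mod_intercept_2)))

# R-squared as reported by summary() for Examples 1 and 2
c(example_1 = summary(mod_intercept_1)$r.squared,
  example_2 = summary(mod_intercept_2)$r.squared)
```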
Video, Further Resources & Summary
In case you need further explanations on the content of this article, you may have a look at the following video on my YouTube channel. I’m illustrating the R code of this article in the video.
In addition, you could read the other tutorials on my website:
- Remove Intercept from Regression Model
- Extract Regression Coefficients of Linear Model in R
- Extract Significance Stars & Levels from Linear Regression Model
- Extract Multiple & Adjusted R-Squared from Linear Regression Model
- Extract Standard Error, t-Value & p-Value from Linear Regression Model
- How to Extract the Intercept from a Linear Regression Model
- All R Programming Examples
In summary: At this point you should know how to define a known constant in a linear regression model in the R programming language. Let me know in the comments section if you have further questions or comments on regression models, constants, or any other related topics.
5 Comments
There is a difference in R-squared and F-statistic between the two methods, but why?
Indeed, that’s an interesting question! My guess would be that the fixed intercept has an impact on these two metrics, but to be honest, I’m not 100% sure about this.
Please let me know in case you find a good explanation, I’m also curious now 🙂
It is because of different null models for the F-test. 🙂
Ah ok, thanks for sharing! 🙂
Since this discussion popped up once again in the comments of the YouTube video, I have done some further research and found this thread at Cross Validated. It explains this topic quite nicely.
I hope this helps!