# Specify Reference Factor Level in Linear Regression in R (Example)

In this article, I’ll explain how to **change the reference category of a factor variable in a linear regression** in R.

The content of the page looks like this:

- Creation of Example Data
- Example: Changing Factor Level Reference with relevel Function
- Video & Further Resources

If you want to learn more about these content blocks, keep reading.

## Creation of Example Data

First, we need to create some example data that we can use in our linear regression:

    set.seed(2580)                            # Create random example data
    N <- 1000
    x <- sample(1:5, N, replace = TRUE)
    y <- round(x + rnorm(N), 2)
    x <- as.factor(x)
    data <- data.frame(x, y)
    head(data)
    #   x     y
    # 1 1 -1.93
    # 2 3  3.15
    # 3 2  0.14
    # 4 1  1.33
    # 5 1  0.11
    # 6 3  3.60

As you can see based on the previous output of the RStudio console, our data consists of the two columns x and y, each containing 1000 values. The variable x is a factor with five levels (i.e. 1, 2, 3, 4, and 5), and the variable y is our numeric outcome variable.
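If you want to verify this structure yourself, a quick check along the following lines may help. It is only a sketch that rebuilds the example data from above and inspects it with base R functions; the exact values drawn by `sample` and `rnorm` can differ between R versions, so no counts are shown here.

```r
# Recreate the example data from above
set.seed(2580)
N <- 1000
x <- sample(1:5, N, replace = TRUE)
y <- round(x + rnorm(N), 2)
x <- as.factor(x)
data <- data.frame(x, y)

str(data)          # Column classes and dimensions of the data frame
levels(data$x)     # Factor levels in their current order (the first one is the reference)
table(data$x)      # Number of observations per factor level
```

The first element returned by `levels` is the one that `lm` treats as the reference category by default.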

Now, we can apply a linear regression to our data:

    summary(lm(y ~ x, data))                  # Linear regression (default)

**Table 1: Regular Output of Linear Regression in R.**

Table 1 shows the summary output of our regression. The factor level 1 was used as the reference category for our factor variable x: it has no coefficient of its own in the regression output, and the coefficients of the remaining levels are estimated relative to it.
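To make the role of the reference category more concrete, the following sketch compares the regression coefficients with the group means of y. It assumes R's default treatment contrasts (i.e. the `contrasts` option has not been changed); the object names `fit` and `group_means` are only chosen for illustration.

```r
# Recreate the example data from above
set.seed(2580)
N <- 1000
x <- sample(1:5, N, replace = TRUE)
y <- round(x + rnorm(N), 2)
x <- as.factor(x)
data <- data.frame(x, y)

fit <- lm(y ~ x, data)                                # Model with default reference level 1
group_means <- sapply(split(data$y, data$x), mean)    # Mean of y within each level of x

# The intercept corresponds to the mean of y in the reference category
all.equal(coef(fit)[["(Intercept)"]], group_means[["1"]])

# Each remaining coefficient is the difference to the reference category
all.equal(coef(fit)[["x2"]], group_means[["2"]] - group_means[["1"]])
```

Both comparisons should return TRUE, which is why changing the reference category changes the interpretation of every coefficient in the output.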

In the following example, I’ll show how to specify this reference category manually. So keep on reading!

## Example: Changing Factor Level Reference with relevel Function

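If we want to change the reference category of a factor vector, we can apply the relevel function. Within the relevel function, we have to set the ref argument to our desired reference category. Note that a numeric ref is interpreted as the position of the level within the levels of the factor; in our data, position 2 coincides with the level labeled 2. In general, passing the label as a character string (e.g. ref = "2") is the safer choice: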

    data$x <- relevel(data$x, ref = 2)        # Apply relevel function
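To see what relevel actually does to a factor, it may help to look at a small toy example. The factor `f` below is hypothetical and not part of the example data; it uses text labels so that the difference between specifying ref as a label and as a position becomes visible.

```r
# Toy factor with text labels; levels are sorted alphabetically by default
f <- factor(c("low", "mid", "high", "mid", "low"))
levels(f)                                 # "high" "low" "mid"

f_by_label <- relevel(f, ref = "mid")     # ref given as level label
levels(f_by_label)                        # "mid" "high" "low"

f_by_position <- relevel(f, ref = 2)      # ref given as position of the level
levels(f_by_position)                     # "low" "high" "mid"

# Only the order of the levels changes; the data values stay the same
identical(as.character(f), as.character(f_by_label))
```

As the example shows, relevel moves the chosen level to the first position and keeps the remaining levels in their previous order.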

Now, let’s apply exactly the same linear regression R code as before:

    summary(lm(y ~ x, data))                  # Linear regression (relevel)

**Table 2: Linear Regression Output with Modified Reference Category of Factor Variable.**

Table 2 illustrates our summary statistics. As you can see, this time the reference category 2 was used (i.e. the factor level 2 is missing in the regression output). Looks good!
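It is worth noting that changing the reference category only changes how the model is parameterized, not the model fit itself. The following sketch illustrates this by fitting both versions side by side; the object names `fit_default`, `fit_relevel`, and `data_releveled` are assumptions made for this illustration.

```r
# Recreate the example data from above
set.seed(2580)
N <- 1000
x <- sample(1:5, N, replace = TRUE)
y <- round(x + rnorm(N), 2)
x <- as.factor(x)
data <- data.frame(x, y)

fit_default <- lm(y ~ x, data)                               # Reference level 1

data_releveled <- data                                       # Copy with modified reference
data_releveled$x <- relevel(data_releveled$x, ref = "2")
fit_relevel <- lm(y ~ x, data_releveled)                     # Reference level 2

names(coef(fit_default))    # "(Intercept)" "x2" "x3" "x4" "x5"
names(coef(fit_relevel))    # "(Intercept)" "x1" "x3" "x4" "x5"

# The fitted values of both models are the same
all.equal(fitted(fit_default), fitted(fit_relevel))

# The coefficient of level 1 vs. 2 is the negative of level 2 vs. 1
all.equal(coef(fit_relevel)[["x1"]], -coef(fit_default)[["x2"]])
```

Both comparisons should return TRUE. So the choice of the reference category is a matter of interpretation: pick the level against which you want to compare all other levels.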

## Video & Further Resources

Have a look at the following video that I have published on my YouTube channel. In the video, I explain the content of this post:

In addition, you may want to read the other tutorials of this website:

- Reorder Levels of Factor without Changing Order of Values
- Convert Factor to Dummy Indicator Variables for Every Level
- The R Programming Language

Summary: In this article you learned how to **force R to use a particular factor level as reference group** in the R programming language. In case you have any additional comments and/or questions, let me know in the comments section.
