# Specify Reference Factor Level in Linear Regression in R (Example)

In this article, I’ll explain how to change the reference category of a factor variable in a linear regression in R.

## Creation of Example Data

First, we need to create some example data that we can use in our linear regression:

```
set.seed(2580)                          # Create random example data
N <- 1000
x <- sample(1:5, N, replace = TRUE)
y <- round(x + rnorm(N), 2)
x <- as.factor(x)
data <- data.frame(x, y)
head(data)
#   x     y
# 1 1 -1.93
# 2 3  3.15
# 3 2  0.14
# 4 1  1.33
# 5 1  0.11
# 6 3  3.60
```

As you can see based on the previous output of the RStudio console, our data consists of the two columns x and y, whereby each variable contains 1000 values. The variable x is a factor variable with five levels (i.e. 1, 2, 3, 4, and 5) and the variable y is our numeric outcome variable.

Now, we can apply a linear regression to our data:

`summary(lm(y ~ x, data))    # Linear regression (default)`

Table 1: Regular Output of Linear Regression in R.

Table 1 shows the summary output of our regression. By default, R uses the first level of a factor as the reference category, so the reference category 1 was used for our factor variable x (i.e. the factor level 1 is missing in the regression output).
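If you want to check the reference category without reading the full summary output, the following sketch may help. It recreates the example data from above and uses the base R functions levels and contrasts; it assumes that R's default treatment contrasts are in effect.

```r
# Recreate the example data from above
set.seed(2580)
N <- 1000
x <- sample(1:5, N, replace = TRUE)
y <- round(x + rnorm(N), 2)
data <- data.frame(x = as.factor(x), y)

levels(data$x)                  # Level order; the first entry is the default reference
contrasts(data$x)               # Dummy coding; the row of the reference level contains only zeros
names(coef(lm(y ~ x, data)))    # Coefficient names; there is no coefficient for level 1
```

The first element of levels(data$x) is the level that the regression treats as the baseline, which is why no x1 coefficient appears.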

In the following example, I’ll show how to specify this reference category manually. So keep on reading!

## Example: Changing Factor Level Reference with relevel Function

If we want to change the reference category of a factor vector, we can apply the relevel function. Within the relevel function, we have to specify the ref argument to be equal to our desired reference category:

`data$x <- relevel(data$x, ref = 2)    # Apply relevel function`
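One detail is worth keeping in mind here: a numeric ref is interpreted as the position of the level, not as its label. In our data both coincide, but running the same relevel call twice would move level 1 back to the front. The sketch below uses a small made-up factor f (not part of the example data) to illustrate why passing the label as a character string can be the safer choice.

```r
f <- factor(c("1", "2", "3", "2", "1"))        # Small illustrative factor (hypothetical)

by_name <- relevel(f, ref = "2")               # Select the reference by its label
levels(by_name)                                # "2" "1" "3"
levels(relevel(by_name, ref = "2"))            # Still "2" "1" "3" when applied again

by_pos_once  <- relevel(f, ref = 2)            # Select the reference by its position
by_pos_twice <- relevel(by_pos_once, ref = 2)  # Position 2 is now the level "1"
levels(by_pos_twice)                           # "1" "2" "3" (back to the original order)
```

With ref = "2", the call gives the same result no matter how often it is executed.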

Now, let’s apply exactly the same linear regression R code as before:

`summary(lm(y ~ x, data))    # Linear regression (relevel)`

Table 2: Linear Regression Output with Modified Reference Category of Factor Variable.

Table 2 illustrates our summary statistics. As you can see, this time the reference category 2 was used (i.e. the factor level 2 is missing in the regression output). Looks good!
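Changing the reference category only changes how the coefficients are expressed, not the model fit itself. The following sketch illustrates this by recreating the example data and estimating both versions of the model. As an alternative to overwriting the data, it calls relevel directly inside the model formula; the object names fit_ref1 and fit_ref2 are chosen for illustration only.

```r
# Recreate the example data from above
set.seed(2580)
N <- 1000
x <- sample(1:5, N, replace = TRUE)
y <- round(x + rnorm(N), 2)
data <- data.frame(x = as.factor(x), y)

fit_ref1 <- lm(y ~ x, data)                        # Reference category 1 (default)
fit_ref2 <- lm(y ~ relevel(x, ref = "2"), data)    # Reference category 2, data unchanged

# Both models lead to the same fitted values
all.equal(fitted(fit_ref1), fitted(fit_ref2))

# The new intercept equals the old intercept plus the old coefficient of level 2
b1 <- coef(fit_ref1)
b2 <- coef(fit_ref2)
all.equal(unname(b2["(Intercept)"]), unname(b1["(Intercept)"] + b1["x2"]))
```

Both comparisons should return TRUE: the intercept is the mean of y in the reference category, and the remaining coefficients are differences relative to that category.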

## Video & Further Resources

Summary: In this article you learned how to force R to use a particular factor level as reference group in the R programming language. In case you have any additional comments and/or questions, let me know in the comments section.