Use of Tilde ~ in R (Example)

 

In this R tutorial you’ll learn how to use the tilde symbol (i.e. ~).

Tilde Symbol Explained

Generally speaking, the tilde symbol is used to create formulas, which describe the structure of statistical models.

The left side of the tilde symbol specifies the target variable (also called the dependent variable or outcome), and the right side specifies the predictor variable(s) (also called independent variables).
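To see what the tilde actually creates, the following minimal sketch builds a formula object and takes it apart with base R functions. The object names f and g are illustrative only. Note that a formula is not evaluated when it is created, so the variables in it do not need to exist yet.

```r
f <- y ~ x1 + x2                          # Two-sided formula (target ~ predictors)
class(f)                                  # "formula"
length(f)                                 # 3: the ~ operator, left side, right side
f[[2]]                                    # Left side (target variable): y
f[[3]]                                    # Right side (predictors): x1 + x2
all.vars(f)                               # All variable names: "y" "x1" "x2"
g <- ~ x1 + x2                            # One-sided formula without a target variable
length(g)                                 # 2: the ~ operator and the right side
```

The split into a left and a right side is exactly what model functions such as lm() rely on when they separate the target variable from the predictors.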

Let’s move on to some R code in action. This will make the use of the tilde symbol much easier to understand!

 

Introducing Example Data

As a first step, let’s create some example data:

set.seed(9782356)                         # Create example data
x1 <- rnorm(100)
x2 <- rnorm(100) + x1
x3 <- rnorm(100) + x1 + x2
y <- rnorm(100) + x1 + x2 + x3
data <- data.frame(x1, x2, x3, y)
head(data)                                # Head of example data
#           x1         x2          x3          y
# 1  2.3390681  1.5383124  4.44456236  9.9461754
# 2  0.4751355 -0.4914280  0.68240263  1.0319888
# 3  0.2193361 -0.6160729 -0.04378242 -0.9127214
# 4  0.3338190 -0.4646736 -1.14482831 -1.8428651
# 5 -0.8114951 -0.8709525 -0.83104367 -2.7991183
# 6 -0.4757980  0.3989116  0.19974978 -0.8166479

The previous RStudio console output illustrates the structure of our example data: It consists of four numeric columns. The variables x1, x2, and x3 will be used as predictors for the target variable y.

 

Example: Using ~ within lm() Function to Estimate Linear Regression Model

This example illustrates how to use the tilde symbol within the lm() function to fit a linear regression model in R.

Have a look at the following R code:

my_model <- lm(y ~ x1 + x2 + x3, data)    # Estimate linear model

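Within the lm() function, we have specified our formula (i.e. y ~ x1 + x2 + x3). The ~ symbol separates the target variable y on its left side from the predictors x1, x2, and x3 on its right side, and the + signs add each predictor as a separate term of the model.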
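The same model can be written with different formulas. The following sketch compares three equivalent versions: listing every predictor, using the dot shortcut y ~ . (all remaining columns of the data frame), and building the formula from character strings with reformulate() from the stats package. The object names such as f_dot are illustrative, and the example data is re-created so that the code runs on its own.

```r
set.seed(9782356)                         # Same example data as above
x1 <- rnorm(100)
x2 <- rnorm(100) + x1
x3 <- rnorm(100) + x1 + x2
y <- rnorm(100) + x1 + x2 + x3
data <- data.frame(x1, x2, x3, y)

f_explicit <- y ~ x1 + x2 + x3            # Predictors listed one by one
f_dot <- y ~ .                            # "." = all other columns of data
f_built <- reformulate(c("x1", "x2", "x3"), response = "y")  # Formula from strings

m_explicit <- lm(f_explicit, data)        # Fit the model with each formula
m_dot <- lm(f_dot, data)
m_built <- lm(f_built, data)

all.equal(coef(m_explicit), coef(m_dot))    # TRUE
all.equal(coef(m_explicit), coef(m_built))  # TRUE
```

Keep in mind that y ~ . only matches y ~ x1 + x2 + x3 here because the data frame contains no other columns; with additional columns, they would be used as predictors as well.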

We can now use the summary() function to print a detailed summary of our previously estimated linear regression model, including the coefficient estimates, their standard errors and t-tests, and the R-squared:

summary(my_model)                         # Summary statistics of model
# Call:
# lm(formula = y ~ x1 + x2 + x3, data = data)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -2.05220 -0.65160 -0.08235  0.48653  2.58307 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 0.000633   0.098582   0.006    0.995    
# x1          1.261750   0.181533   6.951 4.38e-10 ***
# x2          1.023750   0.165229   6.196 1.44e-08 ***
# x3          0.880829   0.116128   7.585 2.12e-11 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 0.9621 on 96 degrees of freedom
# Multiple R-squared:  0.9555,	Adjusted R-squared:  0.9541 
# F-statistic:   687 on 3 and 96 DF,  p-value: < 2.2e-16

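Looks good! All three predictors have highly significant coefficients, and the model explains about 95.5% of the variance in y (Multiple R-squared: 0.9555).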
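The tilde is also used when an existing model is modified. As a minimal sketch (the name reduced_model is illustrative), the update() function from the stats package accepts a formula in which a dot on either side of ~ stands for the corresponding side of the old formula. Hence, . ~ . - x3 keeps y as the target variable and drops x3 from the predictors. The example data is re-created so that the code runs on its own.

```r
set.seed(9782356)                         # Same example data as above
x1 <- rnorm(100)
x2 <- rnorm(100) + x1
x3 <- rnorm(100) + x1 + x2
y <- rnorm(100) + x1 + x2 + x3
data <- data.frame(x1, x2, x3, y)
my_model <- lm(y ~ x1 + x2 + x3, data)    # Full model

reduced_model <- update(my_model, . ~ . - x3)  # Same target, x3 removed
formula(reduced_model)                    # Formula is now y ~ x1 + x2
coef(reduced_model)                       # Intercept, x1, and x2 only
```

Since update() re-fits the model by re-evaluating the original lm() call, the data frame data needs to be available in the environment where update() is called.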

 

Video & Further Resources

Do you want to learn more about programming in R? Then you might watch the following video on my YouTube channel, in which I illustrate the R code of this article.

 

The YouTube video will be added soon.

 

Besides the video, I recommend having a look at the related articles on https://statisticsglobe.com/.

 

In this tutorial, you have learned how to apply the tilde symbol in R programming. Let me know in the comments if you have any further questions.

 
