Use of Tilde ~ in R (Example)
In this R tutorial you’ll learn how to use the tilde symbol (i.e. ~).
Tilde Symbol Explained
Generally speaking, the tilde symbol is used to define formulas in R, most commonly the formulas of statistical models.
The left side of the tilde symbol specifies the target variable (also called dependent variable or outcome) and the right side of the tilde specifies the predictor variable(s) (also called independent variables).
Let’s move on to some R code in action. This will make the use of the tilde symbol much easier to understand!
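As a quick first illustration, a formula can be stored and inspected like any other R object, even before any data exists. The following is a minimal sketch; the object name f is made up for this illustration and does not appear in the rest of the tutorial.

```r
f <- y ~ x1 + x2 + x3   # create a formula; the variables need not exist yet

class(f)                # "formula"
all.vars(f)             # "y" "x1" "x2" "x3"
f[[2]]                  # left side of the tilde: y
f[[3]]                  # right side of the tilde: x1 + x2 + x3
```

The left side holds the target variable and the right side holds the predictors, exactly as described above.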
Introducing Example Data
As a first step, let’s create some example data:
set.seed(9782356)                     # Create example data
x1 <- rnorm(100)
x2 <- rnorm(100) + x1
x3 <- rnorm(100) + x1 + x2
y <- rnorm(100) + x1 + x2 + x3
data <- data.frame(x1, x2, x3, y)
head(data)                            # Head of example data
#           x1         x2          x3          y
# 1  2.3390681  1.5383124  4.44456236  9.9461754
# 2  0.4751355 -0.4914280  0.68240263  1.0319888
# 3  0.2193361 -0.6160729 -0.04378242 -0.9127214
# 4  0.3338190 -0.4646736 -1.14482831 -1.8428651
# 5 -0.8114951 -0.8709525 -0.83104367 -2.7991183
# 6 -0.4757980  0.3989116  0.19974978 -0.8166479
The previous RStudio console output illustrates the structure of our example data: It consists of four numeric columns. The variables x1, x2, and x3 will be used as predictors for the target variable y.
Example: Using ~ within lm() Function to Estimate Linear Regression Model
This example illustrates how to use the tilde symbol within the lm function to fit a linear regression model in R.
Have a look at the following R code:
my_model <- lm(y ~ x1 + x2 + x3, data) # Estimate linear model
Within the lm function, we have specified our formula (i.e. y ~ x1 + x2 + x3). The ~ symbol separates the target variable y on its left side from the predictors x1, x2, and x3 on its right side.
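The same formula syntax allows a few common variations. The following sketch is not part of the original example: it recreates the example data so that it runs on its own, and the object names model_full, model_dot, and model_x1 are made up for illustration. It relies on the dot shorthand, where y ~ . stands for "y explained by all other columns of the data frame".

```r
set.seed(9782356)                          # Recreate the example data
x1 <- rnorm(100)
x2 <- rnorm(100) + x1
x3 <- rnorm(100) + x1 + x2
y <- rnorm(100) + x1 + x2 + x3
data <- data.frame(x1, x2, x3, y)

model_full <- lm(y ~ x1 + x2 + x3, data)   # Predictors listed explicitly
model_dot <- lm(y ~ ., data)               # "." = all columns of data except y
model_x1 <- lm(y ~ x1, data)               # Only a single predictor

all.equal(coef(model_full), coef(model_dot))  # Both formulas lead to the same fit
names(coef(model_x1))                      # "(Intercept)" "x1"
```

Since data contains only x1, x2, x3, and y, the explicit formula and the dot shorthand describe the same model here. With additional columns in the data frame, the dot shorthand would include those as predictors as well.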
We can now use the summary() function to print summary statistics of our previously estimated linear regression model, such as the coefficient estimates, their standard errors, and the R-squared:
summary(my_model)                     # Summary statistics of model
# Call:
# lm(formula = y ~ x1 + x2 + x3, data = data)
#
# Residuals:
#      Min       1Q   Median       3Q      Max
# -2.05220 -0.65160 -0.08235  0.48653  2.58307
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.000633   0.098582   0.006    0.995
# x1          1.261750   0.181533   6.951 4.38e-10 ***
# x2          1.023750   0.165229   6.196 1.44e-08 ***
# x3          0.880829   0.116128   7.585 2.12e-11 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.9621 on 96 degrees of freedom
# Multiple R-squared:  0.9555, Adjusted R-squared:  0.9541
# F-statistic:   687 on 3 and 96 DF,  p-value: < 2.2e-16
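If only parts of this output are needed, the fitted model can also be queried directly. The following is a minimal sketch that recreates the example data and the model so that it runs on its own; it uses the base R functions formula() and coef() as well as the r.squared element of the summary object.

```r
set.seed(9782356)                          # Recreate the example data
x1 <- rnorm(100)
x2 <- rnorm(100) + x1
x3 <- rnorm(100) + x1 + x2
y <- rnorm(100) + x1 + x2 + x3
data <- data.frame(x1, x2, x3, y)

my_model <- lm(y ~ x1 + x2 + x3, data)     # Estimate linear model

formula(my_model)                          # Formula stored in the model object
coef(my_model)                             # Named vector of coefficient estimates
summary(my_model)$r.squared                # Multiple R-squared as a single number
```

The first call returns the formula that was passed to lm(), which shows that the tilde expression is kept as part of the model object.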
Video & Further Resources
Do you want to learn more about programming in R? Then you might want to watch the following video on my YouTube channel. In the video, I illustrate the R code of this article.
The YouTube video will be added soon.
Besides the video, I recommend having a look at the related articles on https://statisticsglobe.com/.
In this tutorial, you learned how to apply the tilde symbol in R programming. Let me know in the comments in case you have any further questions.