Remove Data Frame Columns by Name in R (6 Examples)
In this article, I’ll explain how to delete data frame variables by their name in R programming.
Here’s how to do it!
I’ll use the following data frame as the basis for this R tutorial:
data <- data.frame(x1 = 1:5,              # Create example data
                   x2 = 6:10,
                   x3 = letters[1:5],
                   x4 = letters[6:10])
data                                      # Print example data
#   x1 x2 x3 x4
# 1  1  6  a  f
# 2  2  7  b  g
# 3  3  8  c  h
# 4  4  9  d  i
# 5  5 10  e  j
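The previous output of the RStudio console shows that our example data consists of five rows and four columns. The variables x1 and x2 are integers. The variables x3 and x4 are character columns in R 4.0.0 and later; in earlier R versions, data.frame() converted strings to factors by default.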
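To confirm the column types on your own installation, the classes can be inspected directly. The following is a minimal sketch that re-creates the example data; the object name col_classes is only illustrative, and the classes reported for x3 and x4 depend on your R version as described above.

```r
# Re-create the example data from above
data <- data.frame(x1 = 1:5,
                   x2 = 6:10,
                   x3 = letters[1:5],
                   x4 = letters[6:10])

# Return the class of every column as a named character vector
col_classes <- sapply(data, class)
col_classes
```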
Example 1: Removing Variables Using %in%-operator
In Example 1, I’ll illustrate how to drop certain columns of a data frame using the %in%-operator and the names function.
The following R code checks whether the column names of our data frame (i.e. names(data)) are in a vector of variable names we want to remove (i.e. c("x1", "x3")). The exclamation mark in front of the names function negates this logical condition, so that only the columns that are not in the vector are kept.
data1 <- data[ , ! names(data) %in% c("x1", "x3")]    # Apply %in%-operator
data1                                                 # Print updated data
#   x2 x4
# 1  6  f
# 2  7  g
# 3  8  h
# 4  9  i
# 5 10  j
As you can see based on the previous output of the RStudio console, our updated data frame consists of the two columns x2 and x4. The columns x1 and x3 were removed.
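To see why this works, it can help to look at the intermediate logical vector. The following is a small sketch; the object names drop_cols and is_dropped are illustrative and not part of the example above.

```r
# Re-create the example data
data <- data.frame(x1 = 1:5, x2 = 6:10,
                   x3 = letters[1:5], x4 = letters[6:10])

drop_cols  <- c("x1", "x3")                 # Names to remove (illustrative)
is_dropped <- names(data) %in% drop_cols    # TRUE for every column to remove
is_dropped
# [1]  TRUE FALSE  TRUE FALSE

data[ , !is_dropped]                        # Keep columns where is_dropped is FALSE
```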
Example 2: Keep Certain Variables Using %in%-operator
The %in%-operator can also be used the other way around! This example explains how to specify the variables we want to retain in the data.
For this, we have to remove the exclamation mark in front of the names function. We also have to specify the names of the variables we want to keep instead of the variables we want to remove.
data2 <- data[ , names(data) %in% c("x2", "x4")]    # Keep certain variables
data2                                               # Print updated data
#   x2 x4
# 1  6  f
# 2  7  g
# 3  8  h
# 4  9  i
# 5 10  j
The output is exactly the same as in Example 1, but this time we specified which columns we want to retain in our data. This can be useful when the number of variables we want to keep is relatively small.
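One detail worth knowing: names(data) %in% ... returns a logical vector in the order of the existing columns, so the order of the names inside c() does not change the column order of the result. The sketch below contrasts this with indexing by a character vector; the object names keep_lgl and keep_chr are illustrative.

```r
# Re-create the example data
data <- data.frame(x1 = 1:5, x2 = 6:10,
                   x3 = letters[1:5], x4 = letters[6:10])

keep_lgl <- data[ , names(data) %in% c("x4", "x2")]    # Logical index: original order
names(keep_lgl)
# [1] "x2" "x4"

keep_chr <- data[ , c("x4", "x2")]                     # Character index: requested order
names(keep_chr)
# [1] "x4" "x2"
```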
Example 3: Removing Variables Using subset Function
The R programming language provides many alternative ways to drop columns from a data frame by name. The following R programming syntax shows how to apply the subset function to delete multiple variables:
data3 <- subset(data, select = - c(x1, x3))    # Apply subset function
data3                                          # Print updated data
#   x2 x4
# 1  6  f
# 2  7  g
# 3  8  h
# 4  9  i
# 5 10  j
The output is the same as in Examples 1 and 2. However, note that the subset function still returns a data frame when only one column remains, whereas the square-bracket indexing used in Examples 1 and 2 simplifies a single remaining column to a vector (unless drop = FALSE is specified).
It is important to be aware of this difference when you are working with data sets consisting of only a few variables.
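The difference can be sketched as follows. drop = FALSE is a standard argument of square-bracket indexing for data frames and prevents the simplification to a vector; the object names keep, v, d1, and d2 are illustrative.

```r
# Re-create the example data
data <- data.frame(x1 = 1:5, x2 = 6:10,
                   x3 = letters[1:5], x4 = letters[6:10])

keep <- names(data) %in% "x1"          # Only one column is retained

v  <- data[ , keep]                    # Single column is simplified to a vector
d1 <- data[ , keep, drop = FALSE]      # drop = FALSE keeps a one-column data frame
d2 <- subset(data, select = x1)        # subset() also returns a data frame

is.data.frame(v)     # FALSE
is.data.frame(d1)    # TRUE
is.data.frame(d2)    # TRUE
```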
Example 4: Removing Variables Using within Function
Another alternative for dropping data frame variables by name is the within function. The following R code shows how to combine the within and rm functions to remove columns:
data4 <- within(data, rm(x1, x3))    # Apply within function
data4                                # Print updated data
#   x2 x4
# 1  6  f
# 2  7  g
# 3  8  h
# 4  9  i
# 5 10  j
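Both subset and within expect unquoted column names in the calls shown above, which can be inconvenient when the names are stored in a character vector. One base R option for that case is setdiff. The following is a sketch; drop_cols, keep_cols, and data4b are hypothetical names chosen for illustration.

```r
# Re-create the example data
data <- data.frame(x1 = 1:5, x2 = 6:10,
                   x3 = letters[1:5], x4 = letters[6:10])

drop_cols <- c("x1", "x3")                        # Names to remove, stored as strings
keep_cols <- setdiff(names(data), drop_cols)      # All names that are not in drop_cols
data4b    <- data[ , keep_cols, drop = FALSE]     # drop = FALSE always returns a data frame
names(data4b)
# [1] "x2" "x4"
```

Note that setdiff silently ignores names in drop_cols that do not exist in the data frame, so a misspelled column name will not raise an error.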
Example 5: Removing Variables Using select Function of dplyr Package
There are several add-on packages that also provide functions for the removal of data frame columns. A very popular package for data manipulation is the dplyr package of the tidyverse.
If we want to use the functions of the dplyr package, we first need to install and load dplyr:
install.packages("dplyr")    # Install dplyr package
library("dplyr")             # Load dplyr
Now, we can use the select function of the dplyr package as shown below:
data5 <- select(data, - c(x1, x3))    # Apply select function
data5                                 # Print updated data
#   x2 x4
# 1  6  f
# 2  7  g
# 3  8  h
# 4  9  i
# 5 10  j
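If the column names are stored in a character vector, select can be combined with the all_of helper. The following sketch assumes that dplyr 1.0.0 or later is installed; drop_cols and data5b are illustrative names.

```r
library(dplyr)

# Re-create the example data
data <- data.frame(x1 = 1:5, x2 = 6:10,
                   x3 = letters[1:5], x4 = letters[6:10])

drop_cols <- c("x1", "x3")                      # Names to remove, stored as strings
data5b    <- select(data, -all_of(drop_cols))   # all_of() errors if a name is missing
names(data5b)
# [1] "x2" "x4"
```

The related helper any_of can be used instead of all_of when names that are missing from the data should be ignored rather than raise an error.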
Example 6: Removing Variables Using := Operator of data.table Package
Another powerful package for data handling in R is the data.table package. If we want to use the functions of the data.table package, we first need to install and load data.table:
install.packages("data.table")    # Install & load data.table package
library("data.table")
Now, we can use the setDT function and the := operator to drop specific variables as follows:
data6 <- data                              # Replicate data
setDT(data6)[ , c("x1", "x3") := NULL]     # Using := NULL
data6                                      # Print updated data
#    x2 x4
# 1:  6  f
# 2:  7  g
# 3:  8  h
# 4:  9  i
# 5: 10  j
Note that the previous R code converted our data frame to the data.table class:
class(data6)    # Check class of data
# "data.table" "data.frame"
Whether you prefer to work with data.frames or data.tables is a matter of taste.
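If a plain data frame is needed again after the columns have been removed, the data.table package provides the setDF function for converting back. The following sketch assumes data.table is installed; it uses as.data.table instead of setDT so that the conversion works on a copy, and data6b is an illustrative name.

```r
library(data.table)

# Re-create the example data
data <- data.frame(x1 = 1:5, x2 = 6:10,
                   x3 = letters[1:5], x4 = letters[6:10])

data6b <- as.data.table(data)           # as.data.table() returns a new object
data6b[ , c("x1", "x3") := NULL]        # Remove columns by reference
setDF(data6b)                           # Convert back to a data.frame by reference
class(data6b)
# [1] "data.frame"

names(data)                             # The original data frame still has all columns
# [1] "x1" "x2" "x3" "x4"
```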
Video, Further Resources & Summary
If you need further info on the examples of this tutorial, have a look at the following video on my YouTube channel. In the video, I explain the topics of this tutorial in RStudio.
In addition to the video, I can recommend reading the other articles of this website.
- Extract Certain Columns of Data Frame
- Select Only Numeric Columns from Data Frame in R
- Drop Multiple Columns from Data Frame Using dplyr Package
- Remove All-NA Columns from Data Frame
- Introduction to R
In summary: This tutorial explained how to deselect and remove columns of a data frame in the R programming language.