Only Import Selected Columns of Data in R (2 Examples)
In this tutorial, you’ll learn how to import only certain variables (columns) of a data set into R.
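The content of the post is structured as follows:
- Creation of Example Data
- Example 1: Only Import Selected Variables Using read.table() Function
- Example 2: Only Import Selected Variables Using fread() Function of data.table Package
- Video, Further Resources & Summary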
Let’s do this.
Creation of Example Data
As a first step, we’ll have to create some data that we can use in the example code later on:
data <- data.frame(x1 = 8:4,                           # Create example data
                   x2 = letters[3:7],
                   x3 = 55,
                   x4 = 2:6,
                   x5 = "XX")
data                                                   # Print example data
#   x1 x2 x3 x4 x5
# 1  8  c 55  2 XX
# 2  7  d 55  3 XX
# 3  6  e 55  4 XX
# 4  5  f 55  5 XX
# 5  4  g 55  6 XX
The previous output of the RStudio console shows that our example data has five rows and five columns.
Next, let’s export this data frame as a CSV file to a folder on our computer:
write.csv(data,                                        # Write data to directory
          "C:/Users/Joach/Desktop/My Folder/data.csv",
          row.names = FALSE)
After running the previous R code, you should find a CSV file called data.csv in the specified folder.
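If you want to follow along without creating the folder used above, a minimal sketch is to write the CSV file to a temporary location instead. The base R function tempfile() generates a unique file path in R’s temporary directory; the variable name csv_file is only an illustrative choice, and you could pass it wherever the Desktop path appears in the following examples.

```r
# Recreate the example data from above
data <- data.frame(x1 = 8:4,
                   x2 = letters[3:7],
                   x3 = 55,
                   x4 = 2:6,
                   x5 = "XX")

# Illustrative alternative to the Desktop path: a unique temporary file
csv_file <- tempfile(fileext = ".csv")

# Export the data frame without row names, exactly as before
write.csv(data, csv_file, row.names = FALSE)

# Check that the file exists and inspect its first lines
file.exists(csv_file)
readLines(csv_file, n = 3)
```

Note that write.csv() puts quotes around the column names and the character values (e.g. "c" and "XX"), which read.table() and fread() both handle when importing.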
The following examples explain how to import only a subset of these columns into R.
Example 1: Only Import Selected Variables Using read.table() Function
The following R syntax demonstrates how to import a selected set of columns using the read.table function, which is part of the basic installation of the R programming language.
Have a look at the following R code and its output:
data_import1 <- read.table("C:/Users/Joach/Desktop/My Folder/data.csv",  # Import columns
                           header = TRUE,
                           sep = ",",
                           colClasses = c("numeric", "factor", "NULL", "numeric", "NULL"))
data_import1                                           # Print imported data
#   x1 x2 x4
# 1  8  c  2
# 2  7  d  3
# 3  6  e  4
# 4  5  f  5
# 5  4  g  6
The previous R syntax imported only three of the five columns and stored the result in a new data frame object called data_import1.
The selected variables were specified via the colClasses argument of the read.table function. Here, the colClasses vector contains one entry for each of the five columns, in the order in which they appear in the file: we specified a valid class for each variable we wanted to import and "NULL" for each variable we wanted to skip.
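Typing one class per column quickly becomes tedious for wide files. A minimal sketch, assuming you know the names of the columns you want to keep: read only the header row, then build the colClasses vector with NA (which lets R choose the class) for the kept columns and "NULL" for all others. The names keep, all_cols, col_classes and data_import1b as well as the temporary file are illustrative and not part of the original example.

```r
# Example data and a temporary CSV file (path is illustrative only)
data <- data.frame(x1 = 8:4, x2 = letters[3:7], x3 = 55, x4 = 2:6, x5 = "XX")
csv_file <- tempfile(fileext = ".csv")
write.csv(data, csv_file, row.names = FALSE)

# Hypothetical selection: the same columns as in Example 1
keep <- c("x1", "x2", "x4")

# Read just the header (plus one data row) to get the column names in file order
all_cols <- names(read.csv(csv_file, nrows = 1))

# NA = let R guess the class of a kept column, "NULL" = skip the column
col_classes <- ifelse(all_cols %in% keep, NA, "NULL")
col_classes

# Import only the selected columns
data_import1b <- read.table(csv_file,
                            header = TRUE,
                            sep = ",",
                            colClasses = col_classes)
data_import1b
```

Because the kept columns use NA instead of an explicit class, R converts them automatically; in R 4.0.0 and later, x2 is therefore read as a character column rather than a factor.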
Example 2: Only Import Selected Variables Using fread() Function of data.table Package
A convenient alternative to the R code shown in Example 1 is provided by the data.table package. First, we have to install and load the data.table package.
install.packages("data.table")                         # Install data.table package
library("data.table")                                  # Load data.table package
Now, we can apply the fread function of the data.table package to read only selected variables by passing the names of the columns we want to import to its select argument (i.e. select = c("x1", "x2", "x4")):
data_import2 <- fread("C:/Users/Joach/Desktop/My Folder/data.csv",  # Import columns
                      select = c("x1", "x2", "x4"))
data_import2                                           # Print imported data
#    x1 x2 x4
# 1:  8  c  2
# 2:  7  d  3
# 3:  6  e  4
# 4:  5  f  5
# 5:  4  g  6
The values shown in the previous output are the same as in Example 1 (note, however, that x2 is stored as a character column here, whereas in Example 1 we explicitly imported it as a factor). Furthermore, fread returns a data.table rather than a plain data.frame. In case you prefer to work with data.frames, you may convert the data.table by using the as.data.frame function.
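As a sketch of two related options, the following code keeps columns with select (as in Example 2), excludes the unwanted columns with fread’s drop argument instead, and then converts the result with as.data.frame. The temporary file and the object names dt_select, dt_drop and df_select are illustrative assumptions, not part of the original example.

```r
library("data.table")

# Example data and a temporary CSV file (path is illustrative only)
data <- data.frame(x1 = 8:4, x2 = letters[3:7], x3 = 55, x4 = 2:6, x5 = "XX")
csv_file <- tempfile(fileext = ".csv")
write.csv(data, csv_file, row.names = FALSE)

# Keep columns by name (as in Example 2) ...
dt_select <- fread(csv_file, select = c("x1", "x2", "x4"))

# ... or, equivalently, exclude the unwanted columns with drop
dt_drop <- fread(csv_file, drop = c("x3", "x5"))

# Both calls should yield the same data.table
all.equal(dt_select, dt_drop)

# fread() returns an object inheriting from both data.table and data.frame
class(dt_select)

# Convert to a plain data.frame if preferred
df_select <- as.data.frame(dt_select)
class(df_select)
```

Using drop is handy when you want to keep most columns and only skip a few, whereas select is more convenient when only a small number of columns is needed.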
Video, Further Resources & Summary
In addition, you may want to read the following related articles on this website:
- Unique Rows of Data Frame Based On Selected Columns
- Use apply Function Only for Specific Data Frame Columns
- Select Only Numeric Columns from Data Frame in R
- All R Programming Examples
In this tutorial, you learned how to import only a selected set of columns when reading data into the R programming language. Let me know in the comments section if you have any additional questions or comments.