Extract Certain Columns of Data Frame in R (4 Examples)

 

This article explains how to extract specific columns of a data set in the R programming language.

I will show you four programming alternatives for the selection of data frame columns.

Let’s move on to the examples!

 

Creation of Example Data

In the examples of this tutorial, I’m going to use the following data frame:

data <- data.frame(x1 = c(2, 1, 5, 1),   # Create example data
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, 5, 4, 9),
                   x4 = c(3, 4, 1, 2))
data                                     # Print example data

 

#   x1 x2 x3 x4
# 1  2  7  9  3
# 2  1  1  5  4
# 3  5  1  4  1
# 4  1  5  9  2

Table 1: Example Data Frame.

 

Our example data frame consists of four numeric columns and four rows.

In the following, I’m going to show you how to select certain columns from this data frame. I will show you four different alternatives, all of which lead to the same output. Which alternative suits you best depends on your personal preferences.

 

Example 1: Subsetting Data by Column Name

A common way to select some columns of a data frame is to specify a character vector containing the names of the columns to extract. Consider the following R code:

data[ , c("x1", "x3")]                   # Subset by name

 

#   x1 x3
# 1  2  9
# 2  1  5
# 3  5  4
# 4  1  9

Table 2: Subset of Example Data Frame.

 

As you can see in Table 2, the previous R syntax extracted the columns x1 and x3. The syntax can be explained as follows:

  • First, we specify the name of our data set (i.e. data).
  • Then, we add square brackets (i.e. []).
  • Within these brackets, we write a comma to reflect the two dimensions of our data. Everything before the comma selects specific rows; everything after the comma selects specific columns.
  • After the comma, we specify a vector of character strings. Each element of this vector is the name of a column of our data frame (i.e. x1 and x3).
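The bracket logic described in the list above can be sketched in a few lines. This is only a minimal illustration based on the example data of this tutorial; the object names both, rows_cols, one_col_vec, and one_col_df are hypothetical and not part of the original code. Note that selecting a single column this way returns a plain vector unless drop = FALSE is specified.

```r
data <- data.frame(x1 = c(2, 1, 5, 1),        # Example data of this tutorial
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, 5, 4, 9),
                   x4 = c(3, 4, 1, 2))

both <- data[ , c("x1", "x3")]                # Nothing before the comma: keep all rows
rows_cols <- data[1:2, c("x1", "x3")]         # Rows 1 and 2, columns x1 and x3
one_col_vec <- data[ , "x1"]                  # Single column is simplified to a vector
one_col_df <- data[ , "x1", drop = FALSE]     # drop = FALSE keeps a data frame
```

The drop = FALSE variant can be useful when later code expects a data frame, no matter how many columns were selected.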

That’s basically it. However, depending on your personal preferences and your specific data situation, you might prefer one of the other alternatives. So keep on reading…

 

Example 2: Subsetting Data by Column Position

An approach similar to Example 1 is subsetting by the position of the columns. Consider the following R code:

data[ , c(1, 3)]                         # Subset by position

Similar to Example 1, we use square brackets and a vector behind the comma to select certain columns.

However, this time we are using a numeric vector, whereby each element of the vector stands for the position of the column.

The first column of our example data is called x1 and the column at the third position is called x3. For that reason, the previous R syntax would extract the columns x1 and x3 from our data set.
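To make the link between positions and names concrete, the following sketch compares both approaches and shows two further forms of positional indexing (a range and a negative index). It assumes the example data of this tutorial; the object names by_position, by_name, consecutive, and without_x2 are hypothetical. Keep in mind that positions depend on the column order, so this style may break if columns are added or reordered.

```r
data <- data.frame(x1 = c(2, 1, 5, 1),        # Example data of this tutorial
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, 5, 4, 9),
                   x4 = c(3, 4, 1, 2))

by_position <- data[ , c(1, 3)]               # Subset by position
by_name <- data[ , c("x1", "x3")]             # Subset by name
identical(by_position, by_name)               # Compare both subsets

consecutive <- data[ , 2:4]                   # Range of positions: x2, x3, x4
without_x2 <- data[ , -2]                     # Negative position: all columns except x2
```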

 

Example 3: Subsetting Data with select Argument of subset Function

In Example 3, we will access and extract certain columns with the subset function. Within the subset function, we need to specify the name of our data frame (i.e. data) and the columns we want to select (i.e. x1 and x3):

subset(data, select = c("x1", "x3"))     # Subset with select argument

The output of the previous R syntax is the same as in Examples 1 and 2.
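The select argument of the subset function also accepts unquoted column names, ranges of names, and negated names. The following sketch illustrates these variants with the example data of this tutorial; the object names sel_unquoted, sel_range, sel_drop, and sel_rows are hypothetical and only used for illustration.

```r
data <- data.frame(x1 = c(2, 1, 5, 1),             # Example data of this tutorial
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, 5, 4, 9),
                   x4 = c(3, 4, 1, 2))

sel_unquoted <- subset(data, select = c(x1, x3))   # Unquoted column names
sel_range <- subset(data, select = x1:x3)          # Range of names: x1 to x3
sel_drop <- subset(data, select = -x2)             # All columns except x2
sel_rows <- subset(data, x1 > 1,                   # Additionally keep only rows
                   select = c(x1, x3))             # where x1 is larger than 1
```

The last call combines a row condition with a column selection, which is one reason why some users prefer subset over square brackets.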

 

Example 4: Subsetting Data with select Function (dplyr Package)

Many people prefer the tidyverse over base R when it comes to data manipulation. A very popular tidyverse package, which also provides functions for selecting certain columns, is dplyr. We can install and load the package as follows:

install.packages("dplyr")                # Install dplyr R package
library("dplyr")                         # Load dplyr R package

Now, we can use the %>% operator and the select function to subset our data set:

data %>% select(x1, x3)                  # Subset with select function

Again, we get the same output as in the previous examples. It’s up to you to decide which option you like most.
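The select function supports a few more selection styles that can be handy in practice. The sketch below assumes that the dplyr package is already installed and uses the example data of this tutorial; the object names sel_range, sel_drop, cols, and sel_vec are hypothetical.

```r
library("dplyr")                              # Assumes dplyr is already installed

data <- data.frame(x1 = c(2, 1, 5, 1),        # Example data of this tutorial
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, 5, 4, 9),
                   x4 = c(3, 4, 1, 2))

sel_range <- data %>% select(x1:x3)           # Range of columns: x1 to x3
sel_drop <- data %>% select(-x2)              # All columns except x2

cols <- c("x1", "x3")                         # Column names stored in a character vector
sel_vec <- data %>% select(all_of(cols))      # all_of() selects the names stored in cols
```

The all_of variant may be the most convenient one when the column names are created by other code rather than typed by hand.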

 

Video & Further Resources

There was a lot of content in this tutorial. However, if you need more explanations on the different approaches and functions, you could have a look at the following video of my YouTube channel. In the video, I’m explaining the examples of this tutorial in more detail:

 


In addition, you could have a look at the other R tutorials on my homepage.

In this tutorial you have learned how to extract specific columns of a data frame in the R programming language. I have shown in multiple examples how to create subsets of consecutive and non-consecutive variables. If you have comments or questions, please let me know in the comments section below.

 



26 Comments

  • If my columns are in a sequence like
    X1 X2 X3 X4 X5

    but I want to output only the X4 and X2 columns, in the order X4 first and then X2.

    How can I do that?
  • What if I want to extract selected columns with a specific row value?
  • Hi, I’m trying to extract columns from multiple datasets so I can sum them but the number of columns in each dataset varies. I’m attempting to use a for loop.

    Here is my attempt:

    for (df in 1:length(locs)){
    newdf 0] #get rid of all columns that have only 0s
    newdfsum <- colSums(newdf[ , 9:length(newdf) ]) #sum everything in column 9 and after
    summarysums[i] <- newdfsum #put new df or list in empty vector
    }

    I can do this for one, but I haven't been able to loop through multiple datasets..
    Thank you!

  • How do you filter and pick specific rows to use?
  • Hello. If I had a dataframe called df (containing 5 columns and 30 rows). What code would I use to subset rows 10 to 20 and columns 1 and 5 using base R?

  • Jacque Mason
    June 19, 2021 4:38 am

    Hello Joachim, how can I write a specification to extract a data set from an Excel spreadsheet?
  • Hello. If I had a dataframe called df (containing 26 columns a-z). What code would I use to subset columns K to Q using R not by column #, but by column name range?

    • Hey,

      You could use the which function to identify the locations of these columns:

      data <- as.data.frame(matrix(1:130, ncol = 26))
      colnames(data) <- letters
      data
      #   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s   t   u   v   w   x   y   z
      # 1 1  6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91  96 101 106 111 116 121 126
      # 2 2  7 12 17 22 27 32 37 42 47 52 57 62 67 72 77 82 87 92  97 102 107 112 117 122 127
      # 3 3  8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93  98 103 108 113 118 123 128
      # 4 4  9 14 19 24 29 34 39 44 49 54 59 64 69 74 79 84 89 94  99 104 109 114 119 124 129
      # 5 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130
       
      data_subset <- data[ , which(colnames(data) == "k"):which(colnames(data) == "q")]
      data_subset
      #    k  l  m  n  o  p  q
      # 1 51 56 61 66 71 76 81
      # 2 52 57 62 67 72 77 82
      # 3 53 58 63 68 73 78 83
      # 4 54 59 64 69 74 79 84
      # 5 55 60 65 70 75 80 85

      Regards,
      Joachim

  • Hi, thank you very much for your great explanations!
    By the way, I have trouble with this kind of naming process.
    For example, I have a data file with three column vectors without variable names.

    How can I give a name to each of the three vectors?

    c(1,2,3,4,5), c(6,7,8,9,10), c(11,12,13,14,15).
    If this is a data file, how can I name each vector?

    Could you kindly let me know the code?

    Thank you.

    Samuel
    • Hey Samuel,

      Thank you for the kind words, glad you found the tutorial helpful!

      You may store all those vectors in a data frame as shown below:

      df <- data.frame(name_1 = c(1,2,3,4,5),
                       name_2 = c(6,7,8,9,10),
                       name_3 = c(11,12,13,14,15))
      df
      #   name_1 name_2 name_3
      # 1      1      6     11
      # 2      2      7     12
      # 3      3      8     13
      # 4      4      9     14
      # 5      5     10     15

      Regards,
      Joachim

  • I want to sort rowwise values in specific columns, get top ‘n’ values, and get corresponding column names in new columns.

    The output would look something like this:

    SL SW PL PW Species high1 high2 high3 col1 col2 col3
    1 5.1 3.5 1.4 0.2 setosa 3.5 1.4 0.2 SW PL PW
    2 4.9 3 1.4 0.2 setosa 3 1.4 0.2 SW PL PW
    3 4.7 3.2 1.3 0.2 setosa 3.2 1.3 0.2 SW PL PW

    Tried something like code below, but unable to get column names. Help appreciated.

    iris %>%
    rowwise() %>%
    mutate(rows = list(sort(c( Sepal.Width, Petal.Length, Petal.Width), decreasing = TRUE))) %>%
    mutate(high1 = rows[1], col1 = list(colnames(~.)[~. ==rows[1]]),
    high2 = rows[2], col2 = list(colnames(~.)[~. ==rows[2]]),
    high3 = rows[3], col3 = list(colnames(~.)[~. ==rows[3]])
    ) %>%
    select(-rows)

  • Fazhir Kayondo
    March 19, 2022 5:24 pm

    Hi I have a data frame with column names
    A B C D E F G
    2 4 6 ? 5 7 3
    ? 2 3 5 ? 3 4
    2 2 3 4 5 6 5

    How could I select out only the columns with “?” in them? Thanks

    • Hey Fazhir,

      Please have a look at the example below:

      data <- data.frame(A = c(2, "?", 2),
                         B = c(4, 2, 2),
                         C = c(6, 3, 3),
                         D = c("?", 5, 4))
      data
      #   A B C D
      # 1 2 4 6 ?
      # 2 ? 2 3 5
      # 3 2 2 3 4
       
      data[ , colSums(data == "?") > 0]
      #   A D
      # 1 2 ?
      # 2 ? 5
      # 3 2 4

      Regards,
      Joachim

  • Hi,

    I am trying to compute the SD and VAR for the difference between start and end times (basically a time variable). I get NA when I run hms::as_hms(var(hms::as_hms(samp.trips$ride_length))). How can I get around this?

    Thank you in advance.
    • Hey TK,

      Could you share the output when you run the following line of code?

      head(samp.trips$ride_length)

      Regards,
      Joachim

  • Hi, I have a data frame consisting of 15 rows and 39 columns. I want to rename columns 4 to 39 with the years 1980 to 2015. I can do it manually, but is there a faster way to give all of these columns the new names 1980 to 2015?
    • Hi Aman,

      You may use the following R syntax for this:

      colnames(data)[4:39] <- 1980:2015

      Regards,
      Joachim
