R Error – Undefined Columns Selected when Subsetting Data Frame

 

In this article, I’ll illustrate how to debug the error message “undefined columns selected” in the R programming language.

Let’s jump right to the example R code.

 

Construction of Example Data

The following data will be used as the basis for this R tutorial:

data <- data.frame(x1 = 1:4,    # Create example data
                   x2 = letters[1:4],
                   x3 = 3)
data                            # Print example data
#   x1 x2 x3
# 1  1  a  3
# 2  2  b  3
# 3  3  c  3
# 4  4  d  3

Have a look at the previous output of the RStudio console. It shows that our example data consists of four rows and three columns.
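
The dimensions can also be checked programmatically instead of being read off the printed output. A minimal sketch, recreating the example data from above so that the block runs on its own:

```r
# Recreate the example data from above so this block is self-contained
data <- data.frame(x1 = 1:4,
                   x2 = letters[1:4],
                   x3 = 3)

dim(data)                       # Number of rows and columns
# [1] 4 3

colnames(data)                  # Names of the columns that can be selected
# [1] "x1" "x2" "x3"
```

Knowing the number and names of the columns is useful later on, because the error discussed in this tutorial is about columns that do not exist.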

 

Example 1: Replicating Error: Undefined Columns Selected

This example illustrates how and why we sometimes get the error message “undefined columns selected” when we try to extract a data frame subset. Consider the following R code:

data[data$x1 > 2] # Error: undefined columns selected
# Error in `[.data.frame`(data, data$x1 > 2) : undefined columns selected

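As you can see, the previous syntax returned the error message “undefined columns selected”. The reason for this is that we didn’t specify whether we want to select rows or columns. Without a comma inside the square brackets, R treats the logical vector data$x1 > 2 as a column index. This vector has four elements (one per row), but our data frame has only three columns, so the fourth element points to a column that does not exist.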

In the R programming language, this can be done by specifying a comma within square brackets. More on that in the next example…
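
To see why R complains, it can help to inspect the logical vector that is passed into the square brackets. The following sketch recreates the example data and uses tryCatch() only to capture the error message without stopping the script. The object names cond and msg are chosen for illustration, and the message text assumes an English R session:

```r
# Recreate the example data so this block is self-contained
data <- data.frame(x1 = 1:4, x2 = letters[1:4], x3 = 3)

cond <- data$x1 > 2             # One logical value per row
cond
# [1] FALSE FALSE  TRUE  TRUE

length(cond)                    # Four values ...
# [1] 4
ncol(data)                      # ... but only three columns
# [1] 3

# Without a comma, cond is used as a column index; its fourth value is TRUE,
# but a fourth column does not exist
msg <- tryCatch(data[cond],
                error = function(e) conditionMessage(e))
msg
# [1] "undefined columns selected"
```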

 

Example 2: Fixing Error & Properly Subsetting Data Frame

Example 2 explains how to properly extract a data frame subset without getting the error message “undefined columns selected”. Have a look at the following R syntax:

data[data$x1 > 2, ]             # Comma after logical condition
#   x1 x2 x3
# 3  3  c  3
# 4  4  d  3

Works beautifully! The reason is that we have specified a comma after our logical condition. By doing this, R knows that we are selecting rows. If we specified a logical condition after the comma instead, we would select a subset of columns (i.e. variables).
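
The same logic applies to the position after the comma. As a rough sketch (the columns kept here are picked arbitrarily for illustration, and keep is just an example object name), a logical vector with one value per column selects columns, and both positions can be combined:

```r
# Recreate the example data so this block is self-contained
data <- data.frame(x1 = 1:4, x2 = letters[1:4], x3 = 3)

keep <- colnames(data) %in% c("x1", "x3")   # One logical value per column
keep
# [1]  TRUE FALSE  TRUE

data[ , keep]                   # Condition after comma selects columns
#   x1 x3
# 1  1  3
# 2  2  3
# 3  3  3
# 4  4  3

data[data$x1 > 2, keep]         # Rows and columns at the same time
#   x1 x3
# 3  3  3
# 4  4  3
```

Note that the vector before the comma needs one value per row, while the vector after the comma needs one value per column.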

 

Video & Further Resources

In case you need more information on the examples of this tutorial, you may watch the accompanying video on my YouTube channel. In the video, I explain the R programming syntax of this tutorial.

 

 

In addition, I recommend having a look at the related posts on this homepage.

 

In summary: In this article you learned how to solve the problem of undefined columns when subsetting data in R. If you have additional questions and/or comments, don’t hesitate to let me know in the comments below. Furthermore, please subscribe to my email newsletter in order to receive updates on the newest articles.

 



12 Comments

  • I’m new to R and I’m not sure exactly where to place the comma in the code here:

    load("factors_list.RData")
    for (i in 1:length(factors[[1]])) {
      if (factors[[2]][i] == "TRUE") {
        alspac.table_ACE[, factors[[1]][i]] <- factor(alspac.table_ACE[, factors[[1]][i]])
      }
    }

  • Hello Joachim, thank you again so much for the last advice you gave; it was very helpful. Today I’ve got another issue and would be glad if you could help me with this.
    I’ve got a data frame containing 8 variables and would like to extract one of the variables and store the result in another data frame, in order to do some basic text analysis by summarising the individual words in the PROD_NAME column.

    glimpse(Transactions$PROD_NAME)
    chr [1:264836] "Natural Chip Compny SeaSalt175g" "CCs Nacho Cheese 175g" "Smiths Crinkle Cut Chips Chicken 170g"

    I tried the following command, but each time I get the same error: Can’t subset columns that don’t exist. Here’s the code:

    productWords <- data.table(unlist(strsplit(unique(Transactions[, Transactions$PROD_NAME]), "")))

    Can you help, please?

    • Hey again 🙂

      I think you are looking for this code snippet:

      data1$new_col <- data2$your_col

      data1 is the name of the data frame to which you want to add a new column; new_col is the name of the new column; data2 is the data frame from which you would like to extract a column; and your_col is the name of the column that you would like to extract.

      Regards,
      Joachim

  • First, congratulations on achieving a ‘super useful’ website. I have three more comments.

    Second, please consider editing and using the information below to extend the list of your examples concerning the “Undefined columns” message.

    I just placed this important reminder comment in one of my programs, where I wasted a few hours because I carelessly re-used some code from another program: “When using certain functions to perform tests on a specific element in an array, be sure to respect the originally declared dimensionality of the array. For example, in str_detect(InFile_arr[OBS], "DESC:"), you will get an error message if InFile_arr[OBS] was not originally declared to be a one-dimensional array. I got this error because I forgot to specify the second dimension of an array that was declared to be two-dimensional. So I needed something like str_detect(InFile_arr[OBS, 1], "DESC:").”

    I wasted a few hours tracking this error because the content of the error message is misleading, a situation that easily develops when many error conditions appear long after a program was considered functional and has gone into wide use.

    So, here is my third comment. Consider massively expanding (perhaps with volunteer help) your list of error messages and related URLs that discuss solutions so that Statistics Globe would become the home of an encyclopedia of R error messages, where (as much as feasible) each message is followed by links to descriptions of the known conditions that could trigger a specific message.

    Fourth comment: I suggest that you find a way to make Statistics Globe a ‘go to’ place for people who have solutions to R-related issues that they would like to publicize. Going over the various R communities, I find an unduly heavy focus on inviting people to bring questions that they need to have answered, when in fact there could be a significant number of people who have solutions that they would like to share.

    • Hey Carl,

      First of all, thanks a lot for the kind words and your detailed feedback!

      Let me get back to some of your points:

      2) Thanks for sharing, I’m certain this will be helpful for others.

      3) I assume you have seen this list already? I’m expanding it as much and as quickly as I can.

      4) Letting strangers write content is always tricky in terms of the quality of the content. For that reason, I prefer to rely on a team of people that writes only high-quality content. I’m constantly extending this team to produce even more content. You can get more info on all the team members here.

      Thanks again,
      Joachim

  • Thanks for your encouraging response, Joachim. Yes, I did see your list. I made my suggestion because R development and packages (and applications where the interaction of data structure properties and code can be the source of an error) cover such a wide territory that we probably need teams of volunteers (each focused on a specific application area) to identify error messages and find useful commentary for people debugging a specific message.

    You are certainly on the right track in being careful about the quality of contributions. Do consider, however, inviting people to offer help as guest authors (knowing that their offerings would be reviewed prior to publication) in particular areas where R users need improvements in key aspects of the existing documentation. Since writing good-quality documentation is a rare skill (IMO), I would throw out the invitation in the hope of getting some good writers to come forward and help. Thanks in advance.

    • Thank you for another nice comment Carl. Indeed, inviting guest authors can be very valuable (and I’ve done so already in the past, have a look at the Guest Authors here). It also takes a lot of time to find good guest authors, though, and often it is less time-consuming to write an article by myself instead of investing time to find a good guest author on a certain topic. However, I will definitely continue to feature guest authors on Statistics Globe, since I think it’s great to have more people involved in the platform. 🙂

      Thanks again for your feedback!

      Joachim

  • Sorry I missed seeing your list of guest authors. Please add a general invitation to new authors, including a specification of your requirements concerning submissions. It will help to identify content areas where you think contributions would be particularly helpful.

    I assume that you will have tolerable problems with the process of adjudicating anything that comes in.

    One step that would help everyone conserve time is that a would-be submitter would first provide an abstract and an outline in the form of main headings. The good and experienced writers will understand this procedure immediately, and the others will be deterred from bothering you.

    I do understand your temptation to focus on contributions by you and others whose talents you know well; but I would still use the general invitation route (perhaps with an incentive of some sort). Why might this be helpful, even though there may be long waiting times for good inputs to arrive?

    First, do please consider how extremely varied are the alternative classes of relevant topics. Also, since good technical-article writing ability is a scarce talent, I would simply leave your general invitation floating out there in the marketplace in the hope that a ‘big fish’ will turn up from time to time.

    In any event, I applaud your decision to maintain high standards in documentation writing. Since I am anxious to support the worldwide volunteered contributions towards development and applications of R, I look forward to seeing your invitation to guest authors and considering your requirements. I am a very experienced writer of technical stuff.

    • Thank you for your further thoughts Carl.

      It’s interesting to hear that you are a writer of technical stuff yourself. Would you be interested in writing a guest article on Statistics Globe?

      Regards,
      Joachim

  • Harish Sudarsanam
    November 15, 2022 3:09 pm

    Hey, I have a very unusual error: “Error in data.frame: undefined columns selected”. I get this error when I try to subset a large number of columns (nearly 100), but if I reduce the number of columns in the subset, I don’t get the error. I have read multiple websites, but I don’t understand the reason.
    error when I run this; IG_like <- df[, c("ADAMTSL3", "ADGRA2", "ADGRF5", "CD84", "CD96", "CD160", "CD200", "CD244", "CILP", "CILP2", "CNTFR", "CSF1R", "FCMR", "FCAR", "FCER1A", "FCGR1A", "FCGR1BP", "FCGR1CP", "FCGR2A", "FCGR2B", "FCGR2C", "FCGR3A", "FCGR3B", "FCRLA", "FCRLB", "FCRL1", "FCRL2", "FCRL3", "FCRL4", "FCRL5", "FCRL6", "FLT3", "GPA33", "GP6", "ICAM1", "ICAM2", "ICAM3", "ICAM4", "ICAM5", "IGSF1", "IGSF23", "IL1RAP", "IL1RAPL2", "IL1RL2", "IL1R2", "IL6R", "IL11RA", "IL12B", "IL18BP", "IL18RAP", "IL18R1", "ISLR2", "KIT", "LAG3", "LAIR1", "LAIR2", "LEPR", "LSR", "LY9", "MADCAM1", "MALT1", "MILR1", "MMP23A", "MMP23B", "MUSK", "NCR1", "NTRK1", "OSCAR", "PDCD1LG2", "PECAM1", "PTPRK", "PTPRT", "SEMA3A", "SEMA3B", "SEMA3E", "SEMA3F", "SEMA3G", "SEMA4C", "SEMA4D", "SEMA4G", "SEMA7A", "SIGIRR", "SLAMF1", "SLAMF8", "TARM1", "TEK", "THY1", "TIE1", "TMEM81", "TMIGD2", "VSIG10L", "VSTM1", "ZPBP", "ZPBP2")]
    Error in `[.data.frame`(df, , c("ADAMTSL3", "ADGRA2", "ADGRF5", "CD84", :
    undefined columns selected

    but no error when I run this; IG_like <- df[, c("ADAMTSL3", "ADGRA2", "ADGRF5", "CD84")]

    • Hi Harish,

      The reason for this might be that you are trying to select columns that do not exist in your data frame. Are there maybe any typos in your column names?

      Please have a look at the following example code. It reproduces your situation and shows how to identify the problematic column names:

      data <- data.frame(x1 = 1:5,    # Create example data
                         x2 = 5:1,
                         x3 = 5)
      data
      #   x1 x2 x3
      # 1  1  5  5
      # 2  2  4  5
      # 3  3  3  5
      # 4  4  2  5
      # 5  5  1  5
       
      data[ , c("x2", "x3", "x4")]    # Try to select column that doesn't exist
      # Error in `[.data.frame`(data, , c("x2", "x3", "x4")) : 
      #   undefined columns selected
       
      c("x2", "x3", "x4")[! c("x2", "x3", "x4") %in% colnames(data)] # Identify non-existent column names
      # [1] "x4"

      I hope that helps!

      Joachim
