Conditionally Remove Row from Data Frame in R (3 Examples) | How to Delete Rows


This page explains how to conditionally delete rows from a data frame in R programming.

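The article consists of three examples: removing rows based on a single condition, removing rows based on multiple conditions, and removing rows with the subset function.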

Let’s do this.


Creation of Example Data

In the examples of this R programming tutorial, we’ll use the following data frame as a basis:

data <- data.frame(x1 = 1:5,                  # Create example data
                   x2 = letters[1:5],
                   x3 = "x")
data                                          # Print example data
#   x1 x2 x3
# 1  1  a  x
# 2  2  b  x
# 3  3  c  x
# 4  4  d  x
# 5  5  e  x

Our example data contains five rows and three columns.


Example 1: Remove Row Based on Single Condition

If we want to delete one or multiple rows conditionally, we can use the following R code:

data[data$x1 != 2, ]                          # Remove row based on condition
#   x1 x2 x3
# 1  1  a  x
# 3  3  c  x
# 4  4  d  x
# 5  5  e  x

The previous R syntax kept only the rows for which the condition data$x1 != 2 is TRUE, i.e. it removed the second row, in which x1 is equal to 2.
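Under the hood, the condition inside the square brackets is simply a logical vector with one TRUE or FALSE per row, and R keeps the rows marked TRUE. The following sketch recreates the example data, prints that vector, and then removes several rows at once with the %in% operator. The values 2 and 4 are chosen only for illustration; any set of values works the same way.

```r
# Recreate the example data from above
data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   x3 = "x")

# The condition is a logical vector with one entry per row
keep <- data$x1 != 2
keep
# [1]  TRUE FALSE  TRUE  TRUE  TRUE

# Remove every row whose x1 value is 2 or 4 (illustrative values)
data_sub <- data[!(data$x1 %in% c(2, 4)), ]
data_sub
#   x1 x2 x3
# 1  1  a  x
# 3  3  c  x
# 5  5  e  x
```

Negating %in% with ! is a compact alternative to chaining several != conditions with &.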

In this example, we used only one logical condition. However, we can also remove rows according to multiple conditions, and that’s what I’m going to show you next!


Example 2: Remove Row Based on Multiple Conditions

We can remove rows based on multiple conditions by combining them with the & (AND) or the | (OR) operator. Have a look at the following R code:

data[data$x1 != 2 & data$x2 != "e", ]         # Multiple conditions
#   x1 x2 x3
# 1  1  a  x
# 3  3  c  x
# 4  4  d  x

As you can see in the output of the RStudio console, the previous R syntax deleted two rows (rows 2 and 5), because a row is only kept if both logical conditions, data$x1 != 2 and data$x2 != "e", are TRUE.
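The same two rows can also be removed with the |-operator: describe the rows you want to drop, then negate the whole expression with !. The sketch below reuses the example data and checks that this gives exactly the result of Example 2 (by De Morgan’s law, !(A | B) equals !A & !B). It also shows a common pitfall: combining the two != conditions with | keeps every row, because each row differs from at least one of the two values.

```r
data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   x3 = "x")

# Describe the rows to DROP with |, then negate the whole condition
data_or <- data[!(data$x1 == 2 | data$x2 == "e"), ]
data_or
#   x1 x2 x3
# 1  1  a  x
# 3  3  c  x
# 4  4  d  x

# Same rows as Example 2, since !(A | B) is equivalent to !A & !B
data_and <- data[data$x1 != 2 & data$x2 != "e", ]
identical(data_or, data_and)
# [1] TRUE

# Pitfall: != combined with | is TRUE for every row, so nothing is removed
nrow(data[data$x1 != 2 | data$x2 != "e", ])
# [1] 5
```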


Example 3: Remove Row with subset function

As an alternative to Examples 1 and 2, we can use the subset function:

subset(data, x1 != 2 & x2 != "e")             # Apply subset function
#   x1 x2 x3
# 1  1  a  x
# 3  3  c  x
# 4  4  d  x

The resulting output is the same as in Example 2, since we used the same condition. However, this time we used the subset function instead of square brackets. Which of these options you prefer is mostly a matter of taste!
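One practical difference between the two approaches shows up when the condition contains missing values. Square brackets turn an NA in the logical index into a row filled with NA, whereas subset() treats NA as FALSE and drops that row. The sketch below uses a small hypothetical data frame (data_na) with one missing value in x1; wrapping the condition in which() is one way to get the subset() behavior while keeping square brackets.

```r
# Hypothetical data with a missing value in x1
data_na <- data.frame(x1 = c(1, NA, 3),
                      x2 = c("a", "b", "c"))

# The condition evaluates to TRUE, NA, FALSE
data_na$x1 != 3
# [1]  TRUE    NA FALSE

# Square brackets: the NA produces an extra row of missing values
res_brackets <- data_na[data_na$x1 != 3, ]
nrow(res_brackets)   # 2

# subset() treats NA as FALSE, so only the first row is kept
res_subset <- subset(data_na, x1 != 3)
nrow(res_subset)     # 1

# which() drops the NA positions, giving the same row as subset()
res_which <- data_na[which(data_na$x1 != 3), ]
nrow(res_which)      # 1
```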


Video & Further Resources

Do you need more information on the content of this tutorial? Then you may want to have a look at the following video on my YouTube channel, in which I illustrate the R code of this tutorial:


In addition, you may have a look at the related articles on my homepage. A selection of related articles is shown below:


To summarize: In this tutorial you learned how to conditionally exclude specific rows from a data frame in the R programming language. Please let me know in the comments if you have further questions.



8 Comments

  • best website ever for learning R

  • Hi Mathias,

    This is very helpful, thank you. I have a slightly different data layout and was wondering if you have any input. I have 5 rows (representing 5 conditions) for each participant and I want to remove all of them based on performance in one of these conditions. So I would like to say something like “if accuracy is smaller than 80% in condition x, remove all rows for this participant”. Is there a way of doing this? Thank you in advance.

    • Hello Danai,

      You can use the following:

      data<-data.frame(
        A = c(5, 2, 3, 2, 4),
        B = c(4, 5, 6, 5, 6),
        C = c(7, 8, 9, 10, 12)
      )
      data
       
      row.names(data)<-c("cond1", "cond2", "cond3", "cond4", "cond5")
      data
      #       A B  C
      # cond1 5 4  7
      # cond2 2 5  8
      # cond3 3 6  9
      # cond4 2 5 10
      # cond5 4 6 12
       
      data_filt1 <- data[, data[1, ] != 5]
      data_filt1
      #       B  C
      # cond1 4  7
      # cond2 5  8
      # cond3 6  9
      # cond4 5 10
      # cond5 6 12

      However, in my opinion, the conventional layout is more intuitive and more useful for model building and for using the functions of the dplyr package. You can simply transpose your data and employ the method shown in this tutorial. See the example below.

      data_t<-as.data.frame(t(data))
      data_t
      #   cond1 cond2 cond3 cond4 cond5
      # A     5     2     3     2     4
      # B     4     5     6     5     6
      # C     7     8     9    10    12

      Regards,
      Cansu
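For the original question (several rows per participant, and all of a participant’s rows should go when accuracy in one condition is too low), a long layout with one row per participant and condition lets you reuse the bracket indexing from Example 1. The sketch below is a minimal base R version; the column names participant, condition and accuracy, the condition label "x", and all values are hypothetical placeholders, not taken from Danai’s data.

```r
# Hypothetical long-format data: one row per participant and condition
scores <- data.frame(
  participant = rep(c("p1", "p2", "p3"), each = 2),
  condition   = rep(c("x", "y"), times = 3),
  accuracy    = c(0.90, 0.70, 0.75, 0.95, 0.85, 0.60)
)

# Step 1: find participants whose accuracy in condition "x" is below 80 %
low_x <- scores$participant[scores$condition == "x" & scores$accuracy < 0.8]
low_x

# Step 2: remove ALL rows of those participants, whatever the condition
scores_clean <- scores[!(scores$participant %in% low_x), ]
scores_clean
```

Here only p2 falls below the threshold in condition "x", so its two rows are dropped while the four rows of p1 and p3 are kept, including their low "y" scores, because only condition "x" is checked.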

  • Hi everyone,

    I am new to R and am using it as part of a Master’s Degree.

    I have a data frame that I loaded from the World Bank.
    https://datacatalog.worldbank.org/search/dataset/0037654/Gender-Statistics
    ```Gender_StatsData %>% filter("TER" %in% Indicator.Code))```
    complained that Indicator.Code is not found.

    Changing it to:
    ```View(Gender_StatsData %>% filter("TER" %in% Gender_StatsData$Indictor.Code))```
    doesn’t throw an error, but all I got was a ```0``` or ```NA``` in every cell.

    ```View(subset(Gender_StatsData, "TER" %in% Indicator.Code))```
    returns 0 rows and only keeps the column headers.
    Including the data frame name makes no difference; it still returns 0 rows:
    ```View(subset(Gender_StatsData, "TER" %in% Gender_StatsData$Indicator.Code))```

    There is obviously something I am missing that I just can’t see.
    Can anyone please point me in the right direction for getting the same results as:
    ```View(Gender_StatsData[grep("TER", Gender_StatsData$Indicator.Code), ])```

    but using ```subset``` / ```filter```?

    Any assistance would be really appreciated.
    Thanks!

    • Hello Gavin,

      Please try these code snippets:

      library(dplyr)    # provides filter() and the %>% pipe
      library(stringr)  # provides str_detect()

      filtered_data <- Gender_StatsData %>% filter(str_detect(Indicator.Code, "TER"))
      print(filtered_data)
      subsetted_data <- subset(Gender_StatsData, grepl("TER", Indicator.Code))
      print(subsetted_data)

      Also, it’s a good idea to check for missing values (NAs) in your “Indicator.Code” column, which can be done by:

      any(is.na(Gender_StatsData$Indicator.Code))

      NA values can cause surprises when you’re subsetting or filtering data. str_detect() returns NA for NA inputs, whereas grepl() returns FALSE. Both filter() and subset() drop rows whose condition evaluates to NA, but square-bracket indexing turns an NA in the condition into a row of missing values. Adding an explicit !is.na() check makes the handling of missing codes visible in your code, so you can adapt it as follows.

      # With subset
      subsetted_data <- subset(Gender_StatsData, !is.na(Indicator.Code) & grepl("TER", Indicator.Code))
      print(subsetted_data)
       
      # With filter
      filtered_data <- Gender_StatsData %>% filter(!is.na(Indicator.Code) & str_detect(Indicator.Code, "TER"))
      print(filtered_data)

      Best,
      Cansu
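The reason the original attempts returned no rows is that "TER" %in% Indicator.Code asks whether the exact string "TER" is one of the codes. It returns a single TRUE or FALSE for the whole column instead of one value per row, so the data frame is either kept entirely or dropped entirely. grepl() instead tests each element for the pattern. The sketch below uses a small hypothetical vector of indicator codes (not read from the World Bank file) to show the difference and the NA behavior of grepl().

```r
# Hypothetical indicator codes, including a missing value
codes <- c("SE.TER.ENRR", "SP.POP.TOTL", NA, "SE.TER.CUAT.BA.ZS")

# %in% checks for an exact element match and returns ONE value
"TER" %in% codes
# [1] FALSE

# grepl() checks every element for the pattern, one result per element;
# the missing value gives FALSE rather than NA
grepl("TER", codes)
# [1]  TRUE FALSE FALSE  TRUE

# Used inside subset(), this keeps only the matching rows (1 and 4)
stats <- data.frame(Indicator.Code = codes, value = 1:4)
subset(stats, grepl("TER", Indicator.Code))
```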

  • Thanks for this website, it’s great.
    I suggest you improve the example and data to show how to apply conditions simultaneously using | instead of &.

