Remove Characters Before or After Point in String in R (Example)

 

This article shows how to delete characters in a character string before or after a point in the R programming language.


Let’s start right away!

 

Creation of Example Data

The following data will be used as a basis for this R tutorial:

x <- "aaaa.bbbbbb"          # Create example data
x                           # Print example data
# [1] "aaaa.bbbbbb"

The previous output of the RStudio console shows the structure of our example data: a single character string containing the letters a and b, with a point or dot (i.e. “.”) in the middle.

 

Example 1: Remove Part After . Using gsub() Function and \\

This example explains how to remove everything after a point, i.e. how to extract only the part of a character string before the point.

Let’s first apply the gsub function the way we usually would to remove the part of a string after a pattern:

gsub("..*", "", x)          # Apply gsub without \\
# [1] ""

As you can see, the RStudio console returns an empty character string after running the previous R code.

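The reason for this is that the symbol . is a special character in regular expressions: it matches any single character, so the pattern "..*" matches the entire string. To match a literal point, we have to put a double backslash in front of it (i.e. \\.). The first backslash escapes the second one inside the R string, so the regular expression engine receives \., which stands for an actual point.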

gsub("\\..*", "", x)        # Apply gsub with \\
# [1] "aaaa"

This works as expected!
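The double backslash is not the only way to make the point literal. The following lines are a minimal sketch comparing a few equivalent ways to drop everything after the point, using base R's gsub(), sub(), regexpr() and substr(); the helper variable pos is just an illustrative name:

```r
x <- "aaaa.bbbbbb"                       # Example data from above

gsub("..*", "", x)                       # Unescaped: "." matches any character
# [1] ""

gsub("\\..*", "", x)                     # Escaped: the regex engine sees \.
# [1] "aaaa"

gsub("[.].*", "", x)                     # Inside a character class, . is literal
# [1] "aaaa"

sub("\\..*", "", x)                      # sub() replaces only the first match
# [1] "aaaa"

pos <- regexpr(".", x, fixed = TRUE)     # Regex-free: position of the first point
substr(x, 1, pos - 1)                    # Keep the characters before it
# [1] "aaaa"
```

One difference worth knowing: if a string contains no point at all, sub() and gsub() return it unchanged, whereas regexpr() returns -1 and the substr() line would return an empty string.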

 

Example 2: Remove Part Before . Using gsub() Function and \\

 

It is also possible to remove all characters in front of a point using the gsub function.

For this task, we can use the R code below:

gsub(".*\\.", "", x)        # Apply gsub with \\
# [1] "bbbbbb"

Looks good!
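Keep in mind that .* is greedy: if a string contains more than one point, ".*\\." removes everything up to the last point. A minimal sketch, assuming a hypothetical string y with two points (it is not part of the example data above), shows how a negated character class removes only the part before the first point, and how strsplit() with fixed = TRUE returns all parts at once:

```r
y <- "aaa.bbbb.cccc"                     # Hypothetical string with two points

gsub(".*\\.", "", y)                     # Greedy .* runs up to the LAST point
# [1] "cccc"

sub("^[^.]*\\.", "", y)                  # [^.]* cannot cross a point: FIRST point only
# [1] "bbbb.cccc"

strsplit(y, ".", fixed = TRUE)[[1]]      # Split at every literal point
# [1] "aaa"  "bbbb" "cccc"
```

The fixed = TRUE argument tells strsplit() to treat "." as plain text, so no escaping is needed in that case.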

Video, Further Resources & Summary

In case you need more info on the R code of this article, you may watch the following video, which I have published on my YouTube channel. In the video, I explain the R programming code of this article.

 

 

Besides the video, you may want to have a look at the other tutorials on this website. I have already released numerous tutorials.

 

Summary: In this tutorial, I have explained how to remove characters before or after a point in a character string in the R programming language. Let me know in the comments if you have additional questions.

 



8 Comments

  • You forgot to tell us how to remove the characters before a point.

    • Hey Girsão,

      Thanks a lot for the hint, it seems like I have forgotten to do this when I created the tutorial.

      I have just added another example, which explains how to delete all characters before a point.

      Regards

      Joachim

  • What if the data is aaa.bbbb.cccc and I want to extract bbbb?

    • Hey,

      This is probably not the most efficient solution, but it should work:

      substr("aaa.bbbb.cccc",
             gregexpr("\\.", "aaa.bbbb.cccc")[[1]][1] + 1,
             gregexpr("\\.", "aaa.bbbb.cccc")[[1]][2] - 1)

      Regards,
      Joachim

  • Hi dear, your classes are amazing!
    If I have the data frame below, how can I run the same command to remove FY from the Year column? Thank you.

    Identifier  Company Name         Country  Year
    ELSE.OQ     Electro-Sensors Inc  USA      FY2021
    ELSE.OQ     Electro-Sensors Inc  USA      FY2020
    ELSE.OQ     Electro-Sensors Inc  USA      FY2019
    ELSE.OQ     Electro-Sensors Inc  USA      FY2018
    ELSE.OQ     Electro-Sensors Inc  USA      FY2017
    ELSE.OQ     Electro-Sensors Inc  USA      FY2016
    ELSE.OQ     Electro-Sensors Inc  USA      FY2015
    ELSE.OQ     Electro-Sensors Inc  USA      FY2014
    ELSE.OQ     Electro-Sensors Inc  USA      FY2013
    ELSE.OQ     Electro-Sensors Inc  USA      FY2012
    ELSE.OQ     Electro-Sensors Inc  USA      FY2011
    ELSE.OQ     Electro-Sensors Inc  USA      FY2010
    ELSE.OQ     Electro-Sensors Inc  USA      FY2009
    ELSE.OQ     Electro-Sensors Inc  USA      FY2008
    ELSE.OQ     Electro-Sensors Inc  USA      FY2007
    ELSE.OQ     Electro-Sensors Inc  USA      FY2006
    ELSE.OQ     Electro-Sensors Inc  USA      FY2005
    ELSE.OQ     Electro-Sensors Inc  USA      FY2004
    ELSE.OQ     Electro-Sensors Inc  USA      FY2003

  • Can someone help me? I am working with a table that has numbers separated by semicolons, and I want to extract every number after a semicolon. Here is my code:
    P[i,j] <- as.numeric(gsub(";..*","", Mod2Data[i,j]))

    • Hello Luzuko,

      It seems you want to extract numbers after semicolons. The code you provided extracts the number before the first semicolon. Let’s modify your approach to extract the numbers after the semicolons instead.

      Given that you have numbers separated by semicolons, I assume there can be multiple numbers after multiple semicolons. To extract all numbers following semicolons, you’d need a loop or some iterative approach.

      Here’s a simple method using strsplit and lapply:

      Mod2Data <- matrix(c("1;2;3", "4;52;64", "7;8;9"), ncol=3)
       
      # Function to extract numbers after the first semicolon
      extract_numbers <- function(x) {
        # Split string by semicolon
        parts <- unlist(strsplit(x, ";"))
       
        # Remove the first number (before the first semicolon)
        parts[-1]
      }
       
      # Apply the function to the data
      result <- matrix(unlist(lapply(Mod2Data, extract_numbers)), ncol=ncol(Mod2Data))
      result

      I hope this is close to what you want to do.

      Best,
      Cansu

