Extract Numbers from Character String Vector in R (2 Examples)

 

In this tutorial you’ll learn how to return numeric values from a vector of alphanumeric character strings in the R programming language.

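The post is structured as follows:

1) Example Data
2) Example 1: Extract First Number from String Using gsub Function
3) Example 2: Extract All Numbers from String Using gregexpr & regmatches Functions
4) Video & Further Resources

Let’s dive into it: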

 

Example Data

As a first step, we have to construct some data that we can use in the example syntax later on:

x <- c("aaa3bbb1x12", "5AGAGAGA", "2A3k4GGG5")           # Create character vector
x                                                        # Print vector to console
# "aaa3bbb1x12" "5AGAGAGA"    "2A3k4GGG5"

The previous output of the RStudio console shows that our example data is a vector containing three character strings. Each of these strings is a mix of digits and alphabetic letters.
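Before extracting anything, it can be useful to confirm that every string really contains at least one digit. The following lines are a minimal sketch of such a check using the base R function grepl; the object name has_digit is only chosen for illustration:

```r
x <- c("aaa3bbb1x12", "5AGAGAGA", "2A3k4GGG5")           # Example vector from above

has_digit <- grepl("[0-9]", x)                           # TRUE if a string contains a digit
has_digit                                                # Print logical vector
# TRUE TRUE TRUE
```

Strings for which this check returns FALSE have no number to extract and may need separate handling.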

 

Example 1: Extract First Number from String Using gsub Function

Example 1 illustrates how to return numeric components of character strings by applying the gsub function. Have a look at the following R code:

as.numeric(gsub(".*?([0-9]+).*", "\\1", x))              # Apply gsub
# 3 5 2

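As you can see based on the previous RStudio output, we have extracted a vector of three numeric values. The regular expression works as follows: the lazy prefix .*? matches as few characters as possible before the first run of digits, the group ([0-9]+) captures that run, and the trailing .* matches the rest of the string. The entire string is then replaced by the captured group. For that reason, the previous R code extracts only the first number of each character string. For instance, the numbers 1 and 12 were not returned from the first character string element.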
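One caveat of the gsub approach: if a string contains no digits at all, the pattern does not match, gsub returns the string unchanged, and as.numeric produces NA together with a coercion warning. The following code is a minimal sketch of an alternative based on regexpr and regmatches that returns NA without a warning. The fourth string "no_digits" and the object names x_ext, m, and first_num are hypothetical and not part of the example data above:

```r
x_ext <- c("aaa3bbb1x12", "5AGAGAGA", "2A3k4GGG5", "no_digits")  # Hypothetical extended vector

m <- regexpr("[0-9]+", x_ext)                            # Position of first digit run (-1 if none)
first_num <- rep(NA_real_, length(x_ext))                # Start with NA for every string
first_num[m > 0] <- as.numeric(regmatches(x_ext, m))     # Fill in matches where they exist
first_num                                                # Print first number of each string
# 3  5  2 NA
```

Because regmatches only returns values for strings that actually match, the result is assigned back through the logical index m > 0, which keeps the output aligned with the input vector.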

 

Example 2: Extract All Numbers from String Using gregexpr & regmatches Functions

Example 2 explains how to return all numeric components from our character string vector using a combination of the gregexpr and regmatches functions. The following R code stores all numeric components in a list:

x_numbers <- regmatches(x, gregexpr("[[:digit:]]+", x))  # Apply gregexpr & regmatches
x_numbers                                                # Print list with numbers
# [[1]]
# [1] "3"  "1"  "12"
# 
# [[2]]
# [1] "5"
# 
# [[3]]
# [1] "2" "3" "4" "5"

As you can see based on the previous output, each list element contains the numbers found in the corresponding element of our input vector. Note that these numbers are still stored as character strings. We can flatten this list and convert it to a single vector of numeric values as shown below:

as.numeric(unlist(x_numbers))                            # Convert characters to numeric
# 3  1 12  5  2  3  4  5

Looks good!
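Note that unlist discards the information about which number came from which string. If that grouping matters, the list structure can be kept and each element converted separately. The following code is a minimal sketch of this idea; the object name x_num_list is only chosen for illustration:

```r
x <- c("aaa3bbb1x12", "5AGAGAGA", "2A3k4GGG5")           # Example vector from above
x_numbers <- regmatches(x, gregexpr("[[:digit:]]+", x))  # List of character matches

x_num_list <- lapply(x_numbers, as.numeric)              # Convert each list element to numeric
lengths(x_num_list)                                      # Count of numbers per string
# 3 1 4
sapply(x_num_list, sum)                                  # Sum of numbers per string
# 16  5 14
```

Keeping the list makes it possible to compute per-string summaries such as counts or sums, which is no longer possible after flattening.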

 

Video & Further Resources

Have a look at the video on my YouTube channel, in which I explain the R programming code of this article and show how to test for numbers in character strings.

 

 

Furthermore, you may want to read the related articles on this website, which include a selection of tutorials about the manipulation of character strings.

 

In this post you learned how to extract numeric values from a vector of alphanumeric character strings in the R programming language. In case you have further questions, don’t hesitate to let me know in the comments section below.

 
