Convert Data Frame Column to Numeric in R (2 Examples) | Change Factor, Character & Integer

 

In this R tutorial, I’ll explain how to convert a data frame column to numeric in R. No matter if you need to change the class of factors, characters, or integers, this tutorial will show you how to do it.

Let’s dive right in!

 

Create Example Data

First we need to create some data in R that we can use in the examples later on:

data <- data.frame(x1 = c(1, 5, 8, 2),       # Create example data frame
                   x2 = c(3, 2, 5, 2),
                   x3 = c(2, 7, 1, 2))
data$x1 <- as.factor(data$x1)                # First column is a factor
data$x2 <- as.character(data$x2)             # Second column is a character
data$x3 <- as.integer(data$x3)               # Third column is an integer
data                                         # Print data to RStudio console

You can see the structure of our example data frame in Table 1. The data contains three columns: a factor variable, a character variable, and an integer variable.

 

#   x1 x2 x3
# 1  1  3  2
# 2  5  2  7
# 3  8  5  1
# 4  2  2  2

Table 1: Example Data Frame with Factor, Character & Integer Variables.

 

We can check the class of each column of our data frame with the sapply function:

sapply(data, class)                          # Get classes of all columns
#       x1          x2          x3 
# "factor" "character"   "integer"
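One detail worth keeping in mind when reading this output: a column of class "integer" already counts as numeric in R, even though its class is not reported as "numeric". The following sketch illustrates this with a small stand-alone vector (the names x_int and x_num are made up for this illustration and are not part of the example data):

```r
x_int <- c(2L, 7L, 1L, 2L)                   # Hypothetical integer vector
class(x_int)                                 # Class is "integer"
is.numeric(x_int)                            # TRUE, integers count as numeric
x_num <- as.numeric(x_int)                   # Convert to double precision
class(x_num)                                 # Class is now "numeric"
all(x_int == x_num)                          # TRUE, values are unchanged
```

So converting an integer column mainly changes the storage type, not the values.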

The data is set up, so let’s move on to the examples…

 

Example 1: Convert One Variable of Data Frame to Numeric

In the first example I’m going to convert only one variable to numeric. For this task, we can use the following R code:

data$x1 <- as.numeric(as.character(data$x1))  # Convert one variable to numeric

Note: The previous code first converts our factor variable to character and then converts the character to numeric. This is important in order to retain the values (i.e. the numbers) of the factor variable: applying as.numeric directly to a factor returns the internal integer codes of the factor levels instead of the values themselves.
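The difference can be seen in a minimal sketch. The vector f below is a stand-alone copy of the values of x1 (the names f, codes, values, and values_alt are made up for this illustration); the last line shows an alternative that is described on the ?factor help page:

```r
f <- factor(c(1, 5, 8, 2))                   # Same values as column x1
codes <- as.numeric(f)                       # Level codes, not the values
codes
# [1] 1 3 4 2
values <- as.numeric(as.character(f))        # Original values are retained
values
# [1] 1 5 8 2
values_alt <- as.numeric(levels(f))[f]       # Alternative from ?factor
values_alt
# [1] 1 5 8 2
```

The codes differ from the values because the factor levels are sorted ("1", "2", "5", "8"), so the value 5 is stored as level number 3 and the value 8 as level number 4.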

Let’s check the classes of our columns again to see how our data has changed:

sapply(data, class)                           # Get classes of all columns
#        x1          x2          x3 
# "numeric" "character"   "integer"

As we wanted: The factor column was converted to numeric.

 

Example 2: Change Multiple Columns to Numeric

In Example 1, we used the as.numeric and the as.character functions to modify one variable of our example data. However, when we want to change several variables to numeric simultaneously, the approach of Example 1 might be too tedious (i.e. it requires too much code). In this example, I’m therefore going to show you how to change as many columns as you want at the same time.

First, we need to specify which columns we want to modify. In this example, we are converting columns 2 and 3 (i.e. the character string and the integer):

i <- c(2, 3)                                  # Specify columns you want to change

We can now use the apply function to change columns 2 and 3 to numeric:

data[ , i] <- apply(data[ , i], 2,            # Specify own function within apply
                    function(x) as.numeric(as.character(x)))
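The 2 in this call is the MARGIN argument of apply: a value of 1 applies the function to each row, a value of 2 applies it to each column. A minimal sketch with a made-up matrix m (not part of the example data):

```r
m <- matrix(1:6, nrow = 2)                   # Hypothetical 2 x 3 matrix
apply(m, 1, sum)                             # MARGIN = 1: one result per row
# [1]  9 12
apply(m, 2, sum)                             # MARGIN = 2: one result per column
# [1]  3  7 11
```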

Let’s check the classes of the variables of our data frame:

sapply(data, class)                           # Get classes of all columns
#        x1        x2        x3 
# "numeric" "numeric" "numeric"

Together with the column we converted in Example 1, the whole data frame is now numeric!
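As an alternative to apply, which coerces the selected columns to a matrix before the function is called, the same conversion can be sketched with lapply, which processes the columns one by one. The example below rebuilds the example data under the hypothetical name data2 and selects the columns by their type instead of by position (i2 is also a made-up name). This may be handy when there are too many columns to list by hand. It assumes that every factor, character, and integer column actually contains numbers:

```r
data2 <- data.frame(x1 = factor(c(1, 5, 8, 2)),        # Rebuild example data
                    x2 = as.character(c(3, 2, 5, 2)),
                    x3 = as.integer(c(2, 7, 1, 2)),
                    stringsAsFactors = FALSE)
i2 <- sapply(data2, function(x)                        # Select columns by type
  is.factor(x) || is.character(x) || is.integer(x))
data2[i2] <- lapply(data2[i2],                         # Convert column by column
                    function(x) as.numeric(as.character(x)))
sapply(data2, class)                                   # Get classes of all columns
#        x1        x2        x3 
# "numeric" "numeric" "numeric"
```

Columns of other types (e.g. logical or date columns) are left untouched by this selection.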

 

Further Resources

Converting variable classes in R is a complex topic. If you want to learn more about the basic data types in R, I can recommend the videos of the DataCamp YouTube channel as well as the other R tutorials on this homepage.

I hope you liked this tutorial! Let me know in the comments if you have any further questions. Of course, I am also happy about general feedback.

 


34 Comments

  • Julio Alfonso Chia Wong
    September 15, 2019 11:35 pm

    Excellent tutorial, it helped me a lot!

  • Tarequzzaman
    April 30, 2020 2:28 pm

    data[ , i] <- apply(data[ , i], 2, # Specify own function within apply
    function(x) as.numeric(as.character(x)))
    What does this "2" mean, and why do we use it? Please explain.

  • You saved me at the night before exam

  • Best tutorial on R. Please upload some more videos of this kind. Appreciation and best wishes.

    • Thanks a lot for this awesome feedback Joshy! I’ll definitely upload more videos like that 🙂

      Regards

      Joachim

      • I have breast cancer data from the TCGA. However, when I upload it and try to read it, the data always comes in as character, not numeric. The data is huge, so how can I solve this? And how can I keep only the genes I am interested in and drop the others?

        • Hi Ali,

          I’m sorry for the delayed reply. I was on a long vacation, so unfortunately I wasn’t able to get back to you earlier. Do you still need help with your syntax?

          Regards,
          Joachim

  • This is working, the variables are numeric now, but I still have a problem: some values are turned to NA.

  • I uploaded some files that I found on the internet. It is the historical data of some companies, this is a school project, the project is to optimize the investment portfolio and see how the numbers of the companies develop and which of all is the best option. Sorry for writing so much; but I wanted to make it clear in context.
    The columns of these files have a class of “character”, which makes it difficult to do anything. So I took on the task of changing the class of the columns. I leave you here the code that I used. It happened that many values were deleted. And now I don’t even know how to return the file to how it was before.

  • What if you had x1 – x2000, and in that range you had 400 random columns you wanted to convert to numeric? Is there a way to do the conversion without having to manually enter each of the 400 columns in a vector?

  • Hey Frank

    How do you convert X1-X2000 columns to numeric at once?

    Thanks

    G.

  • Ana Cecilia Ramirez Licon
    September 5, 2022 8:12 pm

    Hello, Joachim.

    I really liked this tutorial. Quick question: is there any way I can use a for loop to convert columns in a data.frame to numeric? Here is the code I have been trying to use, using your data.frame example:

    i <- (2,3) #to establish the columns I want to change.

    for(i in data[,i]){
    if(is.integer(data[,i])
    as.numeric(as.character(data[,i]))}

    I have been trying different variations of this code, but everything throws an error. If you could tell me what I am doing wrong, I would really appreciate it. Thank you and have a nice day.

    • Hey Ana,

      Thank you for the kind words, glad you liked the tutorial.

      Yes, this is possible. Please have a look at the following example code:

      for(i in 1:ncol(data)) {
        if(is.integer(data[ , i])) {
          data[ , i] <- as.numeric(as.character(data[ , i]))
        }
      }

      Regards,
      Joachim

  • Hi Joaquim,

    thanks for this tutorial 🙂 it worked fine with my data.
    one question: all values are rounded (ie: 101.2179 is now 101).
    is there a way to keep the original format?

    thanks in advance.

  • Hi Cansu,
    Thanks for replying. I realised it had to do with the visualization of the console in R. When I downloaded the file, data maintained the decimals.
    Thanks again and I hope you have a great week.
    Cheers.

  • Hello, thank you for the helpful information!
    I have a dataset called “a” and variable called “cancer_num”. The variable cancer_num is indexed in column number 113.
    However, when I tried to run the following 2 codes, it gave different results
    class(a$cancer_num)
    class(a [, 113])
    The first one returns “numeric”, and the second returns “tbl_df”.
    I am sure that the index number of the cancer_num variable is correct as 113.
    I tried to check with other variables as well, and they also gave different results. If I use the first syntax, they return correctly as numeric, factor, etc. However, the second syntax always returns “tbl_df”.
    Any idea why they give different results?
    Thank you!
