Convert Data Frame Column to Numeric in R (2 Examples) | Change Factor, Character & Integer
In this R tutorial, I’ll explain how to convert a data frame column to numeric in R. No matter if you need to change the class of factors, characters, or integers, this tutorial will show you how to do it.
The article is structured as follows:
- Creation of Example Data in R
- Convert One Column to Numeric (Example 1)
- Convert Multiple Columns to Numeric (Example 2)
- Further Resources for Handling Data Types
Let’s dive right in!
Create Example Data
First we need to create some data in R that we can use in the examples later on:
data <- data.frame(x1 = c(1, 5, 8, 2),        # Create example data frame
                   x2 = c(3, 2, 5, 2),
                   x3 = c(2, 7, 1, 2))
data$x1 <- as.factor(data$x1)                 # First column is a factor
data$x2 <- as.character(data$x2)              # Second column is a character
data$x3 <- as.integer(data$x3)                # Third column is an integer
data                                          # Print data to RStudio console
You can see the structure of our example data frame in Table 1. The data contains three columns: a factor variable, a character variable, and an integer variable.
Table 1: Example Data Frame with Factor, Character & Integer Variables.
We can check the class of each column of our data frame with the sapply function:
sapply(data, class)                           # Get classes of all columns
#          x1          x2          x3
#    "factor" "character"   "integer"
The data is set up, so let’s move on to the examples…
Example 1: Convert One Variable of Data Frame to Numeric
In the first example I’m going to convert only one variable to numeric. For this task, we can use the following R code:
data$x1 <- as.numeric(as.character(data$x1)) # Convert one variable to numeric
Note: The previous code converts our factor variable to character first and then converts the character to numeric. This matters because calling as.numeric directly on a factor returns the internal integer codes of the factor levels rather than the original numbers. You can learn more about that in this tutorial.
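To make the note above concrete, here is a minimal sketch using a stand-alone factor with the same values as x1. The object names f, codes, values, and values_alt are only chosen for illustration:

```r
f <- factor(c(1, 5, 8, 2))                    # Levels are "1", "2", "5", "8"

codes <- as.numeric(f)                        # Internal level codes: 1 3 4 2
values <- as.numeric(as.character(f))         # Original numbers:     1 5 8 2

values_alt <- as.numeric(levels(f))[f]        # Equivalent: convert levels, then index
```

The direct conversion returns the position of each value among the sorted levels, which is why the detour via as.character is needed.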
Let’s check the classes of our columns again to see how our data has changed:
sapply(data, class)                           # Get classes of all columns
#          x1          x2          x3
#   "numeric" "character"   "integer"
As we wanted: The factor column was converted to numeric.
If you need more explanation of the R syntax of Example 1, the YouTube video accompanying this tutorial explains the previous R programming code in some more detail.
Example 2: Change Multiple Columns to Numeric
In Example 1 we used the as.numeric and as.character functions to modify one variable of our example data. However, when we want to change several variables to numeric at once, the approach of Example 1 becomes tedious, because the same line has to be repeated for every column. In this example, I’m therefore going to show you how to change as many columns as you want at the same time.
First, we need to specify which columns we want to modify. In this example, we are converting columns 2 and 3 (i.e. the character string and the integer):
i <- c(2, 3) # Specify columns you want to change
We can now use the apply function to change columns 2 and 3 to numeric:
data[ , i] <- apply(data[ , i], 2,            # Specify own function within apply
                    function(x) as.numeric(as.character(x)))
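As an alternative sketch (not the code used in this example), the same conversion can be written with lapply. The apply function first turns the selected columns into a matrix, whereas lapply handles each column separately and also works when i contains only a single column. The example data is recreated here so that the snippet runs on its own:

```r
data <- data.frame(x1 = c(1, 5, 8, 2),        # Recreate example data frame
                   x2 = c(3, 2, 5, 2),
                   x3 = c(2, 7, 1, 2))
data$x1 <- as.factor(data$x1)
data$x2 <- as.character(data$x2)
data$x3 <- as.integer(data$x3)

i <- c(2, 3)                                  # Columns to convert

data[i] <- lapply(data[i],                    # Convert column by column
                  function(x) as.numeric(as.character(x)))

classes <- sapply(data, class)                # x1 is still a factor here
```

In this stand-alone version x1 stays a factor, because the snippet does not repeat the conversion from Example 1.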
Let’s check the classes of the variables of our data frame:
sapply(data, class)                           # Get classes of all columns
#        x1        x2        x3
# "numeric" "numeric" "numeric"
The whole data frame was converted to numeric!
Further Resources
Converting variable classes in R is a complex topic. I have therefore listed some additional resources about the modification of R data classes below.
If you want to learn more about the basic data types in R, I can recommend the video on this topic from the DataCamp YouTube channel.
Also, you could have a look at the following R tutorials of this homepage:
- How to Convert Factor to Numeric
- How to Convert Character to Numeric
- Convert Factor to Character
- type.convert R Function
- List of Useful R Functions
- The R Programming Language
I hope you liked this tutorial! Let me know in the comments if you have any further questions, and of course I am also happy about general feedback.
34 Comments
Excellent tutorial, it helped me a lot!
Thank you very much! Nice to hear that 🙂
data[ , i] <- apply(data[ , i], 2, # Specify own function within apply
function(x) as.numeric(as.character(x)))
What does this "2" mean, and why do we use it? Please explain.
Hi Tarequzzaman,
Thank you for your question. The 2 within the apply function specifies that we want to use the apply function by column. You may also specify a 1 instead, to use the apply function by row.
You can learn more about this topic in the following tutorial: https://statisticsglobe.com/apply-function-to-every-row-of-data-in-r
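As a small illustration of the difference between the two settings, here is a sketch with a made-up matrix m (not part of the tutorial data):

```r
m <- matrix(1:6, nrow = 2)                    # 2 rows, 3 columns, filled by column

row_sums <- apply(m, 1, sum)                  # 1 = by row:    9 12
col_sums <- apply(m, 2, sum)                  # 2 = by column: 3 7 11
```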
Regards,
Joachim
That was my question too :). Many thanks, Joachim, for the clear article.
Hi Mike,
thank you very much for the kind comment. I’m glad the answer helped you as well!
Best regards
Joachim
You saved me the night before my exam.
That’s great to hear, I hope the exam went well! 🙂
Best tutorial on R. Please upload some more videos of this kind. Much appreciated, and best wishes.
Thanks a lot for this awesome feedback Joshy! I’ll definitely upload more videos like that 🙂
Regards
Joachim
I have breast cancer data from the TCGA. However, when I load it and try to read it, the data always comes in as character instead of numeric. The data is huge, so how can I solve this? And how can I keep only the genes I am interested in and drop the others?
Hi Ali,
I’m sorry for the delayed reply. I was on a long vacation, so unfortunately I wasn’t able to get back to you earlier. Do you still need help with your syntax?
Regards,
Joachim
This is working and the variables are numeric now, but I still have a problem: some values have turned into NA.
Hey Saleh,
Is the warning message “NAs Introduced by Coercion” returned?
If so, please have a look here: https://statisticsglobe.com/warning-message-nas-introduced-by-coercion-in-r
Regards
Joachim
Thank you! that was helpful.
I used the function gsub to substitute “,” by “.” to overcome the coercion issue.
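A minimal sketch of this fix, assuming the commas are decimal separators and not thousands separators. The vector x_raw and its values are made up for illustration:

```r
x_raw <- c("1,5", "2,25", "10")               # Hypothetical character values

x_na <- suppressWarnings(as.numeric(x_raw))   # Direct conversion: NA NA 10

x_num <- as.numeric(gsub(",", ".", x_raw,     # Replace "," by "." first
                         fixed = TRUE))       # Result: 1.5 2.25 10
```

Without suppressWarnings, the direct conversion returns the warning message “NAs introduced by coercion” mentioned above.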
Thanks a lot for the kind words Saleh, glad you found a solution! 🙂
Hi Joachim,
Could you please address the situation where we need to keep such characters? For example, a column may contain values such as “1990-93”, where we cannot omit the “-”.
Hey Prateek,
In this case, it is not possible to use the numeric class. You would have to use the character or factor class instead.
Regards
Joachim
I loaded some files that I found on the internet. They contain the historical data of some companies for a school project. The project is to optimize an investment portfolio, to see how the numbers of the companies develop, and to find out which of them is the best option. Sorry for writing so much, but I wanted to make the context clear.
The columns of these files have the class “character”, which makes it difficult to do anything with them. So I set out to change the class of the columns. I leave you here the code that I used. What happened is that many values were deleted, and now I don’t even know how to return the file to how it was before.
Hey JR,
I cannot see the code or data you have used. Have you maybe forgotten to include it in your comment?
Regards,
Joachim
What if you had x1 – x2000, and in that range you had 400 random columns you wanted to convert to numeric? Is there a way to do the conversion without having to manually enter each of the 400 columns in a vector?
Hey Frank,
Please have a look at this tutorial. It shows how to change data types of columns automatically to the appropriate data type.
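One base R function that performs this kind of automatic conversion is type.convert. Whether this is the function covered by the linked tutorial is an assumption, and the data frame df below is made up for illustration:

```r
df <- data.frame(a = c("1", "2"),             # Hypothetical all-character data
                 b = c("x", "y"),
                 c = c("1.5", "2"),
                 stringsAsFactors = FALSE)

conv <- type.convert(df, as.is = TRUE)        # Guess a suitable class per column

classes <- sapply(conv, class)                # "integer" "character" "numeric"
```

With as.is = TRUE, columns that cannot be read as numbers stay character instead of being turned into factors.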
Regards,
Joachim
Hey Frank
How do you convert X1-X2000 columns to numeric at once?
Thanks
G.
Hey,
I’m not sure who Frank is 😉 However, this should be possible by changing i in Example 2 to 1:2000.
Regards,
Joachim
Hello, Joachim.
I really liked this tutorial. Quick question: is there any way I can use a for loop to convert columns in a data.frame to numeric? Here is the code I have been trying to use, using your data.frame example:
i <- (2,3) #to establish the columns I want to change.
for(i in data[,i]){
if(is.integer(data[,i])
as.numeric(as.character(data[,i]))}
I have been trying different variations of this code, but every one of them throws an error. If you could tell me what I am doing wrong, I would really appreciate it. Thank you and have a nice day.
Hey Ana,
Thank you for the kind words, glad you liked the tutorial.
Yes, this is possible. Please have a look at the following example code:
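A for loop along the following lines does the conversion. This is a minimal sketch based on the example data of the tutorial and not necessarily the exact code of the original reply. Compared to the loop in the question, the column positions are created with c(2, 3), the loop variable j does not overwrite the vector of positions, and the converted values are assigned back to the column:

```r
data <- data.frame(x1 = c(1, 5, 8, 2),        # Recreate example data frame
                   x2 = c(3, 2, 5, 2),
                   x3 = c(2, 7, 1, 2))
data$x1 <- as.factor(data$x1)
data$x2 <- as.character(data$x2)
data$x3 <- as.integer(data$x3)

cols <- c(2, 3)                               # Columns to convert

for(j in cols) {                              # Loop over column positions
  data[[j]] <- as.numeric(as.character(data[[j]]))
}

classes <- sapply(data, class)                # x2 and x3 are numeric now
```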
Regards,
Joachim
Thank you so much for answering my question. You have helped me a lot. Have a great day.
This is great to hear, glad it helped! 🙂
Regards,
Joachim
Hi Joaquim,
thanks for this tutorial 🙂 it worked fine with my data.
one question: all values are rounded (ie: 101.2179 is now 101).
is there a way to keep the original format?
thanks in advance.
Hello Costanza,
Sorry for the late response. The numbers shouldn’t be rounded. If you haven’t solved your problem yet, could you please share your code? Then I can check it.
Regards,
Cansu
Hi Cansu,
Thanks for replying. I realised it had to do with the visualization of the console in R. When I downloaded the file, data maintained the decimals.
Thanks again and I hope you have a great week.
Cheers.
Hey Constanz,
Perfect! Thank you for the information, it is good to know. Have a good one! 🙂
Regards,
Cansu
Hello, thank you for the helpful information!
I have a dataset called “a” and variable called “cancer_num”. The variable cancer_num is indexed in column number 113.
However, when I ran the following two lines of code, they gave different results:
class(a$cancer_num)
class(a [, 113])
The first one returns “numeric”, and the second returns “tbl_df”.
I am sure that the index number of the cancer_num variable is correct as 113.
I tried to check with other variables as well, and they also gave different results. If I use the first syntax, they return correctly as numeric, factor, etc. However, the second syntax always returns “tbl_df”.
Any idea why they give different results?
Thank you!
Hello,
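Thank you for your question 🙂 It is probably due to the fact that your data is a tibble instead of a data.frame. Selecting a single column of a tibble with a[, 113] returns a one-column tibble rather than a vector, whereas a$cancer_num and a[[113]] return the column itself. You can check this link for more detailed information about tibbles.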
Regards,
Cansu