R Warning Message: NAs Introduced by Coercion (Example)
This article explains how to debug the warning message “NAs introduced by coercion” in the R programming language.
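The content of the post is structured as follows: the creation of example data, three examples (reproducing the warning, fixing the data with gsub(), and suppressing the warning), and a summary with further resources.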
Let’s dive into it…
Creation of Example Data
First, I’ll have to create some example data.
vec <- c("50", "200", "1,000", "10", "1200", "2,100")    # Create example vector
vec                                                       # Print example vector
# [1] "50" "200" "1,000" "10" "1200" "2,100"
Have a look at the previous RStudio console output. It shows that our example data is a vector of character strings containing six vector elements.
Example 1: Reproduce the Warning Message: NAs Introduced by Coercion
In this example, I’ll show how to replicate the warning message “NAs introduced by coercion” when using the as.numeric function in R. Let’s apply the as.numeric function to our example vector:
as.numeric(vec)                                           # Applying as.numeric function
# [1] 50 200 NA 10 1200 NA
# Warning message:
# NAs introduced by coercion
As you can see, the warning message “NAs introduced by coercion” is returned and some output values are NA (i.e. missing data or not available data).
The reason for this is that some of the character strings are not properly formatted numbers and hence cannot be converted to the numeric class.
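To see which elements are affected, one option is to compare the input with the converted result. The following is only a minimal sketch; the names vec_num and bad are hypothetical and not part of the tutorial code:

```r
vec <- c("50", "200", "1,000", "10", "1200", "2,100")

# Convert once, silencing the warning because it is expected here
vec_num <- suppressWarnings(as.numeric(vec))

# Elements that were not NA before the conversion but are NA afterwards
bad <- !is.na(vec) & is.na(vec_num)
vec[bad]
# [1] "1,000" "2,100"
```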
The next example shows how to solve this problem in R.
Example 2: Modify Data to Avoid Warning Message Using gsub() Function
In Example 2, I’ll illustrate how to handle the as.numeric() warning message “NAs introduced by coercion”.
As explained before, some of our input values are not formatted properly, because they contain commas (i.e. ,) between the numbers. We can remove these commas by using the gsub function:
vec_new <- gsub(",", "", vec)                             # Applying gsub function
vec_new                                                   # Print updated example vector
# [1] "50" "200" "1000" "10" "1200" "2100"
Have a look at the previous output of the RStudio console. It shows that our updated vector does not contain commas anymore.
Now, let’s apply the as.numeric function again:
as.numeric(vec_new)                                       # Applying as.numeric function
# [1] 50 200 1000 10 1200 2100
As you can see, we not only avoided the warning message but also created an output vector without any NA values.
Example 3: Suppress Warning Message Using suppressWarnings() Function
Sometimes you might not want to convert non-number values to numeric. In this case, you can simply ignore the warning message “NAs introduced by coercion” by wrapping the suppressWarnings function around the as.numeric function:
suppressWarnings(as.numeric(vec))                         # Applying suppressWarnings function
# [1] 50 200 NA 10 1200 NA
The output is the same as in Example 1, but this time without printing the warning message to the RStudio console.
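Note that suppressWarnings() hides every warning raised by the wrapped call, not only this one. If that is too broad, a sketch like the following muffles only the coercion warning and lets all other warnings through. The helper name as_numeric_quiet is hypothetical, and the text match assumes that R is running with English messages:

```r
vec <- c("50", "200", "1,000", "10", "1200", "2,100")

# Hypothetical helper: silence only "NAs introduced by coercion"
as_numeric_quiet <- function(x) {
  withCallingHandlers(
    as.numeric(x),
    warning = function(w) {
      # Assumption: English warning text; other locales translate it
      if (grepl("NAs introduced by coercion", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

as_numeric_quiet(vec)
# [1] 50 200 NA 10 1200 NA
```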
Video, Further Resources & Summary
Do you want to know more about warnings and errors in R? Then I recommend watching the following video on my YouTube channel. In the video, I explain the R code of this tutorial in a live programming session.
In addition, you might have a look at the related articles of my homepage. You can find some tutorials about warning and error messages below.
Summary: In this post, I explained how to get rid of the warning “NAs introduced by coercion” when converting a character or factor variable to numeric in the R programming language.
In case you have further questions, don’t hesitate to let me know in the comments section. Furthermore, please subscribe to my email newsletter in order to get updates on new tutorials.
36 Comments
Hi Joachim, thanks for this tutorial and your help in advance!
I am getting the warning “NAs introduced by coercion”. In my case, however, I am trying to recode the character strings (e.g. green, blue, red) in two specific columns so that each string (e.g. green) is represented by a number (e.g. 1). I’ve been trying for a long while now, so that I can use the data as part of a neural network, but cannot get past this error. Can you help me with this?
Hi Ali,
Thank you for the very kind words and your interesting question. A simple solution might be the following:
Note that the numeric output would be based on the alphabetic order of the input vector.
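A minimal sketch of such a conversion, assuming a hypothetical character vector named colors, goes through a factor:

```r
colors <- c("green", "blue", "red", "green")   # hypothetical example values

# Factor levels are sorted alphabetically: blue = 1, green = 2, red = 3
colors_num <- as.numeric(as.factor(colors))
colors_num
# [1] 2 1 3 2
```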
I hope that helps!
Joachim
Hi Joachim,
Your blog has been a godsend, and I’m hoping you can solve this: my character vector (“1”, “2”, “3”) has zero-width non-printing spaces (‘\u200b’) in it (I scraped some Covid data online with rvest). I managed to remove most of them with str_remove and then convert the characters into numbers. But it still spits out the message “NAs introduced by coercion” and I don’t know what to look for! I would appreciate any guidance. Thank you!
Hey Angi,
Thanks a lot for this amazing feedback! 🙂
You could use the following code to identify all data cells that are converted to NA:
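One way this might look for a data frame is sketched below. The data frame df and its values are hypothetical, and the idea is simply to compare the cells before and after the conversion:

```r
df <- data.frame(id = c("1", "2", "3"),
                 value = c("10", "x20", "30"),
                 stringsAsFactors = FALSE)   # hypothetical example data

# Convert every column, silencing the expected warning
df_num <- suppressWarnings(sapply(df, as.numeric))

# Row and column positions of cells that turned into NA during conversion
cells <- which(!is.na(as.matrix(df)) & is.na(df_num), arr.ind = TRUE)
cells                  # here: row 2, column 2
df[cells[, "row"], ]   # show the affected rows
```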
I hope that helps!
Joachim
Thanks Joachim. The problem was figuring 1) the “invisible” mystery character that was creating the issue. Since I tend to glimpse() at my data rather than head() it, I couldn’t see anything wrong. Till I used head(). 2) The culprit was a zero-width non-printing space that was seemingly immune to str_remove() in its original form “”. But 48 hours later, a good learning experience.
Glad you found a solution Angi, and thanks for sharing it here! I’m sure others will have similar problems and will benefit from your explanation. 🙂
Regards
Joachim
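For the zero-width space discussed in this thread, the following sketch shows one way to make the hidden character visible and to strip it with base R. The vector x is a hypothetical stand-in for the scraped data:

```r
x <- c("1\u200b", "2", "\u200b3")   # hypothetical values with a hidden U+200B

# utf8ToInt() exposes the code points; 8203 is the zero-width space
utf8ToInt(x[1])
# [1]   49 8203

# Remove the character literally (fixed = TRUE), then convert
x_clean <- gsub("\u200b", "", x, fixed = TRUE)
as.numeric(x_clean)
# [1] 1 2 3
```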
Hi Joachim, thank you, but the function gsub(",", "", vec) removes all the commas. What if the values contain decimal commas, for example when our vector looks like vec <- c("50,1", "200,3", "1,000,5", "10,3", "1200", "2,100")?
Hey Ugi,
In this case, you would have to replace the comma by a point. Have a look at the example code below:
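A minimal sketch of this replacement, assuming a hypothetical vector vec_dec that contains only decimal commas:

```r
vec_dec <- c("50,1", "200,3", "10,3", "1200", "2,100")   # "1,000,5" left out

# Replace the decimal comma by a point, then convert
as.numeric(gsub(",", ".", vec_dec, fixed = TRUE))
# [1]   50.1  200.3   10.3 1200.0    2.1
```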
Please note that your example vector contained the value “1,000,5” (i.e. two commas). This does not make sense in case the comma is used as a decimal comma.
I hope that helps!
Joachim
Hi Joachim, thank you very much, this does work, but I have another problem. I am trying to convert a data frame with several columns whose elements are stored as character strings. It looks like this:
     BAYER   DAIMLER DBANK   SIEMENS  VONOVIA
[1,] "65,24" "47"    "6,809" "89,073" "46,34"
[2,] "65,79" "46,91" "6,839" "89,29"  "46,6"
[3,] "66,59" "47,92" "7,079" "90,056" "47,69"
If I use as.numeric(gsub(",", ".", vec)), it does convert the character elements to numeric, but I lose the structure of the data frame and get a single vector, which looks like this:
[1] 65.24 65.79 66.59 66.22 65.78 64.99
Do you know how I can convert the character elements to numeric without losing the structure of the data frame?
Hey Ugi,
Please have a look at the following example code. I assume this works for your data as well:
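One way to keep the structure is sketched below, using a hypothetical data frame prices with two of the columns from the question. Assigning into prices[] replaces the column contents but keeps the data frame itself:

```r
prices <- data.frame(BAYER = c("65,24", "65,79", "66,59"),
                     DAIMLER = c("47", "46,91", "47,92"),
                     stringsAsFactors = FALSE)   # hypothetical excerpt

# Convert column by column; prices[] preserves rows, columns, and names
prices[] <- lapply(prices, function(col) {
  as.numeric(gsub(",", ".", col, fixed = TRUE))
})
str(prices)
```

If the object is actually a character matrix, as the [1,] row labels suggest, wrapping it in as.data.frame() first should allow the same approach.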
Regards,
Joachim
My code for ANOVA also showed this message, for each row of the dataframe:
There were 22 warnings (use warnings() to see them)
>
> warnings()
Warning messages:
1: In FUN(newX[, i], …) : NAs introduced by coercion
2: In FUN(newX[, i], …) : NAs introduced by coercion
3: In FUN(newX[, i], …) : NAs introduced by coercion
4: In FUN(newX[, i], …) : NAs introduced by coercion
5: In FUN(newX[, i], …) : NAs introduced by coercion
6: In FUN(newX[, i], …) : NAs introduced by coercion
7: In FUN(newX[, i], …) : NAs introduced by coercion
8: In FUN(newX[, i], …) : NAs introduced by coercion
9: In FUN(newX[, i], …) : NAs introduced by coercion
10: In FUN(newX[, i], …) : NAs introduced by coercion
11: In FUN(newX[, i], …) : NAs introduced by coercion
12: In FUN(newX[, i], …) : NAs introduced by coercion
13: In FUN(newX[, i], …) : NAs introduced by coercion
14: In FUN(newX[, i], …) : NAs introduced by coercion
15: In FUN(newX[, i], …) : NAs introduced by coercion
16: In FUN(newX[, i], …) : NAs introduced by coercion
17: In FUN(newX[, i], …) : NAs introduced by coercion
18: In FUN(newX[, i], …) : NAs introduced by coercion
19: In FUN(newX[, i], …) : NAs introduced by coercion
20: In FUN(newX[, i], …) : NAs introduced by coercion
21: In FUN(newX[, i], …) : NAs introduced by coercion
22: In FUN(newX[, i], …) : NAs introduced by coercion
The code is:
# assigning X vector
# obtain classifications for samples
Control <- datout.new$clas == "Control"
TB <- datout.new$clas == "TB"
Sarcoidosis <- datout.new$clas == "Sarcoidosis"
# 1-factor ANOVA with 3 levels
aov.all.genes <- function(x, s1, s2, s3) {
  x1 <- as.numeric(x[s1])
  x2 <- as.numeric(x[s2])
  x3 <- as.numeric(x[s3])
  fac <- c(rep("A", length(x1)), rep("B", length(x2)), rep("C", length(x3)))
  a.dat <- data.frame(as.factor(fac), c(x1, x2, x3))
  names(a.dat) <- c("factor", "express")
  p.out <- summary(aov(express ~ factor, a.dat))[[1]][1, 5]
  return(p.out)
}
aov.run <- apply(datout.new, 1, aov.all.genes, s1 = Control, s2 = Sarcoidosis, s3 = TB)
Hey Ira,
It seems like your variables Control, Sarcoidosis, and TB are not formatted properly. Could you illustrate what these variables look like?
Regards,
Joachim
Hi Joachim, ur blog is amazing!
I’m hoping that you can solve this: My character factor W is in the data set of XYZ.
W has values such as "1", "2", "3", "101-200", "101-200", "101-200", NA, NA. I have tried: as.integer(as.factor(XYZ$W)) and also
as.integer(as.factor(W))
However, when I determine if I have changed the W character to integer, so I check it with both:
as.integer(W) –> TRUE
as.integer(XYZ$W) –> FALSE
But, when i type in: str(XYZ), it shows the W is still in character form.
Could you help me look over how to change the factor to integer?
I would really appreciate if you can help me with this. Thank you!
Hey Pinky,
First of all, thanks a lot for the very kind words! Glad you like my tutorials! 🙂
Regarding your question, please have a look at the code below. First we have to create some example data:
Next, we can convert these data to numeric using the as.numeric function:
Note that the previous code has replaced “101-200” by NA, since this character string cannot be represented as a numeric (or integer) value.
If you want to avoid this, you may insert an average value for these strings:
In the previous code, I have replaced “101-200” by 150.
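The steps described above can be sketched as follows. The data frame XYZ is a hypothetical reconstruction of the values in the question, and the result has to be assigned back to the column, otherwise str(XYZ) keeps showing a character column:

```r
XYZ <- data.frame(W = c("1", "2", "3", "101-200", "101-200", NA),
                  stringsAsFactors = FALSE)   # hypothetical example data

# Plain conversion: "101-200" becomes NA (warning silenced on purpose)
W_num <- suppressWarnings(as.numeric(XYZ$W))
W_num
# [1]  1  2  3 NA NA NA

# Insert a representative value for the range, then store the result
W_num[XYZ$W %in% "101-200"] <- 150
XYZ$W <- W_num
XYZ$W
# [1]   1   2   3 150 150  NA
```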
I hope that helps!
Joachim
Hi Joachim, first I’d like to thank you so much for the help and the amazing work you’re doing here. Could you please help me convert a column of my dataset containing time durations in “hms” format to numeric? Here’s a view of the column:
glimpse(Annual_Trips$ride_length)
'hms' num [1:5595063] 00:10:25 00:04:04 00:01:20 00:11:42 …
- attr(*, "units")= chr "secs"
I tried applying the gsub function and thought of replacing the ‘,’ with the ‘:’, but still got a column full of NAs.
Thanks in advance
Hey Pitch,
Thank you so much for the great feedback, glad you like my tutorials! 🙂
Also, thanks for the interesting question. It has inspired me to create a new tutorial on how to convert hours, minutes, and seconds to a numeric seconds object.
Please have a look here.
I hope that helps!
Joachim
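For durations stored as "hh:mm:ss" text, a base R sketch of the conversion to seconds could look like this. The helper name to_seconds is hypothetical, and the input is assumed to always have three colon-separated parts:

```r
ride_length <- c("00:10:25", "00:04:04", "00:01:20", "00:11:42")

# Hypothetical helper: split at ":" and weight hours, minutes, seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

to_seconds(ride_length)
# [1] 625 244  80 702
```

Since the glimpse() output in the question shows the column stored as a number of seconds, calling as.numeric() on it directly may already be enough.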
Hi Joachim thanks for this tutorial.
Followed the steps and it worked perfectly. Thank you very much
You are very welcome, glad it worked! 🙂
Hi Joachim,
I’m new here so I will be very happy if you can help me with my data.
I try to convert my variable “Year”, that is a character to numeric. This is my complete code:
FD = read_csv("FAO_Fishingdata_2021.csv")
FD2 <- FD %>%
  pivot_longer(cols = 4:72, names_to = "Year", values_to = "Catch") %>%
  mutate(Year = as.numeric("Year"))
But as a result I get a column “Year” with all NAs…
What am I doing wrong?
Thank you!
Hey Danijela,
Could you please share some example values that are contained in the Year column?
Regards,
Joachim
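One possible cause that is visible in the posted code is the quoted name inside as.numeric(): with quotes, R converts the literal text "Year" instead of the column values. The base R sketch below illustrates the difference with hypothetical data; inside mutate() the corresponding change would be as.numeric(Year) without quotes.

```r
# Quoted: the literal text "Year" is converted, which can only give NA
suppressWarnings(as.numeric("Year"))
# [1] NA

# Unquoted column reference: the column values are converted
FD2 <- data.frame(Year = c("1950", "1951"),
                  Catch = c(10, 12),
                  stringsAsFactors = FALSE)   # hypothetical example data
FD2$Year <- as.numeric(FD2$Year)
FD2$Year
# [1] 1950 1951
```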
Hi Joachim,
Thank you so much for your tutorials! Hope you can help me with my problem!
I have pH values from some samples, but a lot of NAs as well. I would like to tell R that these are numeric values, but when I use the as.numeric function I get the warning “NAs introduced by coercion”. I lose all the numbers, as they are converted to NAs.
[1] NA NA NA 6.51 6.18 NA 6.43 6.73 NA 6.56 6.02 NA NA 7.31 NA 6.56 6.17 NA 7.31 6.78
[21] NA 5.25 5.44 NA NA NA
d$kat_urinph <- as.numeric(d$kat_urinph)
Warning message:
NAs introduced by coercion
Hey Annemarie,
Thank you very much for the kind feedback, glad you like my tutorials!
Regarding your question, I can convert the values you have posted above to numeric without having any problems. See the example code below:
Are there maybe any other values in your data that are converted differently?
Regards,
Joachim
Hi Joachim,
thank you for your work and all of your tutorials, they always help a lot.
I am currently experiencing the same problem with the NA introduced by coercion in my dataset.
I already figured out that there must be a problem with the format of the negative values in my dataset, however, I haven’t found a solution yet to solve this issue.
This is a little example of the values:
EBIT_dax$`2020`
[1] "739000.00" "10603000.00" "-1055000.00" "5481000.00" "-15692000.00" "-772500.00"
And this is how I tried to solve the issue (which didn’t end up working):
EBIT_dax[ ,9] <-apply(EBIT_dax,2, function(x){as.numeric(as.character(gsub("-", "\U2212",x)))})
The new output then shows all negative values as NA.
Do you have a tip for me on how to solve this?
Thank you and regards
Erika
Hi Erika,
Thank you for the kind comment, glad you find the tutorials on Statistics Globe useful!
Regarding your question, you may convert all the columns in your data set to numeric as shown in the following example:
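A minimal sketch with a hypothetical one-column excerpt of the data is shown below. as.numeric() understands the ordinary ASCII "-", so if a Unicode minus sign (U+2212) is present in the data, it has to be mapped to "-", not the other way round:

```r
EBIT_dax <- data.frame(`2020` = c("739000.00", "10603000.00", "-1055000.00"),
                       check.names = FALSE,
                       stringsAsFactors = FALSE)   # hypothetical excerpt

# Normalise a possible Unicode minus to "-" and convert every column
EBIT_dax[] <- lapply(EBIT_dax, function(col) {
  as.numeric(gsub("\u2212", "-", col, fixed = TRUE))
})
EBIT_dax$`2020`
# [1]   739000 10603000 -1055000
```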
You can find more information on this method here.
Regards,
Joachim
Thank you very much for this tutorial! My script finally is running properly!
Hey Helena,
Thank you very much for the kind comment, glad it helped!
Regards,
Joachim
Hey Joachim, thank you very much for this tutorial; it helped me a lot to understand how these functions work.
I have a similar problem: I am trying to do some spatial analysis with a csv file that has been exported by ArcGIS and contains X and Y coordinates. However when I try this
```
pottery <- read.csv(file = "pots.csv", header = TRUE)
coordinates(pottery) <- ~XCoord+YCoord
```
I get this error: Error in .local(obj, …) : cannot derive coordinates from non-numeric matrix
When I tried `as.numeric()`, I also got “NAs introduced by coercion”.
My question is: Will R be able to distinguish the variables and columns if I remove the commas (,) from my coordinate columns?
I am also including the dataframe:
```
str(pottery)
'data.frame': 601 obs. of 14 variables:
$ XCoord : chr "1277.10140000" "1281.93990000" "1309.94460000" "1301.58720000" …
$ YCoord : chr "-915.96560000" "-930.18790000" "-939.57170000" "-931.36080000" …
$ pottery.POINT_X: chr "1277.101400" "1281.939900" "1309.944600" "1301.587200" …
$ pottery.POINT_Y: chr "-915.965600" "-930.187900" "-939.571700" "-931.360800" …
$ Object : chr "P73" "P474" "P587" "P629" …
$ Shape : chr "Amphora" "Jug" "Pithoid" "Pithos" …
$ Use : chr "Storing" "Pouring" "Storing" "Storing" …
$ Height : chr "86.8" "15" "NULL" "68.5" …
$ Fabric : chr "M" "S" "M" "S" …
$ Decoration : chr "Yes" "Yes" "Yes" "Yes" …
$ Pattern : chr "Running Drops" "NULL" "Bands" "Running Drops" …
$ Style : chr "Dark on light" "Painted" "Dark on light" "Dark on light" …
$ Floor : chr "No" "No" "No" "No" …
$ Fill : chr "No" "No" "No" "No" …
```
Hi Herme,
Thank you for the kind comment, glad you find the tutorial helpful!
I do not have experience with your functions. However, when I try to convert your data to numeric, I do not get this warning:
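A sketch with a hypothetical three-row excerpt of the posted data is shown below. The coordinate strings shown in the question convert cleanly, whereas text such as "NULL" (visible in the Height column) has to be marked as missing first:

```r
pottery <- data.frame(XCoord = c("1277.10140000", "1281.93990000", "1309.94460000"),
                      YCoord = c("-915.96560000", "-930.18790000", "-939.57170000"),
                      Height = c("86.8", "15", "NULL"),
                      stringsAsFactors = FALSE)   # hypothetical excerpt

# The coordinates are valid numbers and convert without a warning
pottery$XCoord <- as.numeric(pottery$XCoord)
pottery$YCoord <- as.numeric(pottery$YCoord)

# "NULL" is text, not a number: turn it into NA before converting
pottery$Height[pottery$Height %in% "NULL"] <- NA
pottery$Height <- as.numeric(pottery$Height)
pottery$Height
# [1] 86.8 15.0   NA
```

If "NULL" also occurs in other rows of the coordinate columns, that could explain the warning. Reading the file with read.csv(..., na.strings = c("NA", "NULL")) would treat such entries as missing at import time.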
Maybe it makes sense to identify the non-numeric values in your data first? Have a look here.
Regards,
Joachim
   ID age  fl     lake     era
1   1  14 459 Harrison 1977-80
2   2  12 449 Harrison 1977-80
3   3  10 471 Harrison 1977-80
59 59   7 245 Harrison 1997-01
60 60   7 279 Harrison 1997-01
61 61   5 245 Harrison 1997-01
Que : Convert the tmp era values to numeric values.
tmp$era <- as.numeric(tmp$era)
tmp$era
This shows error as it gives NA as the result.
I've tried this :
tmp$era <- tmp
tmp$era[tmp$era == "1977-80"] <- "1978"
tmp$era <- as.numeric(tmp$era1)
tmp$era1
But this gives error :
Error in `$<-.data.frame`(`*tmp*`, era, value = numeric(0)) : replacement has 0 rows, data has 6
Can you please help with this !!
Hi Harsh,
Your column era is not a number, so it cannot be converted to numeric without any preprocessing. Could you please tell me what the desired output for this column should look like?
Regards,
Joachim
I want to convert the test results from “-” and “+” to 0 and 1. There are also empty cells; how can I handle those?
Hey Chimee,
Are the empty cells formatted as blank characters (i.e. “”)? In this case, you may convert them to NA as explained here.
Regards,
Joachim
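For the recoding asked about in this thread, a minimal sketch could look as follows. The vector test_result is hypothetical, and empty cells are assumed to be stored as "":

```r
test_result <- c("+", "-", "", "+", NA)   # hypothetical example values

# "+" becomes 1, "-" becomes 0, everything else (including "") becomes NA
test_num <- ifelse(test_result %in% "+", 1,
                   ifelse(test_result %in% "-", 0, NA))
test_num
# [1]  1  0 NA  1 NA
```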
Hello Joachim,
thank you for your work and all of your tutorials, they always help a lot.
I have to convert the column “EU27” (in character) from Dataframe (“Data”) into numeric.
Somehow it won’t work because of NAs: Warning: NAs introduced by coercion.
Regards,
Jordan
Hello Jordan,
This warning occurs when character inputs are not formatted as valid numbers. The tutorial shows only one possible kind of misformatting, but many others have already been discussed in the comments section. You may want to look at them if you haven’t done so yet. If you still struggle, maybe you can provide a screenshot of your data for me to check.
Regards,
Cansu
Hi Joachim,
Thanks for the explanation and everyone for the helpful discussion!
I am getting the “NAs introduced by coercion” warning as well in my pipeline. I noticed that you always get it, even if you explicitly assign NA values (just like in the first example you give).
One would conclude that R does not encourage you to (willingly) create NA in data (even when there is actually missing data).
Is there another way to do this that IS encouraged (i.e. that is considered good coding style and doesn’t give any warnings)?
I don’t think it’s a good idea to bloat up your code with suppressWarnings() left and right X-D
Hello Stefan,
To be honest, I don’t know about the alternatives. But I found this StackOverflow thread which might be useful to check out.
Regards,
Cansu
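One approach that avoids the warning without suppressWarnings() is to declare the missing values explicitly before converting. The warning is only raised when a non-missing string cannot be parsed; values that are already NA are converted silently. The sketch below assumes hypothetical raw input in which "" and "n/a" mark missing data:

```r
x <- c("12", "n/a", "7", "", "3.5")   # hypothetical raw input

# Declare which strings mean "missing" before converting
x[x %in% c("", "n/a")] <- NA

# NA values convert to NA silently, so no warning is raised here
x_num <- as.numeric(x)
x_num
# [1] 12.0   NA  7.0   NA  3.5
```

Any warning that still appears after this step then points to a value that was not anticipated, which is usually worth inspecting.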