R Warning Message: NAs Introduced by Coercion (Example)

 

This article explains how to debug the warning message “NAs introduced by coercion” in the R programming language.

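The post first reproduces the warning. It then shows how to avoid it by cleaning the input data with gsub(), and finally how to suppress it with suppressWarnings().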

Let’s dive into it…

 

Creation of Example Data

First, I’ll create some example data.

vec <- c("50", "200", "1,000", "10", "1200", "2,100")  # Create example vector
vec                                                    # Print example vector
# [1] "50"    "200"   "1,000" "10"    "1200"  "2,100"

Have a look at the previous RStudio console output. It shows that our example data is a character vector with six elements. Two of them, “1,000” and “2,100”, use a comma as a thousands separator.

 

Example 1: Reproduce the Warning Message: NAs Introduced by Coercion

In this example, I’ll show how to replicate the warning message “NAs introduced by coercion” when using the as.numeric function in R. Let’s apply the as.numeric function to our example vector:

as.numeric(vec)                                        # Applying as.numeric function
# [1]   50  200   NA   10 1200   NA
# Warning message:
# NAs introduced by coercion

As you can see, the warning message “NAs introduced by coercion” is returned, and some output values are NA (i.e. missing or not available).

The reason is that two of the character strings, “1,000” and “2,100”, contain commas. The as.numeric function does not accept a comma as a thousands separator, so these strings cannot be converted to the numeric class and are returned as NA.
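To see exactly which elements trigger the warning, you can convert quietly and keep the elements that became NA even though they were not NA in the input. This is a minimal sketch using the example vector from above. The names vec_num and bad are only illustrative:

```r
vec <- c("50", "200", "1,000", "10", "1200", "2,100")

# Convert without printing the warning, then keep the elements that turned into NA
# although they were not missing in the input
vec_num <- suppressWarnings(as.numeric(vec))
bad <- vec[is.na(vec_num) & !is.na(vec)]
bad
# [1] "1,000" "2,100"
```

The `!is.na(vec)` part matters for real data, because values that were already missing would otherwise show up in the list as well.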

The next example shows how to solve this problem in R.

 

Example 2: Modify Data to Avoid Warning Message Using gsub() Function

In Example 2, I’ll illustrate how to handle the as.numeric() warning message “NAs introduced by coercion”.

As explained before, some of our input values are not formatted properly, because they contain commas as thousands separators. We can remove these commas by using the gsub function:

vec_new <- gsub(",", "", vec)                          # Applying gsub function
vec_new                                                # Print updated example vector
# [1] "50"   "200"  "1000" "10"   "1200" "2100"

Have a look at the previous output of the RStudio console. It shows that our updated vector does not contain commas anymore.

Now, let’s apply the as.numeric function again:

as.numeric(vec_new)                                    # Applying as.numeric function
# [1]   50  200 1000   10 1200 2100

As you can see, we not only avoided the warning message but also created an output vector without any NA values.

 

Example 3: Suppress Warning Message Using suppressWarnings() Function

Sometimes you may be fine with values that are not numbers being converted to NA. In this case, you can simply hide the warning message “NAs introduced by coercion” by wrapping the suppressWarnings function around the as.numeric function:

suppressWarnings(as.numeric(vec))                      # Applying suppressWarnings function
# [1]   50  200   NA   10 1200   NA

The output is the same as in Example 1, but this time the warning message is not printed to the RStudio console. Keep in mind that suppressWarnings() hides every warning raised inside the call, not only this one.
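If you want to hide only this particular warning and keep any other warnings visible, one option is withCallingHandlers() with a handler that muffles only matching warnings. This is a minimal sketch. The helper name as_numeric_quiet is hypothetical, and matching on the message text assumes an English R session, because R can translate warning messages:

```r
# Hypothetical helper: convert to numeric, muffling only the coercion warning
as_numeric_quiet <- function(x) {
  withCallingHandlers(
    as.numeric(x),
    warning = function(w) {
      # Message text depends on the locale; this check assumes English messages
      if (grepl("NAs introduced by coercion", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
      # Any other warning is not muffled and is passed on as usual
    }
  )
}

vec <- c("50", "200", "1,000", "10", "1200", "2,100")
as_numeric_quiet(vec)
# [1]   50  200   NA   10 1200   NA
```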

 

Video, Further Resources & Summary

Do you want to know more about warnings and errors in R? Then I recommend watching the following video on my YouTube channel, in which I explain the R code of this tutorial in a live programming session.

 

 

In addition, you might have a look at the related articles on my homepage, which include several tutorials about warning and error messages.

 

Summary: In this post, I explained how to get rid of the warning “NAs introduced by coercion” when converting a character or factor variable to numeric in the R programming language.

In case you have further questions, don’t hesitate to let me know in the comments section. Furthermore, please subscribe to my email newsletter in order to get updates on new tutorials.

 



40 Comments

  • Hi Joachim, thanks for this tutorial and your help in advance!
    I am having the error NAs introduced by coercion. In my case however I am trying to reformat the string characters (e.g. green, blue, red) from two specific columns so that the characters (e.g. green) is represented by a numeric number (e.g. 1) I’ve been trying a long while now so that I can use the data as part of a neural network but cannot get past this error. Can you help me with this?

    • Hi Ali,

      Thank you for the very kind words and your interesting question. A simple solution might be the following:

      x <- c("green", "red", "green", "blue", "blue", "yellow")
      x_num <- as.numeric(as.factor(x))
      x_num
      # [1] 2 3 2 1 1 4

      Note that the numeric codes are based on the alphabetical order of the values (blue = 1, green = 2, red = 3, yellow = 4).
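If you would rather choose the coding yourself instead of relying on alphabetical order, you can pass the levels explicitly. This is a sketch, and the level order below is only an example:

```r
x <- c("green", "red", "green", "blue", "blue", "yellow")

# The position of each value in 'levels' becomes its numeric code
x_num <- as.numeric(factor(x, levels = c("red", "green", "blue", "yellow")))
x_num
# [1] 2 1 2 3 3 4
```

Values that do not appear in `levels` become NA silently, so it is worth checking `anyNA(x_num)` afterwards.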

      I hope that helps!

      Joachim

  • Hi Joachim,
    Your blog has been a godsend, and I’m hoping you can solve this: my character vector (“1”, “2”, “3”) has zero-width non-printing spaces (‘\u200b’) in it (I scraped some Covid data online with rvest). I managed to remove most of them with str_remove() and then convert the characters into numbers. But it still spits out the message “NAs introduced by coercion”, and I don’t know what to look for! I would appreciate any guidance. Thank you!

    • Hey Angi,

      Thanks a lot for this amazing feedback! 🙂

      You could use the following code to identify all data cells that are converted to NA:

      x <- c("1", "2", "a", "3", "b")
      x_test <- as.numeric(x)
      x[is.na(x_test)]
      # [1] "a" "b"
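Regarding the zero-width spaces from the question: they are invisible when printed, but nchar() still counts them. Removing them literally with gsub(..., fixed = TRUE) lets as.numeric() parse the values. This is a sketch with made-up values:

```r
# Hypothetical scraped values; "\u200b" is an invisible zero-width space
x <- c("1", "2\u200b", "\u200b3")

# The invisible character still counts as a character
nchar(x)
# [1] 1 2 2

# Remove the zero-width space literally, then convert
x_clean <- as.numeric(gsub("\u200b", "", x, fixed = TRUE))
x_clean
# [1] 1 2 3
```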

      I hope that helps!

      Joachim

      • Thanks Joachim. The problem was 1) figuring out the “invisible” mystery character that was creating the issue. Since I tend to glimpse() at my data rather than head() it, I couldn’t see anything wrong until I used head(). 2) The culprit was a zero-width non-printing space that was seemingly immune to str_remove() in its original form “”. It took 48 hours, but it was a good learning experience.

        • Glad you found a solution Angi, and thanks for sharing it here! I’m sure others will have similar problems and will benefit from your explanation. 🙂

          Regards

          Joachim

  • Hi Joachim, thank you, but the function gsub(",", "", vec) removes all the commas. What if we had values with a decimal comma, for example when our vector looks like vec <- c("50,1", "200,3", "1,000,5", "10,3", "1200", "2,100")?

    • Hey Ugi,

      In this case, you would have to replace the comma with a point. Have a look at the example code below:

      vec <- c("50,1", "200,3", "1000,5", "10,3", "1200", "2,100")
      vec_new <- as.numeric(gsub(",", ".", vec, fixed = TRUE))
      vec_new
      # [1]   50.1  200.3 1000.5   10.3 1200.0    2.1

      Please note that your example vector contained the value “1,000,5” (i.e. two commas), which does not make sense if the comma is used as a decimal mark. Also note that “2,100” is now interpreted as 2.1.
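If the data use a point as the thousands separator and a comma as the decimal mark (e.g. “1.000,5”), the points have to be removed before the comma is replaced. This is a sketch under that assumption with made-up values. If you prefer a package function, readr offers a similar option via parse_number() with locale(decimal_mark = ",", grouping_mark = ".").

```r
# Hypothetical values: "." as thousands separator, "," as decimal mark
vec_eu <- c("50,1", "1.000,5", "2.100")

# Step 1: drop the thousands separators (fixed = TRUE, since "." is a regex wildcard)
no_thousands <- gsub(".", "", vec_eu, fixed = TRUE)

# Step 2: turn the decimal comma into a point, then convert
vec_num <- as.numeric(gsub(",", ".", no_thousands, fixed = TRUE))
vec_num
# [1]   50.1 1000.5 2100.0
```

Without `fixed = TRUE` in step 1, the pattern "." would match every character and the whole string would be deleted.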

      I hope that helps!

      Joachim

  • Hi Joachim, thank you very much, this does work, but I have another problem. I am trying to change a data frame with several columns where the elements are saved as character elements. It looks like this:
         BAYER   DAIMLER DBANK   SIEMENS  VONOVIA
    [1,] "65,24" "47"    "6,809" "89,073" "46,34"
    [2,] "65,79" "46,91" "6,839" "89,29"  "46,6"
    [3,] "66,59" "47,92" "7,079" "90,056" "47,69"

    If I use the function as.numeric(gsub(",", ".", vec)), it does change the character elements to numeric, but I lose the structure of the data frame and get a vector with only one column, which looks like this:
    [1] 65.24 65.79 66.59 66.22 65.78 64.99

    Do you know how I can change the character elements to numeric without losing the structure of the data frame?

    • Hey Ugi,

      Please have a look at the following example code. I assume this works for your data as well:

      data <- data.frame(BAYER = c("65,24", "65,79", "66,59"),
                         DAIMLER = c("47", "46,91", "47,92"))
      data # Example data
      #   BAYER DAIMLER
      # 1 65,24      47
      # 2 65,79   46,91
      # 3 66,59   47,92
       
      data_num <- as.data.frame(apply(data, 2, function(x) as.numeric(gsub(",", ".", x))))
      data_num # Modified data
      #   BAYER DAIMLER
      # 1 65.24   47.00
      # 2 65.79   46.91
      # 3 66.59   47.92
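Note that apply() first converts the data frame to a matrix, so any non-character columns would be turned into character as well. An alternative sketch works column by column with lapply() and assigns the result into data[], which keeps the data frame, its column names and its row names:

```r
data <- data.frame(BAYER = c("65,24", "65,79", "66,59"),
                   DAIMLER = c("47", "46,91", "47,92"),
                   stringsAsFactors = FALSE)

# data_num2[] <- ... replaces the column contents but keeps the data frame structure
data_num2 <- data
data_num2[] <- lapply(data_num2, function(x) as.numeric(gsub(",", ".", x, fixed = TRUE)))
str(data_num2)
```

To convert only some columns, the same idea works with a subset, e.g. `data[cols] <- lapply(data[cols], ...)` for a vector of column names `cols`.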

      Regards,
      Joachim

  • My code for ANOVA also showed this message, for each row of the dataframe:
    There were 22 warnings (use warnings() to see them)
    > warnings()
    Warning messages:
    1: In FUN(newX[, i], …) : NAs introduced by coercion
    2: In FUN(newX[, i], …) : NAs introduced by coercion
    (warnings 3 to 21 are identical)
    22: In FUN(newX[, i], …) : NAs introduced by coercion

    The code is:

    #assigning X vector #obtain classifications for samples

    Control <-datout.new$clas == "Control"
    TB <-datout.new$clas == "TB"
    Sarcoidosis <-datout.new$clas == "Sarcoidosis"

    #1-factor ANOVA with 3 levels
    aov.all.genes <- function(x, s1, s2, s3) {
      x1 <- as.numeric(x[s1])
      x2 <- as.numeric(x[s2])
      x3 <- as.numeric(x[s3])
      fac <- c(rep("A", length(x1)), rep("B", length(x2)), rep("C", length(x3)))
      a.dat <- data.frame(as.factor(fac), c(x1, x2, x3))
      names(a.dat) <- c("factor", "express")
      p.out <- summary(aov(express ~ factor, a.dat))[[1]][1, 5]
      return(p.out)
    }

    aov.run <- apply(datout.new,1, aov.all.genes,s1=Control,s2=Sarcoidosis,s3=TB)

    • Hey Ira,

      It seems like your variables Control, Sarcoidosis, and TB are not formatted properly. Could you show what these variables look like?
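Here is one possible cause, which is an assumption because datout.new itself is not shown. apply() converts a data frame to a matrix first, and a single character column such as clas makes the whole matrix character. Calling as.numeric() on such a row then produces NA for the label and raises the warning once per row. The sketch below illustrates this with a small made-up data frame:

```r
# Hypothetical data: two numeric expression columns plus a character class column
dat <- data.frame(gene1 = c(1.5, 2.3), gene2 = c(0.7, 1.1),
                  clas = c("Control", "TB"), stringsAsFactors = FALSE)

# apply() uses as.matrix(); one character column turns the whole matrix into character
m <- as.matrix(dat)
is.character(m)
# [1] TRUE

# as.numeric() on a full row turns the class label into NA
row_values <- suppressWarnings(as.numeric(m[1, ]))
row_values
# [1] 1.5 0.7  NA

# Restricting apply() to the numeric columns avoids the coercion
num_cols <- sapply(dat, is.numeric)
row_means <- apply(dat[, num_cols], 1, mean)
```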

      Regards,
      Joachim

  • Hi Joachim, your blog is amazing!
    I’m hoping that you can solve this: my character variable W is in the data set XYZ.
    W has values such as “1”, “2”, “3”, “101-200”, “101-200”, “101-200”, NA, NA. I have tried as.integer(as.factor(XYZ$W)) and also
    as.integer(as.factor(W))

    However, when I check whether I have changed the W character to integer, I check it with both:
    as.integer(W) –> TRUE
    as.integer(XYZ$W) –> FALSE

    But when I type str(XYZ), it shows that W is still in character form.

    Could you help me figure out how to change the factor to integer?
    I would really appreciate it if you could help me with this. Thank you!

    • Hey Pinky,

      First of all, thanks a lot for the very kind words! Glad you like my tutorials! 🙂

      Regarding your question, please have a look at the code below. First we have to create some example data:

      x <- c("1", "2", "3", "101-200", "101-200", "101-200", NA, NA)
      x
      # [1] "1"       "2"       "3"       "101-200" "101-200" "101-200" NA        NA

      Next, we can convert these data to numeric using the as.numeric function:

      x_num1 <- as.numeric(x)
      x_num1
      # [1]  1  2  3 NA NA NA NA NA

      Note that the previous code has replaced “101-200” by NA, since this character string cannot be represented as a numeric (or integer) value.

      If you want to avoid this, you may insert an average value for these strings:

      x_num2 <- x
      x_num2[x_num2 == "101-200"] <- "150"
      x_num2 <- as.numeric(x_num2)
      x_num2
      # [1]   1   2   3 150 150 150  NA  NA

      In the previous code, I have replaced “101-200” by 150.
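Regarding the str(XYZ) output in the question: as.integer() and as.numeric() return a new vector and do not modify XYZ, so the converted values have to be assigned back to the column. This is a sketch with a small made-up stand-in for XYZ:

```r
# Hypothetical data frame standing in for XYZ
XYZ <- data.frame(W = c("1", "2", "3", "101-200", NA), stringsAsFactors = FALSE)

# which() skips the NA comparisons, so only the "101-200" entries are replaced
XYZ$W[which(XYZ$W == "101-200")] <- "150"

# Assign the converted vector back to the column; otherwise XYZ stays unchanged
XYZ$W <- as.numeric(XYZ$W)
str(XYZ)  # W is now listed as num
```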

      I hope that helps!

      Joachim

  • Hi Joachim, first I’d like to thank you so much for the help and amazing work you’re doing here. Could you please help me convert a column from my data set containing time durations in the format “hms” to numeric? Here’s a view of the column:

    glimpse(Annual_Trips$ride_length)
    'hms' num [1:5595063] 00:10:25 00:04:04 00:01:20 00:11:42 …
     - attr(*, "units")= chr "secs"

    I tried applying the gsub function and thought of replacing the ‘,’ with the ‘:’, but I still got a column full of NAs.
    Thanks in advance
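Judging from the glimpse() output, ride_length is an hms object, which is stored as a number of seconds (units = "secs"). So as.numeric(Annual_Trips$ride_length) should already return seconds without any gsub() step. If the durations are plain "HH:MM:SS" strings instead, a base-R sketch could look like this. The helper name hms_to_seconds is hypothetical:

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  # Weight hours, minutes and seconds and add them up for each string
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

hms_to_seconds(c("00:10:25", "00:04:04", "00:01:20", "00:11:42"))
# [1] 625 244  80 702
```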

  • Hi Joachim,
    I’m new here so I will be very happy if you can help me with my data.

    I am trying to convert my variable “Year”, which is a character, to numeric. This is my complete code:

    FD = read_csv("FAO_Fishingdata_2021.csv")

    FD2 <- FD %>%
    pivot_longer(cols = 4:72, names_to = "Year", values_to = "Catch") %>%
    mutate(Year = as.numeric("Year"))

    But as a result I get a column “Year” with all NAs…
    What am I doing wrong?

    Thank you!
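The likely issue is the quotes in as.numeric("Year"). Inside mutate(), "Year" is the literal text Year rather than the column, and that text cannot be converted to a number, so every row gets NA. Using the column name without quotes, i.e. mutate(Year = as.numeric(Year)), should work, assuming the former column names are plain years such as "1950". If they carry a prefix, that prefix would need to be removed first. Here is a base-R sketch of the same idea with made-up data:

```r
# A quoted "Year" is just the text "Year", which cannot be turned into a number
suppressWarnings(as.numeric("Year"))
# [1] NA

# Hypothetical long-format data, similar to what pivot_longer() returns
# (the former column names end up as text in the Year column)
long <- data.frame(Country = c("A", "A"), Year = c("1950", "1951"),
                   Catch = c(10, 12), stringsAsFactors = FALSE)

# Refer to the column itself, without quotes, to convert its values
long$Year <- as.numeric(long$Year)
```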

  • Hi Joachim,
    Thank you so much for your tutorials! Hope you can help me with my problem!
    I have pH values from some samples, but a lot of NAs as well. I would like to tell R that these are numeric values, but when I use the as.numeric function I get the warning “NAs introduced by coercion”. I lose all the numbers, as it converts them to NAs.

    [1] NA NA NA 6.51 6.18 NA 6.43 6.73 NA 6.56 6.02 NA NA 7.31 NA 6.56 6.17 NA 7.31 6.78
    [21] NA 5.25 5.44 NA NA NA

    d$kat_urinph <- as.numeric(d$kat_urinph)
    Warning message:
    NAs introduced by coercion

    • Hey Annemarie,

      Thank you very much for the kind feedback, glad you like my tutorials!

      Regarding your question, I can convert the values you have posted above to numeric without having any problems. See the example code below:

      x <- c(NA, NA, NA, "6.51", "6.18", NA, "6.43", "6.73", NA, "6.56", "6.02", NA, NA, "7.31", NA, "6.56", "6.17", NA, "7.31", "6.78", NA, "5.25", "5.44", NA, NA, NA)
      x
      #  [1] NA     NA     NA     "6.51" "6.18" NA     "6.43" "6.73" NA     "6.56" "6.02" NA     NA     "7.31" NA     "6.56" "6.17" NA     "7.31" "6.78" NA     "5.25" "5.44" NA     NA     NA    
       
      x_num <- as.numeric(x)
      x_num
      #  [1]   NA   NA   NA 6.51 6.18   NA 6.43 6.73   NA 6.56 6.02   NA   NA 7.31   NA 6.56 6.17   NA 7.31 6.78   NA 5.25 5.44   NA   NA   NA

      Are there maybe any other values in your data that are converted differently?

      Regards,
      Joachim

  • Hi Joachim,

    Thank you for your work and all of your tutorials, they always help a lot.
    I am currently experiencing the same problem with “NAs introduced by coercion” in my dataset.
    I already figured out that there must be a problem with the format of the negative values in my dataset; however, I haven’t found a way to solve this issue yet.
    This is a little example of the values:
    EBIT_dax$`2020`
    [1] "739000.00" "10603000.00" "-1055000.00" "5481000.00" "-15692000.00" "-772500.00"

    And this is how I tried to solve the issue (which didn’t end up working):
    EBIT_dax[ ,9] <-apply(EBIT_dax,2, function(x){as.numeric(as.character(gsub("-", "\U2212",x)))})
    The new output then shows all negative values as NA.

    Do you have a tip for me on how to solve this?

    Thank you and regards
    Erika

    • Hi Erika,

      Thank you for the kind comment, glad you find the tutorials on Statistics Globe useful!

      Regarding your question, you may convert all the columns in your data set to numeric as shown in the following example:

      EBIT_dax <- data.frame(x1 = c("739000.00", "10603000.00", "-1055000.00", "5481000.00", "-15692000.00", "-772500.00"),
                             x2 = c("739000.00", "10603000.00", "-1055000.00", "5481000.00", "-15692000.00", "-772500.00"))
       
      EBIT_dax_num <- apply(EBIT_dax, 2,
                            function(x) as.numeric(as.character(x)))
      EBIT_dax_num
      #             x1        x2
      # [1,]    739000    739000
      # [2,]  10603000  10603000
      # [3,]  -1055000  -1055000
      # [4,]   5481000   5481000
      # [5,] -15692000 -15692000
      # [6,]   -772500   -772500

      You can find more information on this method here.
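Regarding the gsub() call in the question: it replaces the ASCII hyphen-minus "-" with the Unicode minus sign U+2212, which as.numeric() does not recognize, so the negative values become NA. If the raw data actually contains U+2212 (for example after copying numbers from a web page or PDF), the replacement has to go the other way. This is a sketch with made-up values:

```r
# Hypothetical values: the second one uses the Unicode minus sign (U+2212)
x <- c("739000.00", "\u22121055000.00")

# Replace the Unicode minus with the ASCII "-" that as.numeric() understands
x_num <- as.numeric(gsub("\u2212", "-", x, fixed = TRUE))
x_num
# [1]   739000 -1055000
```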

      Regards,
      Joachim

  • Thank you very much for this tutorial! My script finally is running properly!

  • Hey Joachim, thank you very much for this tutorial; it helped me a lot to understand how these functions work.
    I have a similar problem: I am trying to do some spatial analysis with a CSV file that has been exported by ArcGIS and contains X and Y coordinates. However, when I try this

    ```
    pottery <- read.csv(file="pots.csv", header=TRUE)
    coordinates(pottery) <- ~XCoord+YCoord
    ```

    I get this error: Error in .local(obj, …) : cannot derive coordinates from non-numeric matrix

    When I tried `as.numeric()`, I also got “NAs introduced by coercion”.

    My question is: Will R be able to distinguish the variables and columns if I remove the commas (,) from my coordinate columns?

    I am also including the data frame:

    ```
    str(pottery)
    'data.frame': 601 obs. of 14 variables:
    $ XCoord : chr "1277.10140000" "1281.93990000" "1309.94460000" "1301.58720000" …
    $ YCoord : chr "-915.96560000" "-930.18790000" "-939.57170000" "-931.36080000" …
    $ pottery.POINT_X: chr "1277.101400" "1281.939900" "1309.944600" "1301.587200" …
    $ pottery.POINT_Y: chr "-915.965600" "-930.187900" "-939.571700" "-931.360800" …
    $ Object : chr "P73" "P474" "P587" "P629" …
    $ Shape : chr "Amphora" "Jug" "Pithoid" "Pithos" …
    $ Use : chr "Storing" "Pouring" "Storing" "Storing" …
    $ Height : chr "86.8" "15" "NULL" "68.5" …
    $ Fabric : chr "M" "S" "M" "S" …
    $ Decoration : chr "Yes" "Yes" "Yes" "Yes" …
    $ Pattern : chr "Running Drops" "NULL" "Bands" "Running Drops" …
    $ Style : chr "Dark on light" "Painted" "Dark on light" "Dark on light" …
    $ Floor : chr "No" "No" "No" "No" …
    $ Fill : chr "No" "No" "No" "No" …
    ```

    • Hi Herme,

      Thank you for the kind comment, glad you find the tutorial helpful!

      I do not have experience with your functions. However, when I try to convert your data to numeric, I do not get this warning:

      x <- c("1277.10140000", "1281.93990000", "1309.94460000")
      as.numeric(x)

      Maybe it makes sense to identify the non-numeric values in your data first? Have a look here.
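The str() output shows that some columns (e.g. Height and Pattern) use the text "NULL" for missing values. A placeholder like that anywhere in a column keeps read.csv() from reading the column as numeric. Assuming the coordinate columns contain a similar placeholder (this is not visible in the excerpt), you could declare it as missing while reading the file. In the sketch below, the inline text stands in for pots.csv:

```r
# Inline CSV text standing in for pots.csv (values partly taken from the str() output)
csv_text <- "XCoord,YCoord,Height
1277.1014,-915.9656,86.8
1281.9399,-930.1879,15
1309.9446,-939.5717,NULL"

# Treat the text "NULL" as missing, so the columns can be read as numbers
pottery <- read.csv(text = csv_text, na.strings = c("NA", "NULL"))
str(pottery)
```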

      Regards,
      Joachim

  • ID age fl lake era
    1 1 14 459 Harrison 1977-80
    2 2 12 449 Harrison 1977-80
    3 3 10 471 Harrison 1977-80
    59 59 7 245 Harrison 1997-01
    60 60 7 279 Harrison 1997-01
    61 61 5 245 Harrison 1997-01

    Question: Convert the tmp era values to numeric values.

    tmp$era <- as.numeric(tmp$era)
    tmp$era

    This doesn't work, as it gives NA as the result.

    I've tried this:
    tmp$era <- tmp
    tmp$era[tmp$era == "1977-80"] <- "1978"
    tmp$era <- as.numeric(tmp$era1)
    tmp$era1

    But this gives the error:
    Error in `$<-.data.frame`(`*tmp*`, era, value = numeric(0)) : replacement has 0 rows, data has 6

    Can you please help with this?

    • Hi Harsh,

      Your column era is not a number, so it cannot be converted to numeric without any preprocessing. Could you please tell me what the desired output for this column should look like?
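Two details in the question may help. First, tmp$era1 does not exist, so as.numeric(tmp$era1) returns numeric(0), which explains the “replacement has 0 rows” error. Second, which number a period such as “1977-80” should become depends on the analysis. This is a sketch of two possible choices, both of which are assumptions about the desired output:

```r
# Hypothetical copy of the era column
era <- c("1977-80", "1977-80", "1997-01")

# Option 1: keep only the first year of each period
era_start <- as.numeric(substr(era, 1, 4))
era_start
# [1] 1977 1977 1997

# Option 2: number the periods in sorted order (1 = first period, 2 = second, ...)
era_code <- as.numeric(factor(era))
era_code
# [1] 1 1 2
```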

      Regards,
      Joachim

  • I want to convert the test results from “-” and “+” to 0 and 1. There are also empty cells. How can I solve this?
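Here is one possible approach, sketched with made-up values. match() looks up each result in c("-", "+"), so subtracting 1 gives 0 for “-” and 1 for “+”. Empty cells and other unexpected entries get no match and become NA without any coercion warning. Entries with stray spaces (e.g. " +") would also become NA, so trimws() may be needed first.

```r
# Hypothetical test results, including an empty cell and a missing value
result <- c("+", "-", "", "+", NA)

# Position in c("-", "+") minus 1: "-" -> 0, "+" -> 1, anything else -> NA
result_num <- match(result, c("-", "+")) - 1
result_num
# [1]  1  0 NA  1 NA
```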

  • Jordan Reitemeyer
    January 11, 2023 2:45 pm

    Hello Joachim,
    thank you for your work and all of your tutorials, they always help a lot.
    I have to convert the column “EU27” (in character) from Dataframe (“Data”) into numeric.
    Somehow it won’t work because of NAs: Warning: NAs introduced by coercion.
    Regards,
    Jordan

    • Hello Jordan,

      This warning occurs because some of the character inputs are not formatted as numbers. The tutorial shows only one possible formatting issue, but many others have already been discussed in the comment section. You may want to look at them if you haven’t done so yet. If you still struggle, maybe you can provide a screenshot of your data for me to check.

      Regards,
      Cansu

  • Hi Joachim,
    Thanks for the explanation and everyone for the helpful discussion!
    I am getting the “NAs introduced by coercion” warning as well in my pipeline. I noticed that you always get it, even if you explicitly assign NA values (just like in the first example you give).
    One would conclude that R does not encourage you to (willingly) create NA in data (even when there is actually missing data).
    Is there another way to do this that IS encouraged (i.e. that is considered good coding style and doesn’t give any warnings)?
    I don’t think it’s a good idea to bloat up your code with suppressWarnings() left and right X-D
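as.numeric() does not warn about values that are already NA. The warning is raised for strings it cannot parse. One way to avoid both the warning and blanket suppressWarnings() calls is to turn the known missing-value codes into NA before converting. The codes below are assumptions, so adjust them to your data. When reading files, read.csv(na.strings = ...) does the same job at import time.

```r
x <- c("6.51", "NULL", "", NA, "7.31")

# Assumed missing-value codes in this data
missing_codes <- c("NULL", "")

# Turn the known codes into proper NA first; as.numeric() then converts without a warning
x[x %in% missing_codes] <- NA
x_num <- as.numeric(x)
x_num
# [1] 6.51   NA   NA   NA 7.31
```

With this approach, any unexpected string that is still left in the data will trigger the warning, which is usually what you want to know about.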

  • George Hirons-Alecrim
    November 15, 2023 4:50 pm

    Hi

    I'm trying to overcome this character issue, as I need the data to be numeric.

    I have a gene count table and am trying to convert it to numeric; however, I'm faced with

    > count_matrix_numeric <- as.numeric(as.character(count_matrix))
    Warning message:
    NAs introduced by coercion

    I'm not sure how to apply this website's code, because I have so many rows and samples on the left column.

    Any help would be appreciated.

    Kind regards

    George

    • Hello George,

      It seems like you’re trying to convert a gene count matrix to a numeric format in R, but you’re encountering issues with NAs being introduced due to coercion. This typically happens when the data being converted contains non-numeric values which cannot be converted to numbers.

      Here are some suggestions to handle this issue:

      Inspect Your Data: Before conversion, inspect your data to understand why NAs are being introduced. Check for non-numeric values or missing data in your count matrix. You can use functions like str() or summary() to get an overview of your data.

      Data Cleaning: If there are non-numeric values, you’ll need to clean your data. This might involve replacing non-numeric values with a numeric code (e.g., 0) or handling them appropriately based on your data’s context.
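Also note that as.numeric() drops the matrix dimensions, so as.numeric(as.character(count_matrix)) returns one long vector. Suppose the count table was read with the gene IDs in the first column; this is an assumption about the data in the question. Under that assumption, here is a sketch that keeps rows and samples intact:

```r
# Hypothetical count table: gene IDs in the first column, one column per sample
counts <- data.frame(gene = c("geneA", "geneB"),
                     s1 = c("10", "0"), s2 = c("5", "3"),
                     stringsAsFactors = FALSE)

# Move the gene IDs to the row names and keep only the sample columns
count_matrix <- as.matrix(counts[, -1])
rownames(count_matrix) <- counts$gene

# Changing the storage mode converts the values but keeps the dimensions and names
storage.mode(count_matrix) <- "double"
count_matrix
```

Any entry that still cannot be parsed becomes NA with the same warning, so it is worth locating those entries before the conversion.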

      I hope this would help you to detect the problem.

      Best,
      Cansu

  • Hi,

    I would like to use the impute_SD function to get missing SD values. For this, I prepared a data frame containing columns with missing SDs (coded as NA) and their complete means. For the column of missing values (coded as NA), I applied the functions as.numeric() and gsub(), but it still shows “NAs introduced by coercion”. If I then apply the impute_SD() function, it shows “Error in xtfrm.data.frame(x) : cannot xtfrm data frames”. Do you know how to handle this function, and could you help? This would be very nice! 🙂

    Thank you in advance!
    Best regards

