R Warning Message: NAs Introduced by Coercion (Example)

 

This article explains how to debug the warning message “NAs introduced by coercion” in the R programming language.

Let’s dive into it…

 

Creation of Example Data

First, I’ll have to create some example data.

vec <- c("50", "200", "1,000", "10", "1200", "2,100")  # Create example vector
vec                                                    # Print example vector
# [1] "50"    "200"   "1,000" "10"    "1200"  "2,100"

The previous RStudio console output shows that our example data is a character vector with six elements. Note that two of these elements (i.e. "1,000" and "2,100") contain commas.

 

Example 1: Reproduce the Warning Message: NAs Introduced by Coercion

In this example, I’ll show how to replicate the warning message “NAs introduced by coercion” when using the as.numeric function in R. Let’s apply the as.numeric function to our example vector:

as.numeric(vec)                                        # Applying as.numeric function
# [1]   50  200   NA   10 1200   NA
# Warning message:
# NAs introduced by coercion

As you can see, the warning message “NAs introduced by coercion” is returned and some output values are NA (i.e. missing data or not available data).

The reason for this is that the character strings "1,000" and "2,100" contain commas as thousands separators. The as.numeric function cannot interpret such strings as numbers and therefore returns NA for them.
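To find out which elements cause the warning, one option is to compare the original values with the result of the conversion. The following code is a minimal sketch of this idea; the object names vec_num and bad are chosen for illustration only:

```r
vec <- c("50", "200", "1,000", "10", "1200", "2,100")  # Example vector from above

vec_num <- suppressWarnings(as.numeric(vec))           # Convert quietly for inspection
bad <- is.na(vec_num) & !is.na(vec)                    # NA after conversion, but not before
vec[bad]                                               # Print problematic elements
# [1] "1,000" "2,100"
```

The condition !is.na(vec) makes sure that values that were already missing in the input are not reported as conversion problems.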

The next example shows how to solve this problem in R.

 

Example 2: Modify Data to Avoid Warning Message Using gsub() Function

In Example 2, I’ll illustrate how to handle the as.numeric() warning message “NAs introduced by coercion”.

As explained before, some of our input values are not formatted properly, because they contain commas (i.e. ,) as thousands separators. We can remove these commas by using the gsub function:

vec_new <- gsub(",", "", vec)                          # Applying gsub function
vec_new                                                # Print updated example vector
# [1] "50"   "200"  "1000" "10"   "1200" "2100"

Have a look at the previous output of the RStudio console. It shows that our updated vector does not contain commas anymore.

Now, let’s apply the as.numeric function again:

as.numeric(vec_new)                                    # Applying as.numeric function
# [1]   50  200 1000   10 1200 2100

As you can see, we not only avoided the warning message, but also created an output vector without any NA values.
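If you need this conversion more often, both steps could be combined in a small function. The code below is only a sketch: the function name to_numeric_checked is hypothetical, and it assumes that commas are always thousands separators (not decimal commas). Instead of silently returning NA, it stops with an error that lists the values that still cannot be converted:

```r
to_numeric_checked <- function(x) {                    # Hypothetical helper function
  x_clean <- gsub(",", "", x, fixed = TRUE)            # Remove thousands separators
  x_num <- suppressWarnings(as.numeric(x_clean))       # Convert without printing the warning
  bad <- is.na(x_num) & !is.na(x)                      # Values that still fail to convert
  if (any(bad)) {
    stop("Cannot convert: ", paste(unique(x[bad]), collapse = ", "))
  }
  x_num                                                # Return numeric vector
}

to_numeric_checked(c("50", "1,000", "2,100"))          # Applying helper function
# [1]   50 1000 2100
```

The argument fixed = TRUE tells gsub to treat the comma as a literal character instead of a regular expression.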

 

Example 3: Suppress Warning Message Using suppressWarnings() Function

Sometimes it is acceptable that values which are not valid numbers are converted to NA. In this case, you can simply ignore the warning message “NAs introduced by coercion” by wrapping the suppressWarnings function around the as.numeric function:

suppressWarnings(as.numeric(vec))                      # Applying suppressWarnings function
# [1]   50  200   NA   10 1200   NA

The output is the same as in Example 1, but this time without printing the warning message to the RStudio console.
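Keep in mind that suppressWarnings hides every warning raised by the wrapped expression, not only the coercion warning. If you want to silence this particular warning only, a calling handler is one possibility. The following code is a sketch under two assumptions: the function name as_numeric_quiet is hypothetical, and the warning text is matched in English, so it would not match in an R session that prints translated messages:

```r
vec <- c("50", "200", "1,000", "10", "1200", "2,100")  # Example vector from above

as_numeric_quiet <- function(x) {                      # Hypothetical helper function
  withCallingHandlers(
    as.numeric(x),
    warning = function(w) {
      # Muffle only the coercion warning; other warnings are still shown
      if (grepl("NAs introduced by coercion", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

as_numeric_quiet(vec)                                  # Applying helper function
# [1]   50  200   NA   10 1200   NA
```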

 

Video, Further Resources & Summary


Summary: In this post, I explained how to get rid of the warning “NAs introduced by coercion” when converting a character or factor variable to numeric in the R programming language.
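A short note on factor variables, since the examples above only used character vectors: applying as.numeric directly to a factor does not return a warning, but it returns the internal level codes instead of the labels. The example below is a small sketch with made-up data (the object name fac is chosen for illustration):

```r
fac <- factor(c("50", "200", "10"))                    # Example factor with numeric labels
as.numeric(fac)                                        # Returns internal level codes
# [1] 3 2 1
as.numeric(as.character(fac))                          # Returns the labels as numbers
# [1]  50 200  10
```

Converting to character first means that the warning “NAs introduced by coercion” appears again whenever a label is not a valid number, which makes such problems visible.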

In case you have further questions, don’t hesitate to let me know in the comments section.

18 Comments

  • Hi Joachim, thanks for this tutorial and your help in advance!
    I am getting the warning “NAs introduced by coercion”. In my case, however, I am trying to recode the character strings (e.g. green, blue, red) in two specific columns so that each string (e.g. green) is represented by a number (e.g. 1). I have been trying for a long while so that I can use the data as part of a neural network, but I cannot get past this warning. Can you help me with this?

    • Hi Ali,

      Thank you for the very kind words and your interesting question. A simple solution might be the following:

      x <- c("green", "red", "green", "blue", "blue", "yellow")
      x_num <- as.numeric(as.factor(x))
      x_num
      # [1] 2 3 2 1 1 4

      Note that the numeric codes are based on the alphabetical order of the unique values of the input vector (i.e. the factor levels): blue = 1, green = 2, red = 3, and yellow = 4.

      I hope that helps!

      Joachim

  • Hi Joachim,
    Your blog has been a godsend, and I’m hoping you can solve this: My character vector (“1”, “2”, “3”) has zero-width non-printing spaces (‘\u200b’) in it (I scraped some Covid data online with rvest). I managed to remove most of them with str_remove and then converted the characters to numbers. But it still returns the message “NAs introduced by coercion” and I don’t know what to look for! Would appreciate any guidance. Thank you!

    • Hey Angi,

      Thanks a lot for this amazing feedback! 🙂

      You could use the following code to identify all data cells that are converted to NA:

      x <- c("1", "2", "a", "3", "b")
      x_test <- as.numeric(x)
      x[is.na(x_test)]
      # [1] "a" "b"

      I hope that helps!

      Joachim

      • Thanks Joachim. The problem was figuring out 1) the “invisible” mystery character that was creating the issue. Since I tend to glimpse() at my data rather than head() it, I couldn’t see anything wrong until I used head(). 2) The culprit was a zero-width non-printing space that was seemingly immune to str_remove() in its original form “”. But 48 hours later, it was a good learning experience.

        • Glad you found a solution Angi, and thanks for sharing it here! I’m sure others will have similar problems and will benefit from your explanation. 🙂

          Regards

          Joachim

  • Hi Joachim, thank you, but the function gsub(",", "", vec) removes all the commas. What if we had values with a comma, for example when our vector looks like vec <- c("50,1", "200,3", "1,000,5", "10,3", "1200", "2,100")?

    • Hey Ugi,

      In this case, you would have to replace the comma by a point. Have a look at the example code below:

      vec <- c("50,1", "200,3", "1000,5", "10,3", "1200", "2,100")
      vec_new <- as.numeric(gsub(",", ".", vec))
      vec_new
      # [1]   50.1  200.3 1000.5   10.3 1200.0    2.1

      Please note that your example vector contained the value “1,000,5” (i.e. two commas). This does not make sense in case the comma is used as a decimal comma.

      I hope that helps!

      Joachim

  • Hi Joachim, thank you very much, this does work, but I have another problem. I am trying to convert a data frame with several columns whose elements are stored as character strings. It looks like this:
         BAYER   DAIMLER DBANK   SIEMENS  VONOVIA
    [1,] "65,24" "47"    "6,809" "89,073" "46,34"
    [2,] "65,79" "46,91" "6,839" "89,29"  "46,6"
    [3,] "66,59" "47,92" "7,079" "90,056" "47,69"

    If I use as.numeric(gsub(",", ".", vec)), the character elements are converted to numeric, but I lose the structure of the data frame and get a single vector that looks like this:
    [1] 65.24 65.79 66.59 66.22 65.78 64.99

    Do you know how I can convert the character elements to numeric without losing the structure of the data frame?

    • Hey Ugi,

      Please have a look at the following example code. I assume this works for your data as well:

      data <- data.frame(BAYER = c("65,24", "65,79", "66,59"),
                         DAIMLER = c("47", "46,91", "47,92"))
      data # Example data
      #   BAYER DAIMLER
      # 1 65,24      47
      # 2 65,79   46,91
      # 3 66,59   47,92
       
      data_num <- as.data.frame(apply(data, 2, function(x) as.numeric(gsub(",", ".", x))))
      data_num # Modified data
      #   BAYER DAIMLER
      # 1 65.24   47.00
      # 2 65.79   46.91
      # 3 66.59   47.92

      Regards,
      Joachim

  • My code for ANOVA also showed this message, for each row of the dataframe:
    There were 22 warnings (use warnings() to see them)
    >
    > warnings()
    Warning messages:
    1: In FUN(newX[, i], ...) : NAs introduced by coercion
    2: In FUN(newX[, i], ...) : NAs introduced by coercion
    3: In FUN(newX[, i], ...) : NAs introduced by coercion
    [warnings 4 to 22 repeat the same message]

    The code is:

    # Assigning X vector: obtain classifications for samples

    Control     <- datout.new$clas == "Control"
    TB          <- datout.new$clas == "TB"
    Sarcoidosis <- datout.new$clas == "Sarcoidosis"

    # 1-factor ANOVA with 3 levels
    aov.all.genes <- function(x, s1, s2, s3) {
      x1 <- as.numeric(x[s1])
      x2 <- as.numeric(x[s2])
      x3 <- as.numeric(x[s3])
      fac <- c(rep("A", length(x1)), rep("B", length(x2)), rep("C", length(x3)))
      a.dat <- data.frame(as.factor(fac), c(x1, x2, x3))
      names(a.dat) <- c("factor", "express")
      p.out <- summary(aov(express ~ factor, a.dat))[[1]][1, 5]
      return(p.out)
    }

    aov.run <- apply(datout.new, 1, aov.all.genes, s1 = Control, s2 = Sarcoidosis, s3 = TB)

    • Hey Ira,

      It seems like your variables Control, Sarcoidosis, and TB are not formatted properly. Could you show what these variables look like?

      Regards,
      Joachim

  • Hi Joachim, your blog is amazing!
    I’m hoping that you can solve this: My character factor W is in the data set XYZ.
    W has values such as "1", "2", "3", "101-200", "101-200", "101-200", NA, NA. I have tried as.integer(as.factor(XYZ$W)) and also
    as.integer(as.factor(W))

    However, when I determine if I have changed the W character to integer, so I check it with both:
    as.integer(W) –> TRUE
    as.integer(XYZ$W) –> FALSE

    But, when i type in: str(XYZ), it shows the W is still in character form.

    Could you help me look over how to change the factor to integer?
    I would really appreciate if you can help me with this. Thank you!

    • Hey Pinky,

      First of all, thanks a lot for the very kind words! Glad you like my tutorials! 🙂

      Regarding your question, please have a look at the code below. First we have to create some example data:

      x <- c("1", "2", "3", "101-200", "101-200", "101-200", NA, NA)
      x
      # [1] "1"       "2"       "3"       "101-200" "101-200" "101-200" NA        NA

      Next, we can convert these data to numeric using the as.numeric function:

      x_num1 <- as.numeric(x)
      x_num1
      # [1]  1  2  3 NA NA NA NA NA

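      Note that the previous code has replaced “101-200” by NA, since this character string cannot be represented as a numeric (or integer) value. R returns the warning “NAs introduced by coercion” in this step.

      If you want to avoid this, you may replace these strings by a representative value, e.g. the approximate midpoint of the range: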

      x_num2 <- x
      x_num2[x_num2 == "101-200"] <- "150"
      x_num2 <- as.numeric(x_num2)
      x_num2
      # [1]   1   2   3 150 150 150  NA  NA

      In the previous code, I have replaced “101-200” by 150.

      I hope that helps!

      Joachim

  • Hi Joachim, first I’d like to thank you so much for the help and the amazing work you’re doing here. Could you please help me convert a column of my data set, which contains time durations in “hms” format, to numeric? Here’s a view of the column:

    glimpse(Annual_Trips$ride_length)
    'hms' num [1:5595063] 00:10:25 00:04:04 00:01:20 00:11:42 ...
    - attr(*, "units")= chr "secs"

    I tried applying the gsub function and thought of replacing the ',' with ':', but I still got a column full of NAs.
    Thanks in advance
