Contingency Table Across Multiple Columns in R (Example)

 

In this R tutorial you’ll learn how to make a frequency table for multiple columns.

Let’s jump right to the R code.

 

Creating Example Data

To start with, let’s create some example data:

set.seed(34854467)                      # Create example data
data <- data.frame(x1 = rbinom(9, 1, 0.5),
                   x2 = rbinom(9, 1, 0.5),
                   x3 = rbinom(9, 1, 0.5),
                   group = letters[1:3])
data                                    # Print example data

 

[Table 1: example data frame]

 

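Table 1 shows the structure of our example data: it has nine rows and four columns. The variables x1, x2, and x3 contain binary values (0 or 1), and the variable group assigns each row to one of the groups a, b, and c.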
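A quick way to confirm this structure, as a small sketch using base R only: data.frame() recycles the three elements of letters[1:3] to fill all nine rows, so each group appears three times, and rbinom(9, 1, 0.5) produces only zeros and ones.

```r
set.seed(34854467)                      # Same example data as above
data <- data.frame(x1 = rbinom(9, 1, 0.5),
                   x2 = rbinom(9, 1, 0.5),
                   x3 = rbinom(9, 1, 0.5),
                   group = letters[1:3])

# letters[1:3] has only three elements; data.frame() recycles it to nine rows
as.character(data$group)                # "a" "b" "c" "a" "b" "c" "a" "b" "c"
table(data$group)                       # three rows per group

# x1, x2 and x3 contain only zeros and ones
sapply(data[ , 1:3], function(x) all(x %in% c(0, 1)))
```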

 

Example: Create Contingency Table Across Multiple Columns

This example illustrates how to create a table of counts across multiple data frame variables.

For this task, we can use the t(), sapply(), tapply(), and sum() functions as shown below:

data_count <- t(sapply(data[ , 1:3],    # Create contingency table
                       function(x) tapply(x, data[ , 4], sum)))
data_count                              # Print contingency table

 

[Table 2: contingency table (matrix) of x1, x2, and x3 by group]

 

As shown in Table 2, the previous syntax has created a contingency table that counts, for each of the variables x1, x2, and x3, the number of 1s within each group.
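To see what each function contributes, the same syntax can be split into steps. The sketch below recreates the example data and then applies tapply(), sapply(), and t() one at a time. Because x1 to x3 only contain 0 and 1, summing them per group counts the 1s. As an alternative, base R's rowsum() should give the same counts; this swap is our own suggestion, not part of the syntax above.

```r
set.seed(34854467)                      # Same example data as above
data <- data.frame(x1 = rbinom(9, 1, 0.5),
                   x2 = rbinom(9, 1, 0.5),
                   x3 = rbinom(9, 1, 0.5),
                   group = letters[1:3])

# Step 1: tapply() sums one column within each group, i.e. counts its 1s
step1 <- tapply(data$x1, data$group, sum)

# Step 2: sapply() repeats step 1 for x1, x2 and x3;
# the result has one row per group and one column per variable
step2 <- sapply(data[ , 1:3], function(x) tapply(x, data$group, sum))

# Step 3: t() transposes it so that the variables become the rows
data_count <- t(step2)
data_count

# Alternative: rowsum() adds up the rows of x1 to x3 within each group,
# and t() again puts the variables in the rows
data_count_rowsum <- t(rowsum(data[ , 1:3], data$group))
data_count_rowsum
```

Both matrices have the variables in the rows and the groups in the columns; rowsum() simply avoids the nested sapply()/tapply() calls.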

 

Video, Further Resources & Summary

Have a look at the following video on my YouTube channel. I’m explaining the R programming code of this article in the video:

 

 

In addition, you may have a look at the other articles on my homepage:

 

In this tutorial, you have learned how to make a frequency table for multiple variables in R. Don’t hesitate to let me know in the comments section if you have additional questions. Furthermore, please subscribe to my email newsletter to receive updates on the newest tutorials.

 



10 Comments

  • Hi, nice tutorial.

    I have a dataset like this:
    millions of rows and four columns (A, B, C, D).
    I want to make 2×2 contingency tables laid out as A[1,1], B[1,2], C[2,1], and D[2,2]. Afterwards, I need to apply a chi-square test to each of these contingency tables. I have tried different methods, but the p-values vary a lot.
    This is an example, but it gave me a weird p-value:
    apply(df[ , c("A", "B", "C", "D")],
    1, function(x) chisq.test(x)[c("statistic", "p.value")])
    The p-value for each row (i.e., each 2×2 table) is different from that of chisq.test(x), where x is:
    x <- matrix(c(2,2,2,6), byrow = TRUE, 2, 2)   # these are the four values for the matrix

    Any help would be highly appreciated.
    Thanks

    • Hello,

      I am afraid I didn’t understand what you would like to implement. Would you like to calculate the chi-square for the pairs A&B, A&C, A&D, B&C, B&D, and C&D? That would give 6 values, but you expect 4. Could you please clarify this?

      Regards,
      Cansu

  • Hi Cansu,
    Thanks for answering, and sorry for not being clear.
    I need a chi-square test using the data from each row (A, B, C, D), arranged as a matrix in this form: A[1,1], B[1,2], C[2,1], and D[2,2].

    In other words, I’d like to do the following for all the rows in my data frame:
    x <- matrix(c(2,2,2,6), byrow = TRUE, 2, 2)   # these are the four values for the matrix
    chisq.test(x)
    I hope this is clearer. Thanks

    • Hello,

      The chi-square test is used to assess the association between two factor variables. So basically, you need to select 2 columns, one for each factor variable. Then chisq.test(data$col1, data$col2) or chisq.test(table(data$col1, data$col2)) should give you the result. See this page and this page for a detailed explanation. If you still struggle, please share your data here (at least the first few rows), so that I can understand where the problem is.
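A minimal sketch of the two calls suggested in this reply, passing two columns to chisq.test() directly or first cross-tabulating them with table(). The data frame dat and its columns col1 and col2 are hypothetical and made up for illustration. Both forms test the same 2×2 table, so their statistics and p-values agree.

```r
# Hypothetical data: two categorical variables observed on the same ten units
dat <- data.frame(col1 = c("yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "no"),
                  col2 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "b"))

# Cross-tabulate the two columns into a 2 x 2 contingency table
tab <- table(dat$col1, dat$col2)
tab

# Both calls test the same table; suppressWarnings() hides the
# "Chi-squared approximation may be incorrect" warning caused by the tiny counts
res_xy  <- suppressWarnings(chisq.test(dat$col1, dat$col2))
res_tab <- suppressWarnings(chisq.test(tab))
res_xy$p.value
res_tab$p.value
```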

      Regards,
      Cansu

  • Thanks. Those pages were helpful but didn’t answer my question. This is what my actual data look like:
    AC_TOP <- c(2,2,2,6,18)
    AC_Bottom <- c(2,2,4,28,3)
    AN_Top <- c(2,20,4,4,14)
    AN_Bottom <- c(6,10,4,2,19)
    In total, there are 1 million rows. I need to obtain a chi-square for each row. In other words, I want to make a 2×2 table for each row to obtain a chi-square for each one. I tried making a matrix and looping over each one, but it didn’t work.

    • Hello,

      I am sorry for asking again, but what are AC_TOP, etc.? Are they your columns/variables? Also, I don’t quite understand what you mean by calculating the chi-square for each row or making a contingency table for each row. Could you please elaborate on it?

      Regards,
      Cansu

  • These are column names: counts of plants for a specific variant. So the 4 columns represent measures of 4 plant variants in different regions.
    I want to do this for each row:
    data <- matrix(c(2,2,2,6)); chisq.test(data), with 2, 2, 2, 6 being the numbers in the same row of the data shown.

    Instead of running chisq.test(x) manually for each row, I’d like to do it for all rows in a single step.

    • Hello,

      Thank you, now it is clearer. See the following:

      data<-data.frame(AC_TOP = c(2,2,2,6,18),
      AC_Bottom = c(2,2,4,28,3),
      AN_Top = c(2,20,4,4,14),
      AN_Bottom = c(6,10,4,2,19))
      data
      #   AC_TOP AC_Bottom AN_Top AN_Bottom
      # 1      2         2      2         6
      # 2      2         2     20        10
      # 3      2         4      4         4
      # 4      6        28      4         2
      # 5     18         3     14        19     
       
      for(i in 1:nrow(data)){
        print(chisq.test(data[i,]))
       }
       
      # Chi-squared test for given probabilities
      # 
      # data:  data[i, ]
      # X-squared = 4, df = 3, p-value = 0.2615
      # 
      # 
      # Chi-squared test for given probabilities
      # 
      # data:  data[i, ]
      # X-squared = 25.765, df = 3, p-value = 1.068e-05
      # 
      # 
      # Chi-squared test for given probabilities
      # 
      # data:  data[i, ]
      # X-squared = 0.85714, df = 3, p-value = 0.8358
      # 
      # 
      # Chi-squared test for given probabilities
      # 
      # data:  data[i, ]
      # X-squared = 44, df = 3, p-value = 1.509e-09
      # 
      # 
      # Chi-squared test for given probabilities
      # 
      # data:  data[i, ]
      # X-squared = 11.926, df = 3, p-value = 0.007641
      # 
      # Warning messages:
      # 1: In chisq.test(data[i, ]) :
      #   Chi-squared approximation may be incorrect
      # 2: In chisq.test(data[i, ]) :
      #   Chi-squared approximation may be incorrect

      As you can see, you receive a warning saying that the chi-squared approximation may be incorrect. This happens because some of the expected counts are very low (rule of thumb: lower than 5). In such cases, the results may not be robust. Therefore, trying a more robust association test like Fisher’s exact test could be a good idea.
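Note that chisq.test(data[i, ]) on a single row runs a goodness-of-fit test against equal proportions (hence "Chi-squared test for given probabilities" and df = 3 in the output above), not a 2×2 test of independence. If each row should instead be arranged as A[1,1], B[1,2], C[2,1], D[2,2], as described earlier in the thread, the following sketch runs Fisher’s exact test per row with apply(). The layout is an assumption about the intended table, and the helper row_to_2x2() is hypothetical. For very large data, the sketch also computes the Pearson chi-square for all rows at once with the 2×2 formula X² = n(AD − BC)² / ((A + B)(C + D)(A + C)(B + D)), which matches chisq.test(..., correct = FALSE).

```r
# Counts from the comment thread; each row holds A, B, C, D
data <- data.frame(AC_TOP    = c(2, 2, 2, 6, 18),
                   AC_Bottom = c(2, 2, 4, 28, 3),
                   AN_Top    = c(2, 20, 4, 4, 14),
                   AN_Bottom = c(6, 10, 4, 2, 19))

# Hypothetical helper: arrange one row as A[1,1], B[1,2], C[2,1], D[2,2]
row_to_2x2 <- function(counts) matrix(counts, nrow = 2, byrow = TRUE)

counts <- as.matrix(data)

# Fisher's exact test on each row's 2 x 2 table (one p-value per row)
fisher_p <- apply(counts, 1, function(row) fisher.test(row_to_2x2(row))$p.value)

# Vectorized Pearson chi-square without continuity correction, all rows at once
n11 <- counts[, 1]; n12 <- counts[, 2]; n21 <- counts[, 3]; n22 <- counts[, 4]
n <- n11 + n12 + n21 + n22
chi_sq <- n * (n11 * n22 - n12 * n21)^2 /
  ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
chi_p <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

data.frame(data, fisher_p = fisher_p, chi_sq = chi_sq, chi_p = chi_p)
```

Rows with a zero row or column total give NaN in the vectorized formula. Calling fisher.test() through apply() can be slow for a million rows, so trying it on a subset first may be sensible.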

      Regards,
      Cansu

  • Hi, Cansu

    Thanks very much for this. I ended up using Fisher’s exact test.

    Thanks
    Alan

