Contingency Table Across Multiple Columns in R (Example)

In this R tutorial you’ll learn how to make a frequency table for multiple columns.

Table of contents:

1) Creating Example Data
2) Example: Create Contingency Table Across Multiple Columns
3) Video, Further Resources & Summary

Let’s jump right to the R code.

Creating Example Data

To start with, let’s create some example data:

set.seed(34854467)                      # Create example data
data <- data.frame(x1 = rbinom(9, 1, 0.5),
                   x2 = rbinom(9, 1, 0.5),
                   x3 = rbinom(9, 1, 0.5),
                   group = letters[1:3])
data                                    # Print example data

Table 1: Example data frame

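Table 1 shows the structure of our example data: it has nine rows and four columns. The variables x1, x2, and x3 contain binary values (0 or 1), and the variable group assigns each row to one of the groups a, b, and c. The three letters are recycled to fill all nine rows.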
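If you want to double-check this structure yourself, the following sketch recreates the example data and inspects it with a few base R functions (dim(), table(), unlist()). These checks are an addition for illustration and are not part of the original code.

```r
# Recreate the example data from the tutorial
set.seed(34854467)
data <- data.frame(x1 = rbinom(9, 1, 0.5),
                   x2 = rbinom(9, 1, 0.5),
                   x3 = rbinom(9, 1, 0.5),
                   group = letters[1:3])   # "a", "b", "c" recycled to 9 rows

dim(data)                                # [1] 9 4
table(data$group)                        # each group occurs 3 times
all(unlist(data[ , 1:3]) %in% c(0, 1))   # TRUE: x1 to x3 hold only 0/1 values
```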

Example: Create Contingency Table Across Multiple Columns

This example illustrates how to create a table with counts using multiple data frame variables.

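For this task, we can combine the t(), sapply(), tapply(), and sum() functions. sapply() loops over the columns x1, x2, and x3; for each column, tapply() applies sum() to the values of every group separately; and t() transposes the result so that the variables become rows and the groups become columns: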

data_count <- t(sapply(data[ , 1:3],    # Create contingency table
                       function(x) tapply(x, data[ , 4], sum)))
data_count                              # Print contingency table

Table 2: Contingency table (matrix) across multiple columns

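As shown in Table 2, the previous syntax has created a matrix with one row per variable (x1, x2, x3) and one column per group (a, b, c). Because the variables contain only the values 0 and 1, each cell shows how often the value 1 occurs for that variable within that group.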
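To see what each function contributes, the sketch below first applies tapply() to a single column and then compares the full result with rowsum(), a base R function that sums the rows of a data frame by group. Using rowsum() is an alternative suggested here for cross-checking, not part of the original tutorial; under these assumptions both approaches should return the same counts.

```r
# Recreate the example data from the tutorial
set.seed(34854467)
data <- data.frame(x1 = rbinom(9, 1, 0.5),
                   x2 = rbinom(9, 1, 0.5),
                   x3 = rbinom(9, 1, 0.5),
                   group = letters[1:3])

# Step 1: number of 1s in x1 within each group
tapply(data$x1, data$group, sum)

# Step 2: repeat for x1 to x3 and transpose (the code from the tutorial)
data_count <- t(sapply(data[ , 1:3],
                       function(x) tapply(x, data[ , 4], sum)))

# Alternative: rowsum() returns groups as rows, so transpose it as well
data_count2 <- t(as.matrix(rowsum(data[ , 1:3], data$group)))

all.equal(data_count, data_count2)   # both approaches should agree
```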

Video, Further Resources & Summary

Have a look at the following video on my YouTube channel. I’m explaining the R programming code of this article in the video:

In addition, you may have a look at the other articles on my homepage:

In this tutorial you have learned how to make a frequency table for multiple variables in R. Don’t hesitate to let me know in the comments section, in case you have additional questions. Furthermore, please subscribe to my email newsletter to receive updates on the newest tutorials.

10 Comments

amaru
March 20, 2023 7:29 pm

Hi, nice tutorial.

I have a dataset like this:
millions of rows and four columns (A, B, C, D).
I want to make a 2×2 contingency table from each row, with A as [1,1], B as [1,2], C as [2,1], and D as [2,2]. Later I need to apply a chi-square test to each of these contingency tables. I have tried different methods, but the p-values vary a lot.
This is an example, but it gave me a weird p-value:
apply(df[, c("A", "B", "C", "D")], 1,
      function(x) chisq.test(x)[c("statistic", "p.value")])
The p-value for each row (i.e., each 2×2 table) is different from the one I get with chisq.test(x), where x is:
x <- matrix(c(2, 2, 2, 6), byrow = TRUE, 2, 2), these are the four values for the matrix

Any help would be highly appreciated.
Thanks
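A likely reason for the differing p-values, judging from the code in the question: when chisq.test() receives a plain vector (such as one row passed by apply()), it runs a goodness-of-fit test against equal probabilities with 3 degrees of freedom, whereas a 2×2 matrix gives Pearson's test of independence with 1 degree of freedom (for 2×2 tables with Yates' continuity correction by default). The sketch below contrasts both calls for the values 2, 2, 2, 6 from the question; the warnings about small counts are suppressed only to keep the output short.

```r
counts <- c(2, 2, 2, 6)   # the four values from the question

# Vector input: goodness-of-fit test against equal probabilities
gof <- suppressWarnings(chisq.test(counts))
gof$method      # "Chi-squared test for given probabilities"
gof$parameter   # df = 3
gof$statistic   # X-squared = 4

# 2x2 matrix input (A = [1,1], B = [1,2], C = [2,1], D = [2,2]):
# test of independence, Yates' continuity correction applied by default
tab <- matrix(counts, nrow = 2, byrow = TRUE)
ind <- suppressWarnings(chisq.test(tab))
ind$method
ind$parameter   # df = 1
```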

- Cansu (Statistics Globe)
  March 21, 2023 10:03 am
  
  Hello,
  
  I am afraid I didn't understand what you would like to implement. Would you like to calculate the chi-square statistic for the pairs A&B, A&C, A&D, B&C, B&D, and C&D? That makes 6 values, but you expect 4. Could you please clarify this?
  
  Regards,
  Cansu
  
AMARU
March 21, 2023 11:15 am

Hi Cansu,
Thanks for answering and sorry for not being clear.
I need a chi-square test with the data from each row (A, B, C, D), arranged as a matrix in this form: A[1,1], B[1,2], C[2,1], and D[2,2].

In other words, I'd like to do the following for all the rows in my data frame:
x <- matrix(c(2, 2, 2, 6), byrow = TRUE, 2, 2), these are the four values for the matrix
chisq.test(x)
I hope this is clearer. Thanks

- Cansu (Statistics Globe)
  March 21, 2023 12:42 pm
  
  Hello,
  
  The chi-square test of independence assesses the association between two factor (categorical) variables. So basically, you need to select two columns, one for each factor variable. Then chisq.test(data$col1, data$col2) or chisq.test(table(data$col1, data$col2)) should give you the result. See this page and this page for a detailed explanation. If you still struggle, please share your data, at least the first few rows, here. Then I can understand where you have the problem.
  
  Regards,
  Cansu
  
AMARU
March 21, 2023 1:48 pm

Thanks. Those pages were helpful but didn’t answer my question. This is what my actual data look like:
AC_TOP <- c(2, 2, 2, 6, 18)
AC_Bottom <- c(2, 2, 4, 28, 3)
AN_Top <- c(2, 20, 4, 4, 14)
AN_Bottom <- c(6, 10, 4, 2, 19)
There are 1 million rows in total. I need to obtain a chi-square test for each row. In other words, I need to make a 2×2 table for each row and obtain a chi-square for each one. I tried making a matrix and looping over each one, but it didn't work.

- Cansu (Statistics Globe)
  March 21, 2023 3:12 pm
  
  Hello,
  
  I am sorry for asking again, but what are AC_TOP, etc.? Are they your columns/variables? Also, I don't quite understand what you mean by calculating the chi-square for each row or making a contingency table for each row. Could you please elaborate on it?
  
  Regards,
  Cansu
  

Amaru

March 21, 2023 3:44 pm

These are column names: counts of plants for a specific variant. So the 4 columns represent measurements of 4 plant variants in different regions.
I want to do this for each row:
data <- matrix(c(2, 2, 2, 6), byrow = TRUE, 2, 2); chisq.test(data), where 2, 2, 2, 6 are the numbers in the same row of the data shown.

Instead of doing chisq.test(x) for each row manually, I’d like to do it for each row in a single step.

Cansu (Statistics Globe)

March 22, 2023 8:43 am

Hello,

Thank you, now it is clearer. See the following:

data <- data.frame(AC_TOP    = c(2, 2, 2, 6, 18),
                   AC_Bottom = c(2, 2, 4, 28, 3),
                   AN_Top    = c(2, 20, 4, 4, 14),
                   AN_Bottom = c(6, 10, 4, 2, 19))
data
#   AC_TOP AC_Bottom AN_Top AN_Bottom
# 1      2         2      2         6
# 2      2         2     20        10
# 3      2         4      4         4
# 4      6        28      4         2
# 5     18         3     14        19     

for(i in 1:nrow(data)){
  print(chisq.test(data[i,]))
}

# Chi-squared test for given probabilities
# 
# data:  data[i, ]
# X-squared = 4, df = 3, p-value = 0.2615
# 
# 
# Chi-squared test for given probabilities
# 
# data:  data[i, ]
# X-squared = 25.765, df = 3, p-value = 1.068e-05
# 
# 
# Chi-squared test for given probabilities
# 
# data:  data[i, ]
# X-squared = 0.85714, df = 3, p-value = 0.8358
# 
# 
# Chi-squared test for given probabilities
# 
# data:  data[i, ]
# X-squared = 44, df = 3, p-value = 1.509e-09
# 
# 
# Chi-squared test for given probabilities
# 
# data:  data[i, ]
# X-squared = 11.926, df = 3, p-value = 0.007641
# 
# Warning messages:
# 1: In chisq.test(data[i, ]) :
#   Chi-squared approximation may be incorrect
# 2: In chisq.test(data[i, ]) :
#   Chi-squared approximation may be incorrect

You can see that R returns warnings saying that the chi-squared approximation may be incorrect. This is because some of the expected counts are very low (a common rule of thumb is an expected count below 5 in a cell), so the results may not be reliable. In such cases, an exact test such as Fisher's exact test could be a good idea.

Regards,
Cansu
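Note that chisq.test(data[i, ]) receives a single row, which R treats as a vector; accordingly, the output above reports a "Chi-squared test for given probabilities" with df = 3, i.e. a goodness-of-fit test rather than the 2×2 test of independence described in the question. A minimal sketch of the 2×2 version follows. It assumes the layout from the question in column order (AC_TOP = [1,1], AC_Bottom = [1,2], AN_Top = [2,1], AN_Bottom = [2,2]); the helper row_test() and the result names chisq_p and fisher_p are hypothetical names chosen for illustration. For every row it computes the chi-square p-value and the p-value of Fisher's exact test.

```r
data <- data.frame(AC_TOP    = c(2, 2, 2, 6, 18),
                   AC_Bottom = c(2, 2, 4, 28, 3),
                   AN_Top    = c(2, 20, 4, 4, 14),
                   AN_Bottom = c(6, 10, 4, 2, 19))

# Hypothetical helper: turn one row into a 2x2 table (assumed layout:
# rows = AC / AN, columns = Top / Bottom) and test it
row_test <- function(r) {
  tab <- matrix(r, nrow = 2, byrow = TRUE)
  c(chisq_p  = suppressWarnings(chisq.test(tab))$p.value,  # small-count warnings hidden
    fisher_p = fisher.test(tab)$p.value)
}

# apply() passes each row as a numeric vector; t() gives one result row per data row
results <- t(apply(data, 1, row_test))
results
```

The results can be attached to the data with cbind(data, results). This version favours readability over speed, so with around a million rows it may take a while to run.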

Amaru
March 24, 2023 8:42 pm

Hi, Cansu

Thanks very much for this. I ended up using Fisher's exact test.

Thanks
Alan

- Cansu (Statistics Globe)
  March 25, 2023 1:42 pm
  
  You're welcome!
  
  Regards,
  Cansu
  