# Contingency Table Across Multiple Columns in R (Example)

In this R tutorial you’ll learn how to make a frequency table for multiple columns.

Let’s jump right to the R code.

## Creating Example Data

```
set.seed(34854467)                      # Create example data
data <- data.frame(x1 = rbinom(9, 1, 0.5),   # nine 0/1 values per column
                   x2 = rbinom(9, 1, 0.5),
                   x3 = rbinom(9, 1, 0.5),
                   group = letters[1:3])      # "a", "b", "c", recycled to nine rows
data                                    # Print example data
```

Table 1 shows the structure of our example data: it has nine rows and four columns.

## Example: Create Contingency Table Across Multiple Columns

This example illustrates how to create a table with counts using multiple data frame variables.

For this task, we can use the t(), sapply(), tapply(), and sum() functions as shown below:

```
data_count <- t(sapply(data[ , 1:3],    # Create contingency table
                       function(x) tapply(x, data[ , 4], sum)))  # Sum each column within each group
data_count                              # Print contingency table
```

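As shown in Table 2, the previous syntax has created a contingency table across several variables: each row corresponds to one of the columns x1, x2, and x3, each column corresponds to one of the groups a, b, and c, and each cell contains the number of 1s in that variable within that group.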
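If you prefer a single function call, base R's rowsum() can produce the same group-wise sums. The following is a sketch under the assumption that all columns except `group` are numeric; `data_count_rowsum` is a hypothetical name, and the example data are recreated so that the block runs on its own.

```r
set.seed(34854467)                      # Recreate the example data from above
data <- data.frame(x1 = rbinom(9, 1, 0.5),
                   x2 = rbinom(9, 1, 0.5),
                   x3 = rbinom(9, 1, 0.5),
                   group = letters[1:3])

# Tutorial approach: sum each column within each group, then transpose
data_count <- t(sapply(data[ , 1:3],
                       function(x) tapply(x, data[ , 4], sum)))

# Alternative: rowsum() returns one row per group and one column per variable,
# so transposing gives the same variables-by-groups layout
data_count_rowsum <- t(as.matrix(rowsum(data[ , 1:3], data$group)))
data_count_rowsum
```

By default rowsum() sorts the groups (reorder = TRUE), just as tapply() sorts the factor levels, so both tables list the groups in the same order.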

## Video, Further Resources & Summary

Have a look at the following video on my YouTube channel. I’m explaining the R programming code of this article in the video:

In addition, you may have a look at the other articles on my homepage:


• Hi, nice tutorial.

I have a dataset like this:
millions of rows and four columns (A, B, C, D).
I want to make a 2×2 contingency table from each row, with A at [1,1], B at [1,2], C at [2,1], and D at [2,2]. Later I need to apply a chi-square test to each of these contingency tables. I have tried different methods, but the p-values vary a lot.
This is one attempt, but it gave me weird p-values:
`apply(df[ , c("A", "B", "C", "D")], 1, function(x) chisq.test(x)[c("statistic", "p.value")])`
The p-value for each row (i.e., each 2×2 table) is different from that of `chisq.test(x)`, where x is
`x <- matrix(c(2, 2, 2, 6), byrow = TRUE, 2, 2)`, i.e., the four values of one row arranged as a matrix.

Any help would be highly appreciated.
Thanks

• Hello,

I am afraid I didn’t understand what you would like to implement. Do you want to calculate the chi-square for the pairs A&B, A&C, A&D, B&C, B&D, and C&D? That would give 6 values, but you expect 4. Could you please clarify this?

Regards,
Cansu

• Hi Cansu,
Thanks for answering, and sorry for not being clear.
I need a chi-square test using the data from each row (A, B, C, D), with the matrix arranged as A[1,1], B[1,2], C[2,1], and D[2,2].

In other words, I’d like to do the following for every row in my data frame:
`x <- matrix(c(2, 2, 2, 6), byrow = TRUE, 2, 2)` (these are the four values of one row)
`chisq.test(x)`
I hope this is clearer. Thanks

• Hello,

The chi-square test of independence measures the association between two factor variables. So basically, you need to select two columns, one for each factor variable. Then `chisq.test(data$col1, data$col2)` or `chisq.test(table(data$col1, data$col2))` should give you the result. See this page and this page for a detailed explanation. If you still struggle, please share your data, at least the first few rows, here. Then I can understand where you have the problem.

Regards,
Cansu
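To illustrate the suggestion in this reply, here is a small sketch with two hypothetical factor columns (`position` and `variant`, filled with made-up values). It shows that passing the two columns to chisq.test() gives the same result as passing their cross-tabulation from table().

```r
# Hypothetical data: two factor columns with 40 made-up observations
plants <- data.frame(
  position = factor(rep(c("top", "bottom"), each = 20)),
  variant  = factor(rep(c("AC", "AN", "AC", "AN"), times = c(12, 8, 5, 15)))
)

# Passing the two factor columns directly ...
res_cols <- chisq.test(plants$position, plants$variant)

# ... gives the same test as cross-tabulating them first with table()
res_table <- chisq.test(table(plants$position, plants$variant))

res_cols$observed   # the 2 x 2 table of counts that was tested
res_cols$p.value
```

In both cases, chisq.test() ends up testing the same 2×2 table of counts, so the statistic and p-value agree.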

• Thanks. Those pages were helpful but didn’t answer my question. This is what my actual data look like:
AC_TOP <- c(2,2,2,6,18)
AC_Bottom <- c(2,2,4,28,3)
AN_Top <- c(2,20,4,4,14)
AN_Bottom <- c(6,10,4,2,19)
In total there are 1 million rows. I need to obtain a chi-square test for each row. In other words, I want to make a 2×2 table from each row and obtain a chi-square value for each one. I tried making a matrix and looping over each one, but it didn’t work.

• Hello,

I am sorry for asking again, but what are AC_TOP, etc.? Are they your columns/variables? Also, I don’t quite understand what you mean by calculating the chi-square for each row or making a contingency table for each row. Could you please elaborate on it?

Regards,
Cansu

• These are column names: counts of plants of a specific variant, so the 4 columns represent measurements of 4 plant variants in different regions.
I want to do this for each row:
`data <- matrix(c(2, 2, 2, 6), nrow = 2, byrow = TRUE); chisq.test(data)`, where 2, 2, 2, 6 are the numbers in the same row of the data shown.

Instead of running chisq.test(x) manually for each row, I’d like to do it for all rows in a single step.
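A sketch of the "single step" described in this comment: apply() walks over the rows, each row is reshaped into a 2×2 matrix filled row-wise (A[1,1], B[1,2], C[2,1], D[2,2]), and chisq.test() is run on that matrix. Because the input is a 2×2 matrix rather than a vector of four counts, chisq.test() performs a test of independence with one degree of freedom (with Yates' continuity correction by default) instead of a goodness-of-fit test. The helper `row_to_2x2` and the returned column names are hypothetical choices, and warnings are suppressed on the assumption that you inspect the returned smallest expected count instead.

```r
# The four columns from the question; each row holds the counts of one 2 x 2 table
data <- data.frame(AC_TOP    = c(2, 2, 2, 6, 18),
                   AC_Bottom = c(2, 2, 4, 28, 3),
                   AN_Top    = c(2, 20, 4, 4, 14),
                   AN_Bottom = c(6, 10, 4, 2, 19))

# Hypothetical helper: fill row-wise so the counts land at [1,1], [1,2], [2,1], [2,2]
row_to_2x2 <- function(r) matrix(r, nrow = 2, byrow = TRUE)

chisq_res <- t(apply(as.matrix(data), 1, function(r) {
  # Small expected counts trigger a warning; it is silenced here and
  # the smallest expected count is returned so it can be checked instead
  test <- suppressWarnings(chisq.test(row_to_2x2(r)))
  c(statistic    = unname(test$statistic),
    df           = unname(test$parameter),
    p.value      = test$p.value,
    min_expected = min(test$expected))
}))
chisq_res
```

Each row of `chisq_res` corresponds to one row of `data`; rows with `min_expected` below 5 are the ones for which the chi-square approximation is questionable.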

• Hello,

Thank you, now it is more clear. See the following:

```
data <- data.frame(AC_TOP = c(2,2,2,6,18),
                   AC_Bottom = c(2,2,4,28,3),
                   AN_Top = c(2,20,4,4,14),
                   AN_Bottom = c(6,10,4,2,19))
data
#   AC_TOP AC_Bottom AN_Top AN_Bottom
# 1      2         2      2         6
# 2      2         2     20        10
# 3      2         4      4         4
# 4      6        28      4         2
# 5     18         3     14        19

for (i in 1:nrow(data)) {               # Run chisq.test() on each row
  print(chisq.test(data[i, ]))
}

# Chi-squared test for given probabilities
#
# data:  data[i, ]
# X-squared = 4, df = 3, p-value = 0.2615
#
#
# Chi-squared test for given probabilities
#
# data:  data[i, ]
# X-squared = 25.765, df = 3, p-value = 1.068e-05
#
#
# Chi-squared test for given probabilities
#
# data:  data[i, ]
# X-squared = 0.85714, df = 3, p-value = 0.8358
#
#
# Chi-squared test for given probabilities
#
# data:  data[i, ]
# X-squared = 44, df = 3, p-value = 1.509e-09
#
#
# Chi-squared test for given probabilities
#
# data:  data[i, ]
# X-squared = 11.926, df = 3, p-value = 0.007641
#
# Warning messages:
# 1: In chisq.test(data[i, ]) :
#   Chi-squared approximation may be incorrect
# 2: In chisq.test(data[i, ]) :
#   Chi-squared approximation may be incorrect
```
You see that you receive a warning saying that the chi-square approximation may be incorrect. This is because some of your counts are very low (the rule of thumb is that expected counts should be at least 5). In such cases, the results may not be reliable. Therefore, trying an exact test such as Fisher's exact test (fisher.test()) could be a good idea.

Regards,
Cansu
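Note that the loop above passes each row as a one-row data frame, which chisq.test() treats as a vector of four counts: it runs a goodness-of-fit test against equal probabilities (hence "Chi-squared test for given probabilities" and df = 3), not a test on a 2×2 table. For a 2×2 table with cells n11, n12, n21, n22 (here A, B, C, D), the Pearson statistic has a closed form: N(n11·n22 − n12·n21)² divided by the product of the two row totals and the two column totals, where N is the total count. It can therefore be computed for all rows at once, which may help with a million rows. The sketch below assumes no continuity correction, so it should match chisq.test(..., correct = FALSE) rather than the default; `chisq_2x2_vec` is a hypothetical helper name.

```r
data <- data.frame(AC_TOP    = c(2, 2, 2, 6, 18),
                   AC_Bottom = c(2, 2, 4, 28, 3),
                   AN_Top    = c(2, 20, 4, 4, 14),
                   AN_Bottom = c(6, 10, 4, 2, 19))

# Hypothetical helper: Pearson chi-square test (no continuity correction)
# for many 2 x 2 tables [n11 n12; n21 n22] at once, one table per element.
# A row or column total of zero gives NaN, because 0/0 is undefined.
chisq_2x2_vec <- function(n11, n12, n21, n22) {
  n <- n11 + n12 + n21 + n22
  stat <- n * (n11 * n22 - n12 * n21)^2 /
    ((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
  data.frame(statistic = stat,
             p.value   = pchisq(stat, df = 1, lower.tail = FALSE))
}

# A = [1,1], B = [1,2], C = [2,1], D = [2,2], as in the question
res <- with(data, chisq_2x2_vec(AC_TOP, AC_Bottom, AN_Top, AN_Bottom))
res
```

Because every operation is vectorized, there is no per-row function call; the low-count caveat from the reply above still applies.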

• Hi, Cansu

Thanks very much for this. I ended up using Fisher test.

Thanks
Alan
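Since the thread ends with Fisher's exact test, here is a sketch of applying fisher.test() to every row in one call, again assuming the row-wise 2×2 layout from the question (A[1,1], B[1,2], C[2,1], D[2,2]). For a 2×2 matrix, fisher.test() returns an exact p-value and a conditional maximum likelihood estimate of the odds ratio; the output column names `odds_ratio` and `p.value` are hypothetical choices.

```r
data <- data.frame(AC_TOP    = c(2, 2, 2, 6, 18),
                   AC_Bottom = c(2, 2, 4, 28, 3),
                   AN_Top    = c(2, 20, 4, 4, 14),
                   AN_Bottom = c(6, 10, 4, 2, 19))

# Exact test per row: reshape into the 2 x 2 layout, then call fisher.test()
fisher_res <- t(apply(as.matrix(data), 1, function(r) {
  test <- fisher.test(matrix(r, nrow = 2, byrow = TRUE))
  c(odds_ratio = unname(test$estimate),   # conditional MLE of the odds ratio
    p.value    = test$p.value)
}))
fisher_res
```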