3.1 Contingency tables
A contingency table is called a cross table or two-way table and counts observations for each unique combination of values in two variables. It is a table of data in which the row entries tabulate the data according to one variable and the column entries tabulate the data according to another variable.
The tool best suited for a data set will depend on the variable’s scale of measurement. The most important distinction here is whether a variable is discrete or continuous. If it is discrete, a convenient method to summarise bivariate data is a contingency table. If it is continuous, a scatter diagram can be more useful in visualising the data set.
Contingency tables are used in statistics to understand the relationship between categorical variables. For example, say you want to summarise the following sample of firms in Table 7 regarding their industry sector and size.
Firm | Sector | Employees |
---|---|---|
1 | Technology | <50 |
2 | Food | 50+ |
3 | Technology | <50 |
4 | Food | <50 |
5 | Food | <50 |
6 | Food | 50+ |
7 | Technology | <50 |
8 | Technology | <50 |
9 | Technology | 50+ |
10 | Food | 50+ |
See the following cross table (Table 8), which summarises the information of the sample data. It counts the number of firms for each combination of sector and number of employees.
Sector | <50 Employees | 50+ Employees | Total employees |
---|---|---|---|
Technology | 4 | 0 | 4 |
Food | 2 | 4 | 6 |
Total | 6 | 4 | 10 |
The cross table shows that there are two firms in the food manufacturing sector that have less than 50 employees. However, the cell 50+ shows that there are four firms in the food manufacturing sector that have more than 50 employees. The sum of the total food manufacturing firm is six which is the 60% of the grand total.
Contingency tables vary in size and type because the size of the contingency table depends on the sample size and number of observations.
There is no formula to draw a contingency table in Excel. However, analysts use a PivotTable to build contingency tables. A PivotTable is considered a powerful statistical tool to summarise bivariate and multivariate data sets in an Excel spreadsheet or database table and obtain the desired report. This tool does not actually change the spreadsheet or database itself; it simply pivots or turns the data to view it from different perspectives. Researchers and analysts use PivotTables especially when they have large amounts of data that would be time consuming to calculate by hand. A PivotTable can perform a few data processing functions such as identifying sums, averages, ranges or outliers. It then arranges this information in a simple and meaningful way that draws attention to key values. If you would like to experiment with PivotTables, go to the ‘Insert’ ribbon in Excel and select ‘PivotTable’.