R resources for Chapter 2 (Categorical Data)

Useful commands

You may get categorical data in two forms: raw data, where each row corresponds to a separate case; or a frequency (contingency) table where counts have already been tabulated.

Making tables from raw data

In this section we’ll use the file bbq.csv.  First, load the file:

> bbq <- read.csv("https://raw.githubusercontent.com/brianlukoff/sta309/master/example-data/bbq.csv")

Make a one-way table by region:

> table(bbq$Region)
Midwest Northeast     South      West
     12         9        17        13

Make a two-way contingency table by region and BBQ status:

> table(bbq$Region, bbq$GoodBBQ)
                   no   yes
      Midwest       8     4
      Northeast     6     3
      South         7    10
      West          9     4
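
If the download is unavailable, the two forms of categorical data described earlier can still be explored offline. The following is a minimal sketch, assuming only the counts printed in the two-way table above: it starts from a frequency layout and expands it into one row per case. The names counts and bbq_standin are hypothetical, and the row order of the real bbq.csv is not reproduced.

```r
# Frequency form: one row per Region/GoodBBQ combination, with its count.
# The counts are copied from the two-way table printed above.
counts <- data.frame(
  Region  = rep(c("Midwest", "Northeast", "South", "West"), times = 2),
  GoodBBQ = rep(c("no", "yes"), each = 4),
  n       = c(8, 6, 7, 9, 4, 3, 10, 4)
)

# Raw form: repeat each combination n times so every row is one case.
bbq_standin <- counts[rep(seq_len(nrow(counts)), counts$n), c("Region", "GoodBBQ")]
rownames(bbq_standin) <- NULL

nrow(bbq_standin)                                  # number of cases
table(bbq_standin$Region, bbq_standin$GoodBBQ)     # tabulate back into counts
```

Tabulating the stand-in should give back the same counts as the table above, which is a quick way to check that the expansion worked.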

(Note: If you have more than two categorical variables, you can pass a third variable to the table function to make a three-way table.)
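
The bbq data shown here has only two categorical variables, so the sketch below illustrates a three-way table with mtcars, a dataset that ships with R. Treating cyl, gear, and am as categorical is an assumption made for illustration; it is not part of the chapter's data.

```r
# mtcars is built into R; cyl, gear, and am each take only a few distinct values.
three_way <- table(mtcars$cyl, mtcars$gear, mtcars$am)

dim(three_way)       # one dimension per variable
ftable(three_way)    # flattens the three-way table into a single 2-D layout
```

Printing three_way directly shows one two-way table per level of the third variable; ftable is often easier to read.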

We’ll want to use this table again, so we can give it a name:

> bbqtable <- table(bbq$Region, bbq$GoodBBQ)

You can flip what’s on the rows and columns by reversing the order of the variables:

> table(bbq$GoodBBQ, bbq$Region)

      Midwest Northeast South West
  no        8         6     7    9
  yes       4         3    10    4

Alternatively, we can transpose bbqtable for the same result:

> t(bbqtable)

If you want to add totals (both row and column) to your table, you can use the addmargins function:

> addmargins(bbqtable)

            no yes Sum
  Midwest    8   4  12
  Northeast  6   3   9
  South      7  10  17
  West       9   4  13
  Sum       30  21  51

If you instead want only the row totals or only the column totals:

> margin.table(bbqtable,1) # gives the row totals

  Midwest Northeast     South      West 
       12         9        17        13 

> margin.table(bbqtable,2) # gives the column totals

 no yes 
 30  21 
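
As a cross-check on the totals above, here is a minimal sketch that rebuilds bbqtable from the printed counts (so it runs without the download) and computes the same margins with rowSums and colSums. It also shows the margin argument of addmargins, which appends totals along one dimension only.

```r
# Rebuild the two-way table from the counts shown above (filled column by column).
bbqtable <- as.table(matrix(
  c(8, 6, 7, 9, 4, 3, 10, 4), nrow = 4,
  dimnames = list(c("Midwest", "Northeast", "South", "West"), c("no", "yes"))
))

rowSums(bbqtable)          # same totals as margin.table(bbqtable, 1)
colSums(bbqtable)          # same totals as margin.table(bbqtable, 2)
addmargins(bbqtable, 1)    # appends a Sum row (the column totals) only
addmargins(bbqtable, 2)    # appends a Sum column (the row totals) only
```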

(In the commands above, everything after the # sign is a comment and won’t be interpreted by R; you don’t have to enter the comments into R. They are there for your reference.)

The prop.table function converts the frequencies in a table to proportions (multiply by 100 to express them as percentages). It can compute cell, row, or column percentages, as shown below.

> prop.table(bbqtable) # gives the cell percentages

                    no        yes
  Midwest   0.15686275 0.07843137
  Northeast 0.11764706 0.05882353
  South     0.13725490 0.19607843
  West      0.17647059 0.07843137

> prop.table(bbqtable,1) # gives the row percentages

                   no       yes
  Midwest   0.6666667 0.3333333
  Northeast 0.6666667 0.3333333
  South     0.4117647 0.5882353
  West      0.6923077 0.3076923

> prop.table(bbqtable,2) # gives the column percentages

                   no       yes
  Midwest   0.2666667 0.1904762
  Northeast 0.2000000 0.1428571
  South     0.2333333 0.4761905
  West      0.3000000 0.1904762
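
The output above is on a 0-to-1 scale. A minimal sketch of turning it into rounded percentages follows; the table is rebuilt from the printed counts so the block runs on its own, and rounding to one decimal place is just an illustrative choice.

```r
# Rebuild the two-way table from the counts shown earlier.
bbqtable <- as.table(matrix(
  c(8, 6, 7, 9, 4, 3, 10, 4), nrow = 4,
  dimnames = list(c("Midwest", "Northeast", "South", "West"), c("no", "yes"))
))

row_pct <- round(100 * prop.table(bbqtable, 1), 1)   # each row sums to about 100
col_pct <- round(100 * prop.table(bbqtable, 2), 1)   # each column sums to about 100

row_pct
col_pct
```

Row percentages answer "within each region, what share said yes?", while column percentages answer "among those who said yes, what share came from each region?".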

You can combine prop.table and margin.table to get the marginal proportions, that is, the share of all cases that falls in each row or each column:

> margin.table(prop.table(bbqtable),1) # gives each row's share of all cases

  Midwest Northeast     South      West 
0.2352941 0.1764706 0.3333333 0.2549020 

> margin.table(prop.table(bbqtable),2) # gives each column's share of all cases

       no       yes 
0.5882353 0.4117647 
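
The same marginal shares can be reached in the opposite order: total the counts first, then convert to proportions. The sketch below, which rebuilds the table from the printed counts, checks that both routes agree. The names via_cells and via_margins are hypothetical.

```r
# Rebuild the two-way table from the counts shown earlier.
bbqtable <- as.table(matrix(
  c(8, 6, 7, 9, 4, 3, 10, 4), nrow = 4,
  dimnames = list(c("Midwest", "Northeast", "South", "West"), c("no", "yes"))
))

via_cells   <- margin.table(prop.table(bbqtable), 1)   # add up cell proportions in each row
via_margins <- prop.table(margin.table(bbqtable, 1))   # divide row totals by the grand total

all.equal(as.vector(via_cells), as.vector(via_margins))  # do the two routes agree?
round(100 * via_margins, 1)                              # shares as percentages
```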

Making charts

To chart a single variable, give your one-way table a name and then make a bar chart or pie chart from it:

> region <- table(bbq$Region)
> barplot(region)
> pie(region)

To create a segmented (stacked) bar graph:

> barplot(bbqtable, legend=T)  # legend=T tells R to provide a legend

Notice that this graph shows the counts. You can instead compare proportions in categories:

> barplot(prop.table(bbqtable,2), legend=T)
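
As a variation, here is a minimal sketch of a side-by-side (grouped) version of the proportion chart with a title and axis labels. The table is rebuilt from the printed counts so the block runs without the download, and the title and label text are placeholders chosen for illustration.

```r
# Rebuild the two-way table from the counts shown earlier.
bbqtable <- as.table(matrix(
  c(8, 6, 7, 9, 4, 3, 10, 4), nrow = 4,
  dimnames = list(c("Midwest", "Northeast", "South", "West"), c("no", "yes"))
))

col_props <- prop.table(bbqtable, 2)   # proportions within each column (no, yes)

# beside = TRUE draws the regions next to each other instead of stacking them;
# legend.text is the full name of the barplot argument that adds a legend.
barplot(col_props, beside = TRUE, legend.text = TRUE,
        main = "Region by GoodBBQ", xlab = "GoodBBQ", ylab = "Proportion")
```

Stacked bars make it easy to see that each column sums to 1; grouped bars make it easier to compare individual regions across the two columns.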

To create a mosaic plot, where both the widths and heights of the bars are proportional to the counts in each category:

> mosaicplot(bbqtable)

You can also transpose the table to switch the rows and columns, and add color:

> mosaicplot(t(bbqtable),col=rainbow(4))  

When you have a frequency table to import

In this section, we’ll use the file example.csv. If you are downloading the data from MyStatLab, download the Excel file, save it in CSV format, and then read it into R as a matrix.

Use the following command (but replace the part in quotes with the path to the file on your computer):

> example <- as.matrix(read.csv("~/Desktop/example.csv", row.names=1))

This reads in the example.csv file as a matrix (table) instead of as a data frame.

> example
          None AA  BA MA PhD
< 1 year    11  4  53 18  11
1-5 years   43  9 122 28  17
> 5 years  103 34  61  3   1

The functions described above that work on tables, including barplot, addmargins, margin.table, and prop.table, can all be used now.
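
To try this without a file on disk, the sketch below reads the same counts from a text string using the text argument of read.csv. The counts are copied from the printed matrix above; the CSV layout (an empty first header cell, with row labels in the first column) is an assumption about how example.csv is organized.

```r
# Counts copied from the printed example matrix; the layout is assumed.
csv_text <- ",None,AA,BA,MA,PhD
< 1 year,11,4,53,18,11
1-5 years,43,9,122,28,17
> 5 years,103,34,61,3,1"

# row.names = 1 uses the first column as row labels instead of as data.
example <- as.matrix(read.csv(text = csv_text, row.names = 1))

addmargins(example)                        # row and column totals
round(100 * prop.table(example, 1), 1)     # row percentages
barplot(example, legend.text = TRUE)       # one stacked bar per column
```

Converting with as.matrix matters here because read.csv returns a data frame, and functions such as prop.table and barplot expect a table or matrix of counts.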