Monday, December 1, 2008

Flogging placement data

Here's an illustration of transforming proportion data.

Every summer, a math placement test is given to over 3000 entering freshmen.  The score on this test is used to determine which math course they are allowed to take in the fall.

Each placement score is a course number, indicating the course the student may take in the fall.  Here are the counts of the placement scores for the entering freshmen over the last six years:

     2003 2004 2005 2006 2007 2008
131H   33   54   51   57   44   41
131   196  364  342  361  320  251
130   192  245  236  248  211  208
128   557  707  647  700  603  618
126   428  518  489  480  442  428
122   501  612  580  565  464  498
215   661  912  888  838  792  747
112   207  216  230  208  212  180
095   418  524  545  419  407  335
090    46   78   89   58   62   46

How can we compare the scores for different years?  A first step is to convert each column of counts to percentages.

     2003 2004 2005 2006 2007 2008
131H  1.0  1.3  1.2  1.4  1.2  1.2
131   6.1  8.6  8.3  9.2  9.0  7.5
130   5.9  5.8  5.8  6.3  5.9  6.2
128  17.2 16.7 15.8 17.8 17.0 18.4
126  13.2 12.2 11.9 12.2 12.4 12.8
122  15.5 14.5 14.2 14.4 13.0 14.9
215  20.4 21.6 21.7 21.3 22.3 22.3
112   6.4  5.1  5.6  5.3  6.0  5.4
095  12.9 12.4 13.3 10.7 11.4 10.0
090   1.4  1.8  2.2  1.5  1.7  1.4

We see in 2003 that 6.1% of the students placed in MATH 131 and 20.4% placed in MATH 215.
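
The percentage table can be reproduced directly from the counts.  Here is a minimal Python sketch; the choice of Python and the names `counts` and `column_percentages` are just for illustration and are not part of the original analysis.

```python
# Placement counts copied from the table above.
# Rows are placement scores (131H down to 090); columns are 2003 to 2008.
years = [2003, 2004, 2005, 2006, 2007, 2008]
scores = ["131H", "131", "130", "128", "126", "122", "215", "112", "095", "090"]
counts = [
    [33, 54, 51, 57, 44, 41],
    [196, 364, 342, 361, 320, 251],
    [192, 245, 236, 248, 211, 208],
    [557, 707, 647, 700, 603, 618],
    [428, 518, 489, 480, 442, 428],
    [501, 612, 580, 565, 464, 498],
    [661, 912, 888, 838, 792, 747],
    [207, 216, 230, 208, 212, 180],
    [418, 524, 545, 419, 407, 335],
    [46, 78, 89, 58, 62, 46],
]


def column_percentages(table):
    """Convert each column of counts to percentages of that column's total."""
    totals = [sum(col) for col in zip(*table)]
    return [[100.0 * x / t for x, t in zip(row, totals)] for row in table]


percents = column_percentages(counts)

# Print the table rounded to one decimal place.
print("     " + " ".join(f"{y:4d}" for y in years))
for label, row in zip(scores, percents):
    print(f"{label:<5}" + " ".join(f"{p:4.1f}" for p in row))
```

Each column of `percents` sums to 100 (up to floating-point error), which is what makes the years comparable even though the class sizes differ.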

Let's follow Tukey's strategy for comparing percentage vectors.  To make this simple to explain, let's focus on comparing the percentages of 2006 and 2007.

     2006 2007
131H  1.4  1.2
131   9.2  9.0
130   6.3  5.9
128  17.8 17.0
126  12.2 12.4
122  14.4 13.0
215  21.3 22.3
112   5.3  6.0
095  10.7 11.4
090   1.5  1.7

1.  First we cut the table between two rows.  Let's try cutting after the second row.

     2006 2007
131H  1.4  1.2
131   9.2  9.0
---------------
130   6.3  5.9
128  17.8 17.0
126  12.2 12.4
122  14.4 13.0
215  21.3 22.3
112   5.3  6.0
095  10.7 11.4
090   1.5  1.7

2.  We compute a folded log (flog) for each year, using natural logs.  For 2006, we see that 1.4 + 9.2 = 10.6% are above the line and 100 - 10.6 = 89.4% are below the line, so the flog is

FLOG for 2006 = log(10.6/89.4) = -2.13

Likewise the flog for 2007 is given by

FLOG for 2007 = log(10.2/89.8) = -2.18

3.  To compare the years 2006 and 2007, we look at the difference in flogs:

Change in FLOG from 2006 to 2007 is -2.18 - (-2.13) = -0.05

The interpretation is that, on the flog scale, the 2007 students did 0.05 worse than the 2006 students.
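
Steps 1 to 3 can be sketched in Python as follows.  This is only an illustration under the definition used here, flog = log(above/below) with the natural log; the helper name `flog_at_cut` is made up for the example.

```python
import math

# Rounded percentages for 2006 and 2007, copied from the table above
# (rows 131H down to 090).
pct_2006 = [1.4, 9.2, 6.3, 17.8, 12.2, 14.4, 21.3, 5.3, 10.7, 1.5]
pct_2007 = [1.2, 9.0, 5.9, 17.0, 12.4, 13.0, 22.3, 6.0, 11.4, 1.7]


def flog_at_cut(percents, cut):
    """Folded log when the table is cut after the first `cut` rows.

    As in the text, the percentage below the line is taken as
    100 minus the percentage above the line.
    """
    above = sum(percents[:cut])
    below = 100.0 - above
    return math.log(above / below)


flog_2006 = flog_at_cut(pct_2006, 2)  # log(10.6 / 89.4), about -2.13
flog_2007 = flog_at_cut(pct_2007, 2)  # log(10.2 / 89.8), about -2.18
change = flog_2007 - flog_2006

print(round(flog_2006, 2), round(flog_2007, 2), round(change, 2))
```

Note that differencing the unrounded flogs gives about -0.04 rather than -0.05; the -0.05 above comes from rounding each flog to two decimals before subtracting.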

What if we cut the table by a different row?  We can repeat the procedure using all possible cuts.

Here is the table of flogs:

       2006  2007
 [1,] -4.22 -4.38
 [2,] -2.13 -2.17
 [3,] -1.59 -1.65
 [4,] -0.63 -0.70
 [5,] -0.12 -0.18
 [6,]  0.46  0.35
 [7,]  1.56  1.44
 [8,]  1.98  1.88
 [9,]  4.20  4.03

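To understand these values, -4.22 is the 2006 flog if we cut after the first row, -2.13 is the 2006 flog if we cut after the second row, etc.  These flogs are computed from the unrounded counts, so the 2007 value for the second cut (-2.17) differs slightly from the -2.18 we found above using the rounded percentages.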

As before, we compare the years 2006 and 2007 by looking at the difference in flogs at each cut:

     2007 FLOG - 2006 FLOG
 [1,] -0.16
 [2,] -0.04
 [3,] -0.06
 [4,] -0.07
 [5,] -0.06
 [6,] -0.11
 [7,] -0.12
 [8,] -0.10
 [9,] -0.17

What have we learned?  Note that all of the flog differences are negative and the median flog difference is -0.10.  So no matter where we cut the table, the 2007 students did a little worse than the 2006 students.
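
For anyone who wants to reproduce the two tables above, here is a minimal Python sketch that works from the raw 2006 and 2007 counts and tries every possible cut.  The function name `flogs_all_cuts` is hypothetical, and this is one way to do the calculation rather than the code that produced the tables.

```python
import math
from itertools import accumulate
from statistics import median

# Raw counts for 2006 and 2007, copied from the first table
# (rows 131H down to 090).
counts_2006 = [57, 361, 248, 700, 480, 565, 838, 208, 419, 58]
counts_2007 = [44, 320, 211, 603, 442, 464, 792, 212, 407, 62]


def flogs_all_cuts(counts):
    """Flog for every cut: log(count above the line / count below the line)."""
    total = sum(counts)
    # Running totals above each cut; drop the last one, where nothing is below.
    above = list(accumulate(counts))[:-1]
    return [math.log(a / (total - a)) for a in above]


flogs_2006 = flogs_all_cuts(counts_2006)
flogs_2007 = flogs_all_cuts(counts_2007)
diffs = [f07 - f06 for f06, f07 in zip(flogs_2006, flogs_2007)]

for i, (f06, f07, d) in enumerate(zip(flogs_2006, flogs_2007, diffs), start=1):
    print(f"cut {i}: {f06:6.2f} {f07:6.2f} {d:6.2f}")

print("median difference:", round(median(diffs), 2))
```

Using counts instead of percentages gives the same flogs, because the column total cancels in the ratio above/below.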


