Sunday, November 9, 2008

Median Polishing Student Grades


Our department recently wanted to explore the relationship between student class attendance and performance in a number of 100-level math classes.  We all know that missing math classes hurts grades, but we wanted to learn more about this relationship.

For many students in four 100-level math classes, we collected each student's attendance (percentage of classes attended) and performance (a percentage, where 90% and above corresponds to A work, 80-90% to a B, etc.).  We grouped attendance into four categories (0-50%, 50-70%, 70-90%, and over 90%) and then found the mean performance of the students in each combination of attendance category and course.  We got the following two-way table.

          Attendance Pct
COURSE   0-50 50-70 70-90 90-100  
---------------------------------       
MATH 112  61.9  68.0  68.8  75.5
MATH 115  54.3  71.8  76.6  83.3
MATH 122  56.1  64.8  73.1  78.1
MATH 126  50.6  66.3  73.9  78.1

I'll demonstrate median polish with this data.

1.  First, I have to get this data into R in matrix form.  I'll use the matrix command to form the matrix (by default, matrix fills in the data column by column) and then I'll add row and column labels.

grade=matrix(c(61.9,54.3,56.1,50.6,
               68.0,71.8,64.8,66.3,
               68.8,76.6,73.1,73.9,
               75.5,83.3,78.1,78.1),c(4,4))

dimnames(grade)=list(c("MATH 112","MATH 115","MATH 122","MATH 126"),
                     c("0-50","50-70","70-90","90-100"))

To check that I've read in the data correctly, I'll display "grade":

> grade
         0-50 50-70 70-90 90-100
MATH 112 61.9  68.0  68.8   75.5
MATH 115 54.3  71.8  76.6   83.3
MATH 122 56.1  64.8  73.1   78.1
MATH 126 50.6  66.3  73.9   78.1
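
If typing the numbers column by column feels error-prone, one alternative is to enter them in the same order as the printed table and ask matrix to fill by row.  This is only a sketch; the name grade.byrow is a hypothetical one and is not used elsewhere in this post.

```r
# Enter the values row by row, exactly as they appear in the two-way table.
# byrow = TRUE tells matrix() to fill across each row instead of down each column.
grade.byrow <- matrix(c(61.9, 68.0, 68.8, 75.5,
                        54.3, 71.8, 76.6, 83.3,
                        56.1, 64.8, 73.1, 78.1,
                        50.6, 66.3, 73.9, 78.1),
                      nrow = 4, byrow = TRUE,
                      dimnames = list(c("MATH 112", "MATH 115", "MATH 122", "MATH 126"),
                                      c("0-50", "50-70", "70-90", "90-100")))
grade.byrow
```

The result should print the same table as "grade" above, with the labels attached in one step through the dimnames argument.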

2.  Now I can implement median polish to get an additive fit.  I'll store the output in the variable "my.fit" and then I'll display the different components.

> my.fit=medpolish(grade)

Here's the common value.

> my.fit$overall
[1] 69.7

Here are the row effects.

> my.fit$row
MATH 112 MATH 115 MATH 122 MATH 126 
 -0.7375   4.5000   0.2875  -0.2875 

Here are the column effects.

> my.fit$col
     0-50     50-70     70-90    90-100 
-16.35000  -2.75625   2.75625   8.40000 
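
To see roughly where these numbers come from, here is a minimal hand-rolled sketch of median polish: alternately sweep out row medians and column medians, moving whatever is swept out into the row effects, the column effects, and the common value.  The function polish.sketch and its fixed number of passes are assumptions made for illustration; medpolish has its own stopping rule, so the two need not agree to the last digit.

```r
grade <- matrix(c(61.9, 54.3, 56.1, 50.6,
                  68.0, 71.8, 64.8, 66.3,
                  68.8, 76.6, 73.1, 73.9,
                  75.5, 83.3, 78.1, 78.1), c(4, 4),
                dimnames = list(c("MATH 112", "MATH 115", "MATH 122", "MATH 126"),
                                c("0-50", "50-70", "70-90", "90-100")))

# Hypothetical helper: a fixed number of row/column median sweeps.
polish.sketch <- function(x, passes = 10) {
  res <- x
  overall <- 0
  row.eff <- setNames(numeric(nrow(x)), rownames(x))
  col.eff <- setNames(numeric(ncol(x)), colnames(x))
  for (i in seq_len(passes)) {
    # Sweep the row medians out of the residuals and into the row effects.
    r.med <- apply(res, 1, median)
    res <- sweep(res, 1, r.med)
    row.eff <- row.eff + r.med
    # Re-center the column effects; their median moves to the common value.
    shift <- median(col.eff)
    col.eff <- col.eff - shift
    overall <- overall + shift
    # Sweep the column medians out of the residuals and into the column effects.
    c.med <- apply(res, 2, median)
    res <- sweep(res, 2, c.med)
    col.eff <- col.eff + c.med
    # Re-center the row effects; their median moves to the common value.
    shift <- median(row.eff)
    row.eff <- row.eff - shift
    overall <- overall + shift
  }
  list(overall = overall, row = row.eff, col = col.eff, residuals = res)
}

sketch <- polish.sketch(grade)
sketch$overall
```

At every step the data equal common value + row effect + column effect + residual, so the sweeps only move numbers between the four pieces.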

I'll interpret this additive fit.

  • The typical performance of these students is 69.7%.  (The common value comes from medians, so it is a typical value rather than a mean.)

  • Looking at the row effects, we see that MATH 115 students get grades that are 4.5 - 0.2875, or about 4.2 points, higher than MATH 122 students.  MATH 122 students score 0.2875 - (-0.2875), or about 0.58 points, higher than MATH 126 students, and MATH 112 students are about half a percentage point lower than MATH 126 students.

  • Looking at the column effects, we see a clear relationship between attendance and performance.  The best (90-100%) attenders do 8.4 - 2.75 = 5.65 points better (on average) than the 70-90 attendance group, do 8.4 - (-2.75) = 11.15 points better than the 50-70 attendance group, and a whopping 8.4 - (-16.35) = 24.75 points better than the "no shows" (the under 50% attending group).
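
The comparisons in the bullets above are just differences of effects, so they can be computed in one go.  In this sketch the effects are typed in from the medpolish output shown earlier; the names row.eff and col.eff are hypothetical.

```r
# Effects copied from the medpolish output shown above.
row.eff <- c("MATH 112" = -0.7375, "MATH 115" = 4.5,
             "MATH 122" = 0.2875, "MATH 126" = -0.2875)
col.eff <- c("0-50" = -16.35, "50-70" = -2.75625,
             "70-90" = 2.75625, "90-100" = 8.4)

# How far ahead the best attenders are of every attendance group.
col.eff[["90-100"]] - col.eff

# Every pairwise course comparison: entry [i, j] is course i minus course j.
round(outer(row.eff, row.eff, "-"), 2)
```

The first expression reproduces the gaps quoted above, up to the rounding of 2.75625 to 2.75.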

3.  It might be helpful to plot this fit.  I wrote a function plot2way in the LearnEDA package:

> library(LearnEDA)
> plot2way(my.fit$overall+my.fit$row,my.fit$col,dimnames(grade)[[1]],
  dimnames(grade)[[2]])

Note that the plot2way function has four arguments:  the row part, the column part, the vector of names of the rows, and the vector of names of the columns.  Here's the figure.

[Figure: plot of the additive fit drawn by plot2way]

Actually, this plot would look better if I had figured out how to rotate the figure so that the FIT lines are horizontal.  (I'll give extra credit points to anyone who can fix my function to do that.)

From this graph we see that the best fitted performance belongs to the MATH 115 students who attend over 90% of the classes; the worst fitted performance belongs to the MATH 112 students who have under 50% attendance.
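
The same reading can be checked numerically.  This sketch rebuilds the matrix of fitted values from the three components and looks up its largest and smallest cells; the names fit, best and worst are hypothetical, and trace.iter = FALSE only suppresses the iteration log that medpolish prints.

```r
grade <- matrix(c(61.9, 54.3, 56.1, 50.6,
                  68.0, 71.8, 64.8, 66.3,
                  68.8, 76.6, 73.1, 73.9,
                  75.5, 83.3, 78.1, 78.1), c(4, 4),
                dimnames = list(c("MATH 112", "MATH 115", "MATH 122", "MATH 126"),
                                c("0-50", "50-70", "70-90", "90-100")))
my.fit <- medpolish(grade, trace.iter = FALSE)

# Fitted value for each cell: common value + row effect + column effect.
fit <- my.fit$overall + outer(my.fit$row, my.fit$col, "+")
dimnames(fit) <- dimnames(grade)
round(fit, 1)

# Row and column positions of the largest and smallest fitted values.
best  <- which(fit == max(fit), arr.ind = TRUE)
worst <- which(fit == min(fit), arr.ind = TRUE)
c(rownames(fit)[best[1, "row"]],  colnames(fit)[best[1, "col"]])
c(rownames(fit)[worst[1, "row"]], colnames(fit)[worst[1, "col"]])
```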

4.  Are we done?  Not quite.  We have looked at the fit, but we have not looked at the residuals -- the differences between the observed performance and the fit.  They are stored in my.fit$residuals -- I will round the values so they are easier to view.

> round(my.fit$residuals)
         0-50 50-70 70-90 90-100
MATH 112    9     2    -3     -2
MATH 115   -4     0     0      1
MATH 122    2    -2     0      0
MATH 126   -2     0     2      0

I see one large residual: the 9 in the MATH 112 row and the 0-50 column.  MATH 112 students who don't come to class (under 50% attendance) seem to do 9 points better than one would expect based on the additive fit.  This might deserve further study.
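
As a check on what a residual is, the sketch below subtracts the additive fit from the data by hand, compares the result with what medpolish stored, and then finds the cell with the largest residual in absolute value.  The names by.hand and big are hypothetical.

```r
grade <- matrix(c(61.9, 54.3, 56.1, 50.6,
                  68.0, 71.8, 64.8, 66.3,
                  68.8, 76.6, 73.1, 73.9,
                  75.5, 83.3, 78.1, 78.1), c(4, 4),
                dimnames = list(c("MATH 112", "MATH 115", "MATH 122", "MATH 126"),
                                c("0-50", "50-70", "70-90", "90-100")))
my.fit <- medpolish(grade, trace.iter = FALSE)

# Residual = observed mean performance minus the additive fit.
by.hand <- grade - (my.fit$overall + outer(my.fit$row, my.fit$col, "+"))
all.equal(by.hand, my.fit$residuals, check.attributes = FALSE)

# Which cell has the largest residual in absolute value?
big <- which(abs(by.hand) == max(abs(by.hand)), arr.ind = TRUE)
c(rownames(grade)[big[1, "row"]], colnames(grade)[big[1, "col"]])
```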

