Friday, November 21, 2008

Binning homework

Some of you appear to be confused about this "binning" homework.  Let's outline what you are supposed to do in this assignment.

1.  First you bin your data and construct a histogram.  This is easy (I hope).  The histogram gives you the count in each bin.

By the way, a rootogram is just like a histogram, except that you graph the square roots of the counts against the bins.  (Why do you do this?  Look at the notes.)
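If you want to see step 1 in base R, here is a minimal sketch.  The data are simulated (they are not the homework data), and the width-1 bins are only an illustration; use the bins your assignment calls for.  Calling hist() with plot = FALSE just tallies the counts without drawing anything.

```r
# Simulated data, for illustration only (not the homework data)
set.seed(1)
x <- rnorm(200, mean = 65, sd = 3)

# Illustrative bin boundaries: width-1 bins covering the whole range of x
bins <- seq(floor(min(x)) - 1, ceiling(max(x)) + 1, by = 1)

# hist(..., plot = FALSE) only tallies the observations in each bin
h <- hist(x, breaks = bins, plot = FALSE)
d <- h$counts

# Histogram: counts against bins
plot(h, main = "Histogram")

# Rootogram: square roots of the counts against bins
barplot(sqrt(d), names.arg = h$mids, space = 0,
        xlab = "bin midpoint", ylab = "sqrt(count)", main = "Rootogram")
```

The only difference between the two pictures is the sqrt() applied to the heights of the bars.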

2.  You find a Gaussian curve that fits your data; this curve is described by its mean and standard deviation.
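Here is one way to get the two numbers that describe the curve and eyeball the fit.  This sketch assumes you use the sample mean and standard deviation (check the notes for how the homework wants them chosen), and it again uses simulated data.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(200, mean = 65, sd = 3)

# Parameters of the comparison Gaussian curve (assumed: sample mean and sd)
m <- mean(x)
s.dev <- sd(x)

# Overlay the fitted Gaussian density on a density-scale histogram
hist(x, freq = FALSE, main = "Data with fitted Gaussian curve")
grid <- seq(min(x), max(x), length.out = 200)
lines(grid, dnorm(grid, mean = m, sd = s.dev))
```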

3.  Now you want to find the expected (fitted) counts from the Gaussian curve.  You do this by running the function fit.gaussian; its inputs are your bins, the vector of raw data, and the mean and standard deviation of the Gaussian curve.  Suppose you save the output of this function in the variable s.  Then

s$counts are the observed bin counts (call these d)
s$expected are the expected counts (call these e)
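fit.gaussian does this work for you.  If you are curious what such a computation looks like, here is a base-R sketch.  It assumes the expected count in a bin is n times the Gaussian probability of landing in that bin; that is my reading of what "expected counts" means here, not the LearnEDA source code.

```r
# Simulated data and illustrative bins (not the homework data)
set.seed(1)
x <- rnorm(200, mean = 65, sd = 3)
bins <- seq(floor(min(x)) - 1, ceiling(max(x)) + 1, by = 1)
m <- mean(x)
s.dev <- sd(x)

# Observed bin counts (the d's)
d <- hist(x, breaks = bins, plot = FALSE)$counts

# Expected counts (the e's): n * [pnorm(upper) - pnorm(lower)] for each bin
e <- length(x) * diff(pnorm(bins, mean = m, sd = s.dev))

data.frame(lower = head(bins, -1), upper = bins[-1], d = d, e = round(e, 2))
```

The e's add up to slightly less than n, because a little of the Gaussian probability lies outside the outermost bins.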

4.  The question at this point is -- does the Gaussian curve provide a good fit to the histogram?  Well, maybe yes, and maybe no.

How can you tell?

You look at the residuals, which measure the deviations of the observed counts from the expected counts.

I defined several residuals in the notes.

Simple rootogram residuals are 

r.simple = sqrt(d) - sqrt(e)

These are graphed using the rootogram function.
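Computing the simple rootogram residuals yourself is one line once you have d and e.  In this sketch d and e are rebuilt from simulated data (same assumptions as the expected-count computation: sample mean and sd, expected count = n times bin probability) so that it runs on its own.

```r
# Simulated data and illustrative bins (not the homework data)
set.seed(1)
x <- rnorm(200, mean = 65, sd = 3)
bins <- seq(floor(min(x)) - 1, ceiling(max(x)) + 1, by = 1)
d <- hist(x, breaks = bins, plot = FALSE)$counts
e <- length(x) * diff(pnorm(bins, mean = mean(x), sd = sd(x)))

# Simple rootogram residuals: sqrt(observed) - sqrt(expected)
r.simple <- sqrt(d) - sqrt(e)
round(r.simple, 2)
```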

The double root residuals are defined by

DRR = sqrt(2+4d) - sqrt(1+4e).

How do I compute these?  Well, you have already computed the d's and the e's, so you can use R to compute the DRRs.
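Because R arithmetic works on whole vectors, the DRR formula translates directly.  In your homework d and e would be s$counts and s$expected; here they are rebuilt from simulated data (assuming sample mean and sd) so the sketch runs by itself.

```r
# Simulated data and illustrative bins (not the homework data)
set.seed(1)
x <- rnorm(200, mean = 65, sd = 3)
bins <- seq(floor(min(x)) - 1, ceiling(max(x)) + 1, by = 1)
d <- hist(x, breaks = bins, plot = FALSE)$counts
e <- length(x) * diff(pnorm(bins, mean = mean(x), sd = sd(x)))

# Double root residuals: sqrt(2 + 4d) - sqrt(1 + 4e), one per bin
DRR <- sqrt(2 + 4 * d) - sqrt(1 + 4 * e)
round(DRR, 2)
```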

5.  Once I have computed the residuals and graphed them, am I done?  Well, the point of graphing the residuals is to see whether they show any systematic pattern.  If there is a pattern, then the Gaussian curve is not a good fit.
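A simple residual plot is enough to look for a pattern.  This sketch plots the DRRs against the bin midpoints with a reference line at zero; the data are simulated and the Gaussian is fit with the sample mean and sd (assumptions, as before).  What you are looking for is something like a run of positive residuals in the middle with negative ones in the tails, rather than random scatter around zero.

```r
# Simulated data and illustrative bins (not the homework data)
set.seed(1)
x <- rnorm(200, mean = 65, sd = 3)
bins <- seq(floor(min(x)) - 1, ceiling(max(x)) + 1, by = 1)
d <- hist(x, breaks = bins, plot = FALSE)$counts
e <- length(x) * diff(pnorm(bins, mean = mean(x), sd = sd(x)))
DRR <- sqrt(2 + 4 * d) - sqrt(1 + 4 * e)

# Bin midpoints for the horizontal axis
mids <- (head(bins, -1) + bins[-1]) / 2

# One vertical spike per bin; the dashed line marks a residual of zero
plot(mids, DRR, type = "h", xlab = "bin midpoint", ylab = "DRR",
     main = "Double root residuals")
abline(h = 0, lty = 2)
```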

6.  We are almost done.  Part 1 (d) asks you to fit a G (Gaussian) comparison curve to the root data.  (Boy, your instructor seems to like taking roots.)

All I'm asking here is to FIRST transform the data by a root, and then repeat all of the above with the root data.
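Since you are repeating the same steps, it can help to wrap them in a function.  drr.fit below is a hypothetical helper I made up for this sketch (it is not part of LearnEDA); it assumes a square root reexpression, a Gaussian fit by the sample mean and sd, and R's default hist() bins.  Change any of those to match the notes.  The right-skewed data are simulated.

```r
# Hypothetical helper (not part of LearnEDA): bins y, fits a Gaussian using
# the sample mean and sd, and returns the counts, expected counts, and DRRs
drr.fit <- function(y, bins) {
  d <- hist(y, breaks = bins, plot = FALSE)$counts
  e <- length(y) * diff(pnorm(bins, mean = mean(y), sd = sd(y)))
  data.frame(mid = (head(bins, -1) + bins[-1]) / 2, d = d, e = e,
             DRR = sqrt(2 + 4 * d) - sqrt(1 + 4 * e))
}

set.seed(2)
x <- rgamma(200, shape = 2, rate = 0.5)   # simulated right-skewed data
root.x <- sqrt(x)                         # FIRST take roots (square root assumed)

# Same analysis on the raw scale and on the root scale (default hist() bins)
raw.fit  <- drr.fit(x, hist(x, plot = FALSE)$breaks)
root.fit <- drr.fit(root.x, hist(root.x, plot = FALSE)$breaks)

# Compare the overall size of the residuals on the two scales
sum(raw.fit$DRR^2)
sum(root.fit$DRR^2)
```

You would still graph root.fit$DRR against root.fit$mid and look for a pattern, exactly as in step 5.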

7.  Lastly, in #2, I'm asking you to fit a G curve to the women's heights.  To load the studentdata dataset and extract the female heights, you type

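library(LearnEDA)     # package that contains the studentdata dataset
data(studentdata)     # load the dataset
f.heights <- studentdata$Height[studentdata$Gender == "female"]   # heights of the female students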

Good luck!


