Thursday, February 26, 2009

Plotting and Straightening in R

To encourage you to use R for next week's assignment, I put together a new movie showing R in action.

As you probably know, one of the fastest-growing companies in the U.S. is Google.  I found an interesting graph showing Google's growth (as measured by the number of employees):

Obviously there has been a substantial increase in Google employees over this two-year period.  But we wish to say more.  We'd like to describe the size of this growth.  Also, we'd like to look at the residuals, which may reveal interesting patterns beyond the obvious growth.

Here's the plan.

1.  We start with plotting the data, fitting a resistant line, and looking at the residuals.

2.  By looking at the half-slope ratio and the residuals, we decide whether the graph is straight.

3.  If we see curvature in the graph, then we try to reexpress one or more of the variables (by a power transformation) to straighten the graph.  We use half-slope ratios and residual plots to see if we are successful in straightening the graph.

4.  Once the graph is straight, then we summarize the fit (interpret the slope of the resistant line) and examine the residuals.
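
To make steps 2 and 3 concrete, here is a minimal sketch in base R of how a half-slope ratio can be computed by hand and how it changes after a reexpression.  Everything in it is my own illustration: the data are made up (they are not Google's employee counts), the helper names summary_points and half_slope_ratio are hypothetical, and the rule for splitting the data into thirds is a simplified one, so the numbers need not match what the LearnEDA package reports.

```r
# Made-up illustrative data (NOT Google's actual employee counts):
# roughly 25% growth per period over nine equally spaced periods.
quarter   <- 1:9
employees <- c(1000, 1250, 1560, 1950, 2440, 3050, 3810, 4770, 5960)

# Hypothetical helper: medians of x and y in the left, middle, and
# right thirds of the data (a simplified splitting rule).
summary_points <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  n <- length(x)
  k <- floor(n / 3)              # size of each outer group
  left   <- 1:k
  middle <- (k + 1):(n - k)
  right  <- (n - k + 1):n
  list(x = c(median(x[left]), median(x[middle]), median(x[right])),
       y = c(median(y[left]), median(y[middle]), median(y[right])))
}

# Hypothetical helper: slope of the right half divided by slope of the
# left half.  A ratio near 1 suggests the graph is roughly straight.
half_slope_ratio <- function(x, y) {
  s <- summary_points(x, y)
  left_slope  <- (s$y[2] - s$y[1]) / (s$x[2] - s$x[1])
  right_slope <- (s$y[3] - s$y[2]) / (s$x[3] - s$x[2])
  right_slope / left_slope
}

# Raw scale: the right half is much steeper than the left half.
raw_ratio <- half_slope_ratio(quarter, employees)

# Log scale (a power transformation with p = 0): the two halves
# have nearly the same slope.
log_ratio <- half_slope_ratio(quarter, log10(employees))

print(c(raw = raw_ratio, log = log_ratio))
```

For these made-up data the raw ratio is well above 1 (curvature), while the ratio for the logs is close to 1, which is the kind of evidence we use to decide that a reexpression has straightened the graph.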

The key function in the LearnEDA package is rline.
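
If you don't have LearnEDA installed yet, you can get a feel for the idea with base R's line function (in the stats package), which also fits a resistant line in the style of Tukey.  This is only a stand-in: I am not reproducing rline's arguments or output here, and the two functions need not give identical answers.  The data below are made up for illustration and are not the Google data.

```r
# Made-up illustrative data (NOT the Google employee counts).
x <- 1:9
y <- c(1000, 1250, 1560, 1950, 2440, 3050, 3810, 4770, 5960)

# Base R's resistant line, used here as a stand-in for LearnEDA's rline.
fit <- line(x, y)

# Intercept and slope of the fitted line.
fit_coef <- coef(fit)
print(fit_coef)

# Residuals: what is left over after removing the fitted line.
res <- residuals(fit)
print(round(res, 1))
```

To look at the residuals graphically, you could follow this with plot(x, res) and abline(h = 0) and check for any systematic pattern.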

I illustrate the use of rline in my new movie.

The dataset can be found at http://bayes.bgsu.edu/eda/data/google.txt
and my script of R commands for this analysis can be found at

By the way, I'm using a new version of the LearnEDA package that you can download at