Exploratory Data Analysis: Last Homework and Dreamtowns

We've been talking about taking power reexpressions of data to achieve particular objectives, such as equalizing spreads between batches or making a batch symmetric.

When do these reexpressions work?

1. First, you need data with sufficient spread, as measured by the ratio HI/LO, for the power transformations to have much effect.

2. You have to take a "significant" power. In the last Fathom homework, most of you tried p = .82 (starting from p = 1), which would typically have little effect on the data. We usually try reexpressions in steps of 0.5, so we move from raw data (p = 1) to roots (p = 0.5) to logs (p = 0), and so on.
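Both points can be made concrete with a little calculus. This is a sketch under one assumption: it uses the power family (y**p - 1)/p (log y when p = 0), whose slope at y is y**(p - 1). Comparing the slope at LO with the slope at HI shows how much more the reexpression stretches the low end of the batch than the high end; the function name `stretch_ratio` is just an illustrative label.

```python
def stretch_ratio(hi_over_lo: float, p: float) -> float:
    """Relative stretching of the low end versus the high end of a batch.

    For the power family (y**p - 1)/p (log y when p == 0), the slope at y is
    y**(p - 1). The slope at LO divided by the slope at HI is (HI/LO)**(1 - p).
    A value near 1 means the reexpression is nearly linear over the batch,
    so it hardly changes the shape of the data.
    """
    return hi_over_lo ** (1 - p)


# A small step (p = .82) versus the usual half-steps down the ladder.
for hi_over_lo in (2, 20):
    for p in (0.82, 0.5, 0.0):
        ratio = stretch_ratio(hi_over_lo, p)
        print(f"HI/LO = {hi_over_lo:>2}, p = {p:<4}: ratio = {ratio:6.2f}")
```

With HI/LO = 2, moving to p = .82 gives a ratio of only about 1.13, and even logs give just 2. With HI/LO = 20, square roots give about 4.47 and logs give 20, which is why both a wide spread and a decent-sized step in p matter.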

This week, we are reexpressing data to achieve approximate symmetry. Here's an example.

Last summer, bizjournals.com posted an interesting article that ranked 140 "dreamtowns," towns that offer refuge from big cities and congested traffic. I got interested in the article because my town, Findlay, made the list.

Anyway, they collected a number of variables for each town, including the percentage of adults (age 25 and over) who hold college degrees.

Here's a histogram of these percentages for the 140 towns.

The distribution looks right-skewed with one outlier (I never knew that Bozeman, Montana had so many highly educated people), so it is a good candidate for reexpression.
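One way to confirm right skewness beyond eyeballing a histogram is a symmetry plot: pair the distance of the i-th smallest value below the median with the distance of the i-th largest value above it. The sketch below only computes those points (any plotting tool can draw them). Since I don't have the dreamtowns data here, it uses a hypothetical, made-up right-skewed batch built so that it is exactly symmetric on the log scale; the function name is also illustrative.

```python
import math
from statistics import NormalDist, median


def symmetry_plot_points(data):
    """Return (lower distance, upper distance) pairs for a symmetry plot.

    The i-th pair is (M - y_(i), y_(n+1-i) - M), where M is the median and
    y_(i) is the i-th smallest value. For a symmetric batch the points lie on
    the line upper = lower; points above the line indicate right skewness.
    """
    y = sorted(data)
    n = len(y)
    m = median(y)
    return [(m - y[i], y[n - 1 - i] - m) for i in range(n // 2)]


# Hypothetical right-skewed batch (NOT the dreamtowns data): exp() of evenly
# spaced normal quantiles, so the batch is exactly symmetric on the log scale.
n = 140
raw = [math.exp(NormalDist().inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)]

for label, batch in (("raw", raw), ("log", [math.log(v) for v in raw])):
    pts = symmetry_plot_points(batch)
    # The small tolerance ignores floating-point noise for points on the line.
    above = sum(upper > lower + 1e-9 for lower, upper in pts)
    print(f"{label}: {above} of {len(pts)} points above the line upper = lower")
```

For the raw batch nearly every point falls above the line, the signature of right skewness; after taking logs the points sit on the line.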

In the notes, I give several methods for choosing the "right" reexpression: plotting the mids, using a symmetry plot, and using Hinkley's method.
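The idea behind plotting the mids can be sketched in a few lines: compute the mids at the median, fourths, eighths, and so on, for each power on the ladder, and look for the power where they stop drifting. This sketch assumes the usual letter-value depths and the power family (y**p - 1)/p (log y at p = 0), which keeps the data in order even for negative p; the helper names and the synthetic batch are hypothetical, not the dreamtowns data.

```python
import math
from statistics import NormalDist


def value_at_depth(y_sorted, depth):
    """Value at a (possibly half-integer) depth, counted from the low end (depth 1 = minimum)."""
    return (y_sorted[math.floor(depth) - 1] + y_sorted[math.ceil(depth) - 1]) / 2


def mids(data, levels=4):
    """Mids at the median, fourths, eighths, ... (M, F, E, D for levels=4).

    Depths follow the letter-value rule d_M = (n + 1)/2, then
    next depth = (floor(previous depth) + 1)/2. Each mid averages the lower
    and upper letter values at that depth.
    """
    y = sorted(data)
    n = len(y)
    depth = (n + 1) / 2
    result = []
    for _ in range(levels):
        lower = value_at_depth(y, depth)
        upper = value_at_depth(y, n + 1 - depth)  # same depth, counted from the top
        result.append((lower + upper) / 2)
        depth = (math.floor(depth) + 1) / 2
    return result


def reexpress(y, p):
    """Power family (y**p - 1)/p, with log(y) at p = 0; keeps the order of the data."""
    return math.log(y) if p == 0 else (y ** p - 1) / p


# Hypothetical right-skewed batch (NOT the dreamtowns data): exp() of evenly
# spaced normal quantiles, so it is exactly symmetric on the log scale.
n = 140
raw = [math.exp(NormalDist().inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)]

for p in (1, 0.5, 0, -0.5):
    m = mids([reexpress(v, p) for v in raw])
    print(f"p = {p:>4}: " + "  ".join(f"{x:7.3f}" for x in m))
```

Mids that drift upward (p = 1 and 0.5 here) signal right skewness, roughly constant mids (p = 0 for this batch) signal symmetry, and mids that drift downward (p = -0.5) mean we have gone too far down the ladder.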

I suggest that you try at least two of these methods in your homework.
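As a second method to compare against, here is a sketch of a Hinkley-style quick check: compute d = (mean - median) / (fourth spread) for each power in steps of 0.5 and pick the power that makes d closest to 0. The denominator is an assumption on my part (other versions of Hinkley's measure use a different spread), and the function names and synthetic batch are hypothetical, not the dreamtowns data.

```python
import math
from statistics import NormalDist, fmean, median


def fourth_spread(data):
    """Upper fourth minus lower fourth, at letter-value depth (floor((n + 1)/2) + 1)/2."""
    y = sorted(data)
    n = len(y)
    d = (math.floor((n + 1) / 2) + 1) / 2

    def at(depth):  # average of the order statistics around a (half-integer) depth
        return (y[math.floor(depth) - 1] + y[math.ceil(depth) - 1]) / 2

    return at(n + 1 - d) - at(d)


def hinkley_d(data):
    """(mean - median) / fourth spread: about 0 if symmetric, > 0 if right-skewed."""
    return (fmean(data) - median(data)) / fourth_spread(data)


def reexpress(y, p):
    """Power family (y**p - 1)/p, with log(y) at p = 0; keeps the order of the data."""
    return math.log(y) if p == 0 else (y ** p - 1) / p


def best_power(data, powers=(1, 0.5, 0, -0.5, -1)):
    """Power (in steps of 0.5) whose reexpressed batch has d closest to 0."""
    return min(powers, key=lambda p: abs(hinkley_d([reexpress(v, p) for v in data])))


# Hypothetical right-skewed batch (NOT the dreamtowns data), symmetric on the log scale.
n = 140
raw = [math.exp(NormalDist().inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)]

for p in (1, 0.5, 0, -0.5, -1):
    print(f"p = {p:>4}: d = {hinkley_d([reexpress(v, p) for v in raw]):+.3f}")
print("chosen power:", best_power(raw))
```

For this made-up batch d is positive for the raw data, essentially 0 for logs, and negative for p = -1, so the check picks p = 0; with real data the two methods will not always agree exactly, which is a good reason to try both.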

Monday, February 16, 2009
