Monday, February 23, 2009

Reducing Skewness

Do you want to see me quickly reduce skewness?

[Image: skewness]

More seriously, here are some comments after looking at your homework (the grades are posted, and you generally did well).

1. Why is symmetry important?

Some of you may be wondering why it is useful to make datasets symmetric. I don't think making a dataset symmetric is as important as making datasets have equal spreads, but here are a couple of reasons why symmetry is helpful.

-- Symmetric datasets are simpler to summarize. There is an obvious "average". Also, if the data is bell-shaped, one can use the empirical rule (about 2/3 of the data fall within one standard deviation of the mean).

-- Many statistical procedures, such as ANOVA, assume normally distributed data. One might wish to reexpress the data before applying these procedures.
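To see why the empirical rule depends on the shape, here is a minimal sketch in plain Python (not R, and not a LearnEDA function). The data are simulated, and the name `fraction_within_one_sd` is made up for this illustration.

```python
import random
from statistics import mean, pstdev

def fraction_within_one_sd(data):
    """Fraction of the values that lie within one standard deviation of the mean."""
    m = mean(data)
    s = pstdev(data)
    inside = [x for x in data if abs(x - m) <= s]
    return len(inside) / len(data)

rng = random.Random(2009)  # fixed seed so the simulation is repeatable

# Simulated bell-shaped data (normal) and simulated right-skewed data (exponential).
bell = [rng.gauss(50, 10) for _ in range(10000)]
skewed = [rng.expovariate(1.0) for _ in range(10000)]

frac_bell = fraction_within_one_sd(bell)
frac_skewed = fraction_within_one_sd(skewed)

print("bell-shaped :", round(frac_bell, 3))
print("right-skewed:", round(frac_skewed, 3))
```

The bell-shaped batch should come out close to 2/3, while the right-skewed batch should put a noticeably larger share of its values within one standard deviation, so the rule is a poor summary there.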

2. Monotone increasing reexpressions.

-- In our definition of power transformations, recall that when p is negative, we consider

"minus data raised to a power"

We did that so that all of the transformations are monotone increasing, that is, they preserve the order of the data. So when you have a right-skewed dataset and you move from p = 1 to p = 0 (the log) to p = -1, you should be moving toward symmetry and then, if you go too far, toward left-skewness.
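Here is a minimal sketch of this idea in plain Python. It is not the LearnEDA code; the function names and the small right-skewed batch are made up for illustration, and I use the usual moment coefficient as a rough measure of skewness.

```python
import math

def power_transform(data, p):
    """Monotone increasing power reexpression of positive data."""
    if p > 0:
        return [x ** p for x in data]
    if p == 0:
        return [math.log(x) for x in data]  # p = 0 is taken to be the log
    return [-(x ** p) for x in data]        # the minus sign keeps the order

def skewness(data):
    """Moment coefficient of skewness: positive means right-skewed."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed batch, already sorted from low to high.
batch = [1, 2, 2, 3, 4, 6, 9, 15, 30, 80]

skew_by_power = {p: skewness(power_transform(batch, p)) for p in (1, 0, -1)}
for p, s in skew_by_power.items():
    print("p =", p, " skewness =", round(s, 2))
```

For this batch the skewness should drop as p goes from 1 to 0 to -1 and end up negative at p = -1, which is the "moving toward symmetry and then left-skewness" pattern. Without the minus sign, the p = -1 values would come out in reverse order.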

3. Skewness in the middle and skewness in the tails.

Sometimes it is tough to make a dataset symmetric, since the middle half and the tails can behave differently. You can detect this with a symmetry plot. The points to the left may be close to the line while the points to the right fall off the line. This indicates that the middle of the data is symmetric, but there is a long tail to the right (or the left).
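The numbers behind such a plot can be sketched in a few lines of plain Python. This is one common construction (distance below the median paired with distance above the median); the plot you get from your software may differ in detail, and the function name and the data are made up for illustration.

```python
from statistics import median

def symmetry_pairs(data):
    """Return (distance below median, distance above median) pairs,
    starting with the pair closest to the median."""
    xs = sorted(data)
    m = median(xs)
    n = len(xs)
    pairs = []
    for i in range(n // 2):
        lower = m - xs[i]          # how far the i-th smallest is below the median
        upper = xs[n - 1 - i] - m  # how far the i-th largest is above the median
        pairs.append((lower, upper))
    pairs.reverse()                # innermost pairs first (the left of the plot)
    return pairs

# Hypothetical batch: symmetric in the middle, long tail to the right.
batch = [1, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40]

pairs = symmetry_pairs(batch)
for lower, upper in pairs:
    print(lower, upper, "on the line" if lower == upper else "off the line")
```

For perfectly symmetric data every pair has lower equal to upper, so the points sit on the line. Here the inner pairs are on the line and the outer pairs have upper much larger than lower, which is the "symmetric middle, long right tail" pattern.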

4. When does reexpression work?

You need sufficient spread in the data, as measured by the ratio HI/LO (the largest value divided by the smallest). If this ratio is not much different from 1, reexpressions won't help much.
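A quick way to convince yourself of this is to take the same right-skewed batch, shift it far from zero so that HI/LO is close to 1, and take logs of both versions. The sketch below is plain Python with made-up data and made-up function names, using the moment coefficient as a rough measure of skewness.

```python
import math

def hi_lo_ratio(data):
    """Largest value divided by smallest value (data must be positive)."""
    return max(data) / min(data)

def skewness(data):
    """Moment coefficient of skewness: positive means right-skewed."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical batches with the same shape but very different HI/LO.
wide = [1, 2, 2, 3, 4, 6, 9, 15, 30, 80]  # HI/LO = 80
narrow = [1000 + x for x in wide]         # HI/LO is about 1.08

results = {}
for name, batch in (("wide", wide), ("narrow", narrow)):
    logged = [math.log(x) for x in batch]
    results[name] = (hi_lo_ratio(batch), skewness(batch), skewness(logged))
    print(name, [round(v, 2) for v in results[name]])
```

Both batches start with the same skewness, since adding a constant does not change the shape. The log should pull in the tail of the wide batch a great deal, but it should barely change the narrow one, because over such a short range the log is nearly a straight line.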

5. Are some of you "R resistant"?

Many of you seem to prefer using Fathom or Minitab. That's okay, but the best way to get comfortable using R is to practice using it. I made the package LearnEDA to make R easier to use, but some of you aren't taking advantage of the special functions.
