Exploratory Data Analysis: Simple Statistical Comparisons

I just finished grading your "comparing batches" homework. You did fine in the mechanics (computing the summaries, constructing a spread versus level graph, and deciding on an appropriate reexpression), but it wasn't clear that you understand WHY we are doing all of this work.

Let's review the main points of this section.

1. We wish to compare two batches.

2. What does compare mean? Well, it could mean many things. Batch 1 has a larger spread, batch 2 has three more outliers than batch 1, and so on. You illustrated many types of comparisons in your homework writeups.

Here we wish to compare the general or average locations of the two batches. A simple statistical comparison is something like

"batch 2 is 10 units larger than batch 1"

Maybe we should call this an SSC (statisticians love to use acronyms).

What this could mean is that if I added 10 units to each value in batch 1, then I would get a new dataset that resembles batch 2.
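Here is a small sketch of that idea in Python. The two batches are made up for illustration (they are not from the homework), and using the difference in medians as the "10 units" is my assumption about how to measure the shift.

```python
from statistics import median

# Hypothetical batches, invented for illustration only.
batch1 = [12, 15, 17, 20, 22, 25, 30]
batch2 = [22, 25, 27, 30, 32, 35, 40]


def median_shift(a, b):
    """Return median(b) - median(a), one simple measure of the shift."""
    return median(b) - median(a)


shift = median_shift(batch1, batch2)

# Add the shift to every value in batch 1.
shifted_batch1 = [x + shift for x in batch1]

print("batch 2 is", shift, "units larger than batch 1")
print("shifted batch 1:", sorted(shifted_batch1))
print("batch 2:        ", sorted(batch2))
```

In this made-up example the shifted batch 1 matches batch 2 exactly. With real data you only hope for a rough resemblance.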

3. Is it always appropriate to make an SSC?

No.

It won't work if the two batches have unequal spreads. Adding a number to each value in batch 1 changes its location but not its spread, so if the spreads are unequal, the shifted batch 1 will NOT resemble batch 2.
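To see why, here is a sketch with two made-up batches that have very different spreads. The data are hypothetical, and measuring spread by the distance between the quartiles (computed with `statistics.quantiles`) is my choice for illustration. Your hand-computed fourths may differ slightly.

```python
from statistics import median, quantiles

# Hypothetical batches with clearly unequal spreads.
batch1 = [10, 12, 14, 16, 18, 20, 22]
batch2 = [20, 30, 40, 50, 60, 70, 80]


def fourth_spread(data):
    """Distance between the upper and lower quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


# Shift batch 1 so that its median matches the median of batch 2.
shift = median(batch2) - median(batch1)
shifted_batch1 = [x + shift for x in batch1]

# The shift moves the location but leaves the spread untouched.
print("spread of batch 1:        ", fourth_spread(batch1))
print("spread of shifted batch 1:", fourth_spread(shifted_batch1))
print("spread of batch 2:        ", fourth_spread(batch2))
```

The medians now agree, but the shifted batch 1 is still much more tightly packed than batch 2, so "batch 2 is 34 units larger" is not a fair summary.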

4. So if the batches have different spreads, we give up?

No.

It is possible that we can reexpress the batches on a new scale, so that the reexpressed batches have approximately equal spreads.

5. So the plan is to (1) try to find a suitable reexpression and (2) do an SSC on the reexpressed data.
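The two steps of the plan can be sketched as follows. Everything here is an assumption for illustration: the batches are made up, the helper names are hypothetical, and the power is chosen by the usual spread versus level rule (power = 1 - slope of log spread against log median, with power 0 meaning take logs).

```python
import math
from statistics import median, quantiles

# Hypothetical batches where spread grows with level.
batch1 = [1, 2, 4, 8, 16]
batch2 = [10, 20, 40, 80, 160]


def fourth_spread(data):
    """Distance between the upper and lower quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


def suggested_power(a, b):
    """Spread versus level rule for two batches: power = 1 - slope."""
    rise = math.log10(fourth_spread(b)) - math.log10(fourth_spread(a))
    run = math.log10(median(b)) - math.log10(median(a))
    return 1 - rise / run


def reexpress(data, p):
    """Power reexpression; p = 0 is treated as log base 10."""
    if abs(p) < 1e-9:
        return [math.log10(x) for x in data]
    return [x ** p for x in data]


# Step 1: find a reexpression (round the power to the nearest half).
power = round(suggested_power(batch1, batch2) * 2) / 2
new1 = reexpress(batch1, power)
new2 = reexpress(batch2, power)

# Step 2: make the SSC on the reexpressed scale.
ssc = median(new2) - median(new1)

print("power:", power)
print("spreads after reexpression:", fourth_spread(new1), fourth_spread(new2))
print("on the new scale, batch 2 is", ssc, "units larger than batch 1")
```

In this example the rule suggests logs, the log batches have matching spreads, and the SSC is made on the log scale. Remember to interpret the comparison on that scale: a difference of 1 in log base 10 means batch 2 is about 10 times batch 1.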

The snowfall data example using Fathom is one example where our strategy works. But generally, you did a poor job in making an SSC on the reexpressed data.

Monday, February 9, 2009
