Exploratory Data Analysis: EDA Grading

Most of you are doing great on the homework so far, but I thought I should explain how I grade and why you may be losing points.

Generally, I am most interested in your explanations and in how you answer the main questions of interest. For example, in the graphs and summaries homework, I am less interested in your R work and your computations. Most of you are doing fine at getting R to produce stemplots and compute letter values. But the BIG questions are ...

-- what is the best choice of stemplot?

-- what have we learned about the data in terms of shape, average, and spread?

-- are there observations that deviate from the rest and why are these observations unusual?

You should be addressing these BIG questions in the first R homework.
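For anyone unsure what the letter values are measuring, here is a minimal sketch of the usual depth rule (median, then fourths, then eighths). It is written in Python rather than R purely for illustration. The function name `letter_values` and the small data set are made up for this example and are not what your R function does internally.

```python
import math

def letter_values(data, levels=3):
    """Return (label, depth, lower, upper) for the median, fourths, eighths, ..."""
    x = sorted(data)
    n = len(x)
    labels = ["M", "F", "E", "D", "C"]  # conventional letter-value labels

    def at_depth(d):
        # A depth ending in .5 means: average the two neighbouring order statistics.
        lo, hi = math.floor(d), math.ceil(d)
        lower = (x[lo - 1] + x[hi - 1]) / 2   # counting in from the low end
        upper = (x[n - lo] + x[n - hi]) / 2   # counting in from the high end
        return lower, upper

    out = []
    depth = (n + 1) / 2                        # depth of the median
    for label in labels[:levels]:
        lower, upper = at_depth(depth)
        out.append((label, depth, lower, upper))
        if depth <= 1:
            break
        depth = (math.floor(depth) + 1) / 2    # depth of the next letter value
    return out

# Illustrative data only (not from the homework).
sample = [2.1, 3.4, 3.9, 4.0, 4.2, 4.8, 5.5, 6.1, 9.7]
for label, depth, lower, upper in letter_values(sample):
    print(label, depth, lower, upper, "spread:", round(upper - lower, 2))
```

The fourth-spread (upper fourth minus lower fourth) is the number to talk about when you describe spread. The computation is the easy part, and the interpretation is what I am grading.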

In the Fathom activity, we were looking at the number of outliers one would expect for samples from different population distributions.

For normal data, we don't see many outliers. But if the data come from a heavy-tailed distribution (like the t distribution), outliers are more common. If this general conclusion wasn't obvious from your work, then you may have lost points.
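If you want to check this conclusion outside of Fathom, a rough simulation along the following lines should show the same pattern. This is only a sketch under my own assumptions: it flags outliers with the 1.5 x IQR fence rule, uses samples of size 100, and takes 3 degrees of freedom for the t distribution. None of these choices comes from the activity itself, and the helper names are hypothetical.

```python
import random
import statistics

def count_outliers(sample):
    """Count points beyond the 1.5 x IQR fences (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    step = 1.5 * (q3 - q1)
    return sum(1 for v in sample if v < q1 - step or v > q3 + step)

def draw_t(rng, df):
    """One draw from Student's t: standard normal / sqrt(chi-square / df)."""
    z = rng.gauss(0, 1)
    chi2 = rng.gammavariate(df / 2, 2)  # chi-square(df) = Gamma(shape=df/2, scale=2)
    return z / (chi2 / df) ** 0.5

def mean_outliers(draw, n=100, reps=500, seed=1):
    """Average number of flagged outliers over `reps` samples of size `n`."""
    rng = random.Random(seed)
    counts = [count_outliers([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.mean(counts)

normal_rate = mean_outliers(lambda rng: rng.gauss(0, 1))
t_rate = mean_outliers(lambda rng: draw_t(rng, df=3))
print(f"normal: {normal_rate:.2f} outliers per sample of 100")
print(f"t(3):   {t_rate:.2f} outliers per sample of 100")
```

The exact counts depend on the seed and on the outlier rule, but the t samples should flag noticeably more points than the normal samples.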

Here is a final quibble (small point). Most of the stemplots you showed me were hard to read, and you certainly wouldn't want to use them in a presentation.

Which stemplot do you prefer?

Stemplot A:

1 | 2: represents 1.2

leaf unit: 0.1

n: 50

2 -1. | 55

6 -1* | 0233

11 -0. | 67899

22 -0* | 01111113344

(9) 0* | 112223334

19 0. | 789999

13 1* | 0011233

6 1. | 68

4 2* | 023

1 2. | 9

Stemplot B:

  1 | 2: represents 1.2
  leaf unit: 0.1
  n: 50

    2   -1. | 55
    6   -1* | 0233
   11   -0. | 67899
   22   -0* | 01111113344
   (9)   0* | 112223334
   19    0. | 789999
   13    1* | 0011233
    6    1. | 68
    4    2* | 023
    1    2. | 9

The message here is that you should use a monospaced (fixed-width) font such as Courier, in which every character takes up the same width, so that the stems and leaves line up.

Sunday, September 7, 2008
