data <- read.table("president.ages.txt", header=TRUE, sep="\t")
I attach the dataframe to make the variables visible.
attach(data)
Then, using the stem.leaf function in the aplpack package (loaded with library(aplpack)), I construct a stemplot of the presidents' ages:
> stem.leaf(Age)
1 | 2: represents 12
 leaf unit: 1
            n: 44
    2     t | 23
          f |
    6     s | 6677
    9    4. | 899
   15    5* | 011111
   17     t | 22
   (9)    f | 444445555
   18     s | 6667777
   11    5. | 8
   10    6* | 0111
    6     t | 2
    5     f | 445
          s |
    2    6. | 89
What do I see? Here are some things I notice:
- The ages at inauguration look roughly symmetric, centered around 54 or 55.
- President ages range from 42 to 69. The Constitution sets a minimum age of 35 for the presidency, so even the youngest president, at 42, was several years above that floor.
- Barack Obama, at 47, is one of the youngest presidents in history.
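These observations can be checked numerically. Here is a minimal sketch in base R, assuming the 44 ages are re-entered by hand from the stemplot leaves instead of being read from president.ages.txt. The names values, counts and ages are mine, not part of the original data file.

```r
# Ages re-entered from the stemplot leaves above: each distinct age and
# how many times it appears. This stands in for the Age column of the file.
values <- c(42, 43, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58,
            60, 61, 62, 64, 65, 68, 69)
counts <- c( 1,  1,  2,  2,  1,  2,  1,  5,  2,  5,  4,  3,  4,  1,
             1,  3,  1,  2,  1,  1,  1)
ages <- rep(values, times = counts)

length(ages)   # number of ages shown in the plot
range(ages)    # youngest and oldest age at inauguration
median(ages)   # center of the distribution
mean(ages)     # lands near the median when the shape is roughly symmetric
```

If the mean and the median come out close together, that supports the impression of symmetry from the plot.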
I didn't play with the options for stemplot -- I just used the default settings in the stem.leaf function. Could I produce a better stemplot?
Let's break between the tens and ones, with two lines per stem (m=2) so that each line covers five leaf digits.
> stem.leaf(Age,unit=1,m=2)
1 | 2: represents 12
 leaf unit: 1
            n: 44
    2    4* | 23
    9    4. | 6677899
   22    5* | 0111112244444
  (12)   5. | 555566677778
   10    6* | 0111244
    3    6. | 589
>
Or I could go the other way and split between the ones and tenths, with one line per stem (m=1) so that each line covers all ten leaf digits.
> stem.leaf(Age,unit=.1,m=1)
1 | 2: represents 1.2
 leaf unit: 0.1
            n: 44
    1    42 | 0
    2    43 | 0
         44 |
         45 |
    4    46 | 00
    6    47 | 00
    7    48 | 0
    9    49 | 00
   10    50 | 0
   15    51 | 00000
   17    52 | 00
         53 |
   22    54 | 00000
   (4)   55 | 0000
   18    56 | 000
   15    57 | 0000
   11    58 | 0
         59 |
   10    60 | 0
    9    61 | 000
    6    62 | 0
         63 |
    5    64 | 00
    3    65 | 0
HI: 68 69
>
I think the first stemplot is clearly the best. With too few lines, I lose some of the structure of the distribution; with too many lines, I don't see any structure at all.
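To see the trade-off in numbers, here is a rough sketch in base R that counts how many ages land on each plot line for the three line widths. It assumes the ages are re-entered from the stemplot leaves, and line_counts is a helper name I made up for illustration; it is not part of aplpack.

```r
# Ages re-entered from the stemplot leaves (distinct age, then its count).
values <- c(42, 43, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58,
            60, 61, 62, 64, 65, 68, 69)
counts <- c( 1,  1,  2,  2,  1,  2,  1,  5,  2,  5,  4,  3,  4,  1,
             1,  3,  1,  2,  1,  1,  1)
ages <- rep(values, times = counts)

# Hypothetical helper: number of ages on each line when one line spans
# `width` years. Empty lines are kept so that gaps stay visible.
line_counts <- function(x, width) {
  starts <- seq(min(x) %/% width * width, max(x) %/% width * width, by = width)
  table(factor(x %/% width * width, levels = starts))
}

line_counts(ages, 5)  # five years per line, as in the m=2 plot
line_counts(ages, 2)  # two years per line, as in the first plot
line_counts(ages, 1)  # one year per line, as in the unit=.1 plot
```

With five years per line there are only six counts to look at, and with one year per line most counts are small and scattered. The two-year lines sit in between.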