Wednesday, January 21, 2009

President Ages

As you know, yesterday was a big day in the history of the US as Barack Obama became our 44th president.  In honor of that event, I collected the ages at inauguration and lifespans of all 44 presidents and read this dataset into R.

data = read.table("president.ages.txt", header=TRUE, sep="\t")

I attach the dataframe to make the variables visible.

attach(data)
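The file president.ages.txt is not reproduced in this post. For anyone who wants to follow along without it, here is a minimal sketch that rebuilds just the Age variable from the leaves printed in the stemplots in this post. The names ages and counts are my own for illustration, and the lifespans cannot be recovered this way.

```r
# Distinct ages at inauguration and how often each occurs,
# read off the leaves of the stemplots shown in this post.
ages   <- c(42, 43, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58,
            60, 61, 62, 64, 65, 68, 69)
counts <- c( 1,  1,  2,  2,  1,  2,  1,  5,  2,  5,  4,  3,  4,  1,
             1,  3,  1,  2,  1,  1,  1)

# Expand to one value per president (sorted by age, not by term of office).
Age <- rep(ages, times = counts)

length(Age)  # 44, matching the n reported by stem.leaf
```

Because the values come out sorted by age, this stand-in is fine for a stemplot but says nothing about which president had which age.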

Then, using the stem.leaf function in the aplpack package (loaded with library(aplpack)), I construct a stemplot of the president ages:

> stem.leaf(Age)
1 | 2: represents 12
 leaf unit: 1
            n: 44
   2     t | 23
         f | 
   6     s | 6677
   9    4. | 899
  15    5* | 011111
  17     t | 22
  (9)    f | 444445555
  18     s | 6667777
  11    5. | 8
  10    6* | 0111
   6     t | 2
   5     f | 445
         s | 
   2    6. | 89

What do I see?  Here are some things I notice:

  • The ages at inauguration look roughly symmetric about 54 or 55.
  • President ages range from 42 to 69.  The Constitution requires a president to be at least 35, so the youngest president was only seven years above the minimum.
  • Barack Obama is one of the youngest presidents in history at 47.

I didn't play with the options for stemplot -- I just used the default settings in the stem.leaf function.  Could I produce a better stemplot?
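As a quick check on what the first stemplot suggests, the center and spread can also be computed directly. This is only a sketch in base R: it assumes Age holds the 44 ages, rebuilt here from the leaves of the stemplot so that the block runs on its own.

```r
# Ages at inauguration, rebuilt from the stemplot leaves (value, then count).
Age <- rep(c(42, 43, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58,
             60, 61, 62, 64, 65, 68, 69),
           times = c(1, 1, 2, 2, 1, 2, 1, 5, 2, 5, 4, 3, 4, 1,
                     1, 3, 1, 2, 1, 1, 1))

range(Age)     # 42 69
median(Age)    # 54.5, halfway between the two middle values 54 and 55
sum(Age < 50)  # 9 presidents took office in their forties
```

The median landing between 54 and 55 agrees with the line marked (9) in the stemplot, which is where the middle of the data sits.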

Let's break between the tens and ones with two lines per stem, so that each line holds five possible leaf digits.

> stem.leaf(Age,unit=1,m=2)
1 | 2: represents 12
 leaf unit: 1
            n: 44
    2    4* | 23
    9    4. | 6677899
   22    5* | 0111112244444
  (12)   5. | 555566677778
   10    6* | 0111244
    3    6. | 589

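Or I could go the other way and split between the ones and tenths with one line per stem, so that each line holds all 10 possible leaf digits.  Since the ages are whole numbers, every leaf is 0.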

> stem.leaf(Age,unit=.1,m=1)
1 | 2: represents 1.2
 leaf unit: 0.1
            n: 44
   1    42 | 0
   2    43 | 0
        44 | 
        45 | 
   4    46 | 00
   6    47 | 00
   7    48 | 0
   9    49 | 00
  10    50 | 0
  15    51 | 00000
  17    52 | 00
        53 | 
  22    54 | 00000
  (4)   55 | 0000
  18    56 | 000
  15    57 | 0000
  11    58 | 0
        59 | 
  10    60 | 0
   9    61 | 000
   6    62 | 0
        63 | 
   5    64 | 00
   3    65 | 0
HI: 68 69

I think it is pretty obvious that the first stemplot is the best.  With too few lines, I lose some of the structure of the distribution; with too many lines, I don't see any structure at all.
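One way to see the trade-off is to count how many ages land on each line as the line width changes. The sketch below uses base R only. The helper line_counts is hypothetical (it is not part of aplpack), and Age is rebuilt from the stemplot leaves so that the block runs on its own.

```r
# Ages at inauguration, rebuilt from the stemplot leaves (value, then count).
Age <- rep(c(42, 43, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58,
             60, 61, 62, 64, 65, 68, 69),
           times = c(1, 1, 2, 2, 1, 2, 1, 5, 2, 5, 4, 3, 4, 1,
                     1, 3, 1, 2, 1, 1, 1))

# Hypothetical helper: number of values on each display line when one
# line covers `width` consecutive ages.
line_counts <- function(x, width) {
  starts <- (x %/% width) * width        # first age covered by each value's line
  lines  <- seq(min(starts), max(starts), by = width)
  table(factor(starts, levels = lines))  # empty lines are kept as zero counts
}

line_counts(Age, 2)  # 14 lines, two ages per line as in the default display
line_counts(Age, 5)  # 6 lines, five ages per line as in the m=2 display
line_counts(Age, 1)  # 28 lines, one per year of age from 42 to 69
```

With a width of 5 the counts are smooth but coarse, and with a width of 1 they are mostly small numbers and zeros; a width of 2 sits in between.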



