Wednesday, April 8, 2009

Binning Baseball Ages

To motivate the issues on the next EDA topic, I collected the ages of all 651 pitchers who played major league baseball during the 2008 season.

I have a vector age that contains the ages in years for these pitchers.

A histogram is a standard way of graphing this data.  Before I graph, I should select reasonable bins; I could use the default bins chosen by the R hist command, but I typically like more control over my graphical displays.

Here I'm interested in the number of players at each possible age: 20, 21, 22, and so on.  So I choose cutpoints 19.5, 20.5, ..., 46.5, which cover the range of the data and guarantee that no age falls on a bin boundary.

cutpoints=seq(19.5,46.5)   # default step of 1 gives 19.5, 20.5, ..., 46.5

Now I can call the hist function with the optional breaks argument.

hist(age,breaks=cutpoints)

What do I see in this display?
  • The shape of the data looks a bit right-skewed.  I'm a little surprised about the number of pitchers who are 40 or older.
  • The most popular ages are 25 and 27 among MLB pitchers.
  • Looking more carefully, it might seem a little odd that we have 79 pitchers at each of ages 25 and 27, but only 66 pitchers who are age 26.
What is causing this odd behavior in the frequencies for popular ages?  We don't see this behavior for the bins with small counts.

Actually, this "odd behavior" is just an implication of the basic EDA idea that

LARGE COUNTS HAVE LARGER VARIABILITY THAN SMALL COUNTS

So we typically will see this type of behavior whenever we construct a histogram.
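
A small simulation can make this idea concrete. It is only a sketch: it assumes that a bin count behaves roughly like a Poisson count, and the means 4 and 64 are made-up values standing in for a sparse bin and a popular bin (they are not taken from the pitcher data).

```r
# Simulated illustration, NOT the pitcher data.
# Assumption: a bin count behaves roughly like a Poisson count.
set.seed(1)
small = rpois(10000, lambda = 4)    # stand-in for a bin with a small count
large = rpois(10000, lambda = 64)   # stand-in for a bin with a large count

# On the count scale, the spread grows with the size of the count:
# the standard deviation is close to the square root of the mean.
sd(small)   # near sqrt(4)  = 2
sd(large)   # near sqrt(64) = 8

# On the square-root scale, the two spreads are much more alike.
sd(sqrt(small))
sd(sqrt(large))
```

Under this assumption, a swing of a dozen or so pitchers between neighboring popular ages is the kind of noise we should expect, while the sparse bins at the extremes can only move by a few.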

When we plot a histogram, it would seem desirable to remove this "variability problem" so it is easier to make comparisons.  For example, when we compare the counts to the counts expected under a Gaussian model, the residuals for bins with large counts are not directly comparable to the residuals for bins with small counts, because they have unequal variability.

This discussion motivates the construction of a rootogram and eventually a suspended rootogram to make comparisons with a symmetric curve.
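
Here is a rough sketch of what the comparison looks like on the root scale. Since the pitcher data are not reproduced in this post, the vector sim_age below is a hypothetical, simulated stand-in (651 right-skewed integer ages from a gamma distribution I picked arbitrarily); the names sim_age, expected, raw_resid and root_resid are my own illustrations, not part of any package.

```r
# Hypothetical stand-in for the age vector: 651 right-skewed integer ages.
# The gamma shape and scale are arbitrary choices for illustration only.
set.seed(2008)
sim_age = pmin(round(20 + rgamma(651, shape = 4, scale = 2)), 46)

cutpoints = seq(19.5, 46.5)
h = hist(sim_age, breaks = cutpoints, plot = FALSE)   # bin counts only

# Expected bin counts under a Gaussian with the sample mean and sd:
# n times the normal probability of each bin.
p = diff(pnorm(cutpoints, mean(sim_age), sd(sim_age)))
expected = length(sim_age) * p

# Residuals on the count scale and on the square-root scale.
raw_resid  = h$counts - expected
root_resid = sqrt(h$counts) - sqrt(expected)

# A simple rootogram: root counts as vertical bars,
# root expected counts as a red line.
plot(h$mids, sqrt(h$counts), type = "h", lwd = 4,
     xlab = "Age", ylab = "sqrt(count)")
lines(h$mids, sqrt(expected), col = "red", lwd = 2)
```

The root residuals are on a roughly common scale across bins, so a large root_resid in a sparse bin can be compared with one in a popular bin.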

By the way, can we fit a Gaussian curve to our histogram?  In R, we

  • first plot the histogram using the freq=FALSE option -- the vertical scale will be DENSITY rather than COUNTS
  • use the curve command to add a normal curve where the mean and standard deviation are found by the mean and sd of the ages
Here are the commands and the resulting graph.

hist(age,breaks=cutpoints,freq=FALSE)
curve(dnorm(x,mean(age),sd(age)),add=TRUE,col="red",lwd=3)

It should be clear that a Gaussian curve is not a good model for baseball ages.
