I have a vector age that contains the ages in years for these pitchers.

A histogram is a standard way of graphing this data. Before I graph, I should select reasonable bins. I could use the default bins chosen by the R hist command, but I typically like more control over my graphical displays.

Here I'm interested in the number of players at each possible age: 20, 21, 22, and so on. So I choose the cutpoints 19.5, 20.5, ..., 46.5, which cover the range of the data and ensure that no age falls on a bin boundary.

cutpoints=seq(19.5,46.5)

Now I can call the hist function with the optional breaks argument.

hist(age,breaks=cutpoints)
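To check that these cutpoints really give one bin per integer age, here is a minimal sketch. The real age vector is not reproduced here, so age_demo is a hypothetical simulated stand-in, and the names h, bin_counts and age_table are mine.

```r
# Hypothetical stand-in for the real data: 500 simulated integer ages in 20..46.
set.seed(1)
age_demo <- sample(20:46, size = 500, replace = TRUE,
                   prob = dgamma(20:46 - 19, shape = 4, rate = 0.5))

cutpoints <- seq(19.5, 46.5)   # 19.5, 20.5, ..., 46.5 (the step defaults to 1)
h <- hist(age_demo, breaks = cutpoints, plot = FALSE)

# One bin per integer age, so the bin counts match a plain tabulation of ages.
bin_counts <- setNames(h$counts, 20:46)
age_table  <- table(factor(age_demo, levels = 20:46))
print(bin_counts)
```

With plot = FALSE, hist returns the bin counts without drawing anything, which is handy when I want the numbers behind the display.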

What do I see in this display?

- The shape of the data looks a bit right-skewed. I'm a little surprised about the number of pitchers who are 40 or older.
- The most popular ages are 25 and 27 among MLB pitchers.
- Looking more carefully, it might seem a little odd that we have 79 pitchers of ages 25 and 27, but only 66 pitchers who are age 26.

What is causing this odd behavior in the frequencies for popular ages? We don't see this behavior for the bins with small counts.

Actually, this "odd behavior" is just an implication of the basic EDA idea that

LARGE COUNTS HAVE LARGER VARIABILITY THAN SMALL COUNTS

So we typically will see this type of behavior whenever we construct a histogram.

When we plot a histogram, it would seem desirable to remove this "variability problem" so it is easier to make comparisons. For example, when we compare the counts to the expected counts under a Gaussian model, it will be harder to compare residuals across bins with large counts and bins with small counts, because the residuals have unequal variability.
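A small simulation makes the variability problem concrete. This is only a sketch under an assumption that is mine, not part of the data: that bin counts vary roughly like Poisson counts. The expected counts 5 and 80 are arbitrary illustrative values.

```r
# Assumption for illustration: bin counts vary roughly like Poisson counts.
set.seed(2)
small <- rpois(10000, lambda = 5)    # a bin with a small expected count
large <- rpois(10000, lambda = 80)   # a bin with a large expected count

sd_raw  <- c(small = sd(small), large = sd(large))
sd_root <- c(small = sd(sqrt(small)), large = sd(sqrt(large)))

print(round(sd_raw, 2))    # raw counts: the spread grows with the expected count
print(round(sd_root, 2))   # root counts: the spread is roughly the same
```

Under this assumption, the spread of the raw counts grows like the square root of the expected count, while the square roots of the counts have roughly the same spread in both bins.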

This discussion motivates the construction of a rootogram and eventually a suspended rootogram to make comparisons with a symmetric curve.

By the way, can we fit a Gaussian curve to our histogram? In R, we

- first plot the histogram using the freq=FALSE option -- the vertical scale will be DENSITY rather than COUNTS
- use the curve command to add a normal curve whose mean and standard deviation are the sample mean and standard deviation of the ages (computed with mean and sd)

Here are the commands and the resulting graph.

hist(age,breaks=cutpoints,freq=FALSE)

curve(dnorm(x,mean(age),sd(age)),add=TRUE,col="red",lwd=3)

It should be clear that a Gaussian curve is not a good model for baseball ages.
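To compare the counts with the Gaussian fit bin by bin, one option is to compute the expected count in each bin and look at the residuals on the square-root scale, which is the idea behind the rootogram. The sketch below is an illustration only: age_demo is a hypothetical, mildly right-skewed stand-in for the real age vector, and observed, expected and root_resid are names I made up.

```r
# Hypothetical stand-in for the real age vector (mildly right-skewed integer ages).
set.seed(3)
age_demo <- pmin(20 + rpois(600, lambda = 8), 46)

cutpoints <- seq(19.5, 46.5)
observed  <- hist(age_demo, breaks = cutpoints, plot = FALSE)$counts

# Expected bin counts under a Gaussian with the sample mean and sd:
# the probability of each bin is a difference of normal cdf values.
p_bin    <- diff(pnorm(cutpoints, mean(age_demo), sd(age_demo)))
expected <- length(age_demo) * p_bin

# Residuals on the square-root scale, one per integer age 20..46.
root_resid <- setNames(sqrt(observed) - sqrt(expected), 20:46)
print(round(root_resid, 2))
```

A systematic pattern in these residuals, such as a run of positive values followed by a run of negative ones, would point to a lack of fit of the Gaussian curve.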
