Wednesday, March 25, 2009

Looking at Hits on my Web Page

The next topic is two-way tables. To illustrate the application of median polish, I took some convenient data, that is, data that were readily available to me.

A couple of years ago, I wrote a book on Bayesian computing using R. I have a website that gives resource material for the book and I use Google Analytics to monitor hits on this particular website. Each day I observe the number and location of hits; it is interesting data partly since it seems that statisticians from many countries are interested in my book.

Anyway, here is the data -- each entry in the table is the number of hits on a particular day of the week in a particular week.

           Week 1  Week 2  Week 3  Week 4
Sunday         22      12      17      15
Monday         23      15      27      17
Tuesday        17      26      21      14
Wednesday      26      13      18      18
Thursday       24      27      28      13
Friday         28      17      17      19
Saturday       14      11      13      13
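
The post does not show how the table was entered into R, so here is a minimal sketch of one way to do it. The name web.hits is borrowed from the medpolish call used later; the use of matrix() with dimnames is an assumption, not necessarily how the data were originally stored.

```r
# Hit counts entered column by column (one column per week).
# Assumption: the matrix is called web.hits, matching the later medpolish() call.
web.hits <- matrix(
  c(22, 23, 17, 26, 24, 28, 14,   # Week 1, Sunday through Saturday
    12, 15, 26, 13, 27, 17, 11,   # Week 2
    17, 27, 21, 18, 28, 17, 13,   # Week 3
    15, 17, 14, 18, 13, 19, 13),  # Week 4
  nrow = 7,
  dimnames = list(
    c("Sunday", "Monday", "Tuesday", "Wednesday",
      "Thursday", "Friday", "Saturday"),
    paste("Week", 1:4)
  )
)

# Print the matrix to check it against the table above.
web.hits
```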

I am interested in how the website hits vary across days of the week and also how the website hits vary across weeks. I can explore these patterns by means of an additive fit that I do by the median polish algorithm.

Since the data is stored as a matrix, a median polish is done by the medpolish function:

fit=medpolish(web.hits)

The variable "fit" stores the output of medpolish. Let's look at each component of this output.

> fit$overall
[1] 18

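This tells us that the typical number of hits (per day) on my website was 18. Since medpolish works with medians, this is a median-based common value rather than an arithmetic mean.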

> fit$row
   Sunday    Monday   Tuesday Wednesday  Thursday    Friday
    -2.00      0.50     -0.50      0.00      5.50      1.75
 Saturday
    -5.25

These are the row effects. For Sunday, the row effect is -2 -- this means that on this day, the number of hits tends to be 2 smaller than the overall value. Comparing Sunday and Monday, there tend to be 0.50 - (-2.00) = 2.5 more hits on Monday.

> fit$col
Week 1 Week 2 Week 3 Week 4
  4.50  -2.75   1.00  -1.00

These are the column effects. It looks like my website hits across weeks were HIGH, LOW, high, low. Comparing Week 1 and Week 2, there tend to be 4.50 - (-2.75) = 7.25 more hits per day in Week 1.
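
The comparisons between days and between weeks can be reproduced directly from the named effect vectors. This is a minimal sketch: the matrix is rebuilt here under the assumed name web.hits so the block runs on its own, and trace.iter = FALSE is only there to silence the iteration log.

```r
# Rebuild the data (assumed layout: rows are days, columns are weeks).
web.hits <- matrix(
  c(22, 23, 17, 26, 24, 28, 14,
    12, 15, 26, 13, 27, 17, 11,
    17, 27, 21, 18, 28, 17, 13,
    15, 17, 14, 18, 13, 19, 13),
  nrow = 7,
  dimnames = list(
    c("Sunday", "Monday", "Tuesday", "Wednesday",
      "Thursday", "Friday", "Saturday"),
    paste("Week", 1:4)
  )
)

fit <- medpolish(web.hits, trace.iter = FALSE)

# Difference between two row effects: Monday versus Sunday.
monday.vs.sunday <- unname(fit$row["Monday"] - fit$row["Sunday"])
monday.vs.sunday   # 2.5

# Difference between two column effects: Week 1 versus Week 2.
week1.vs.week2 <- unname(fit$col["Week 1"] - fit$col["Week 2"])
week1.vs.week2     # 7.25
```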

The remaining component of the additive fit is the set of residuals. These tell us how the hit values deviate from the fitted values (from the additive model).

> fit$residuals
          Week 1 Week 2 Week 3 Week 4
Sunday      1.50  -1.25   0.00   0.00
Monday      0.00  -0.75   7.50  -0.50
Tuesday    -5.00  11.25   2.50  -2.50
Wednesday   3.50  -2.25  -1.00   1.00
Thursday   -4.00   6.25   3.50  -9.50
Friday      3.75   0.00  -3.75   0.25
Saturday   -3.25   1.00  -0.75   1.25

If the residual values are generally small (small compared to the row and column effects), then the additive model is a good description of the patterns in the data. Actually, the residuals look large to me, so I'm not sure I'd get that excited about this additive fit. Specifically, the residual for Tuesday, Week 2 is 11.25 -- for some reason, this particular day had many hits -- many more than one would expect based on its day of the week and week number.
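
The additive fit says that each observation equals the overall value plus a row effect plus a column effect plus a residual. A rough sketch of how one might check this and locate the largest residual follows; the matrix is rebuilt under the assumed name web.hits, and fitted.hits is a name introduced here for illustration only.

```r
# Rebuild the data (assumed layout: rows are days, columns are weeks).
web.hits <- matrix(
  c(22, 23, 17, 26, 24, 28, 14,
    12, 15, 26, 13, 27, 17, 11,
    17, 27, 21, 18, 28, 17, 13,
    15, 17, 14, 18, 13, 19, 13),
  nrow = 7,
  dimnames = list(
    c("Sunday", "Monday", "Tuesday", "Wednesday",
      "Thursday", "Friday", "Saturday"),
    paste("Week", 1:4)
  )
)

fit <- medpolish(web.hits, trace.iter = FALSE)

# Fitted values from the additive model: overall + row effect + column effect.
fitted.hits <- fit$overall + outer(fit$row, fit$col, "+")

# Fitted values plus residuals should give back the original data.
all.equal(fitted.hits + fit$residuals, web.hits, check.attributes = FALSE)

# Position (row and column index) of the residual largest in absolute value.
biggest <- which(abs(fit$residuals) == max(abs(fit$residuals)),
                 arr.ind = TRUE)
rownames(web.hits)[biggest[1, "row"]]   # "Tuesday"
colnames(web.hits)[biggest[1, "col"]]   # "Week 2"
```

For Tuesday of Week 2 the fitted value is 18 - 0.50 - 2.75 = 14.75, and the observed count of 26 exceeds it by the residual of 11.25.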

1 comment:

Anonymous said...

Nice intro. But is there a way to export all these values (the results from medpolish) to a csv/tsv/txt file?
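
One possible way to do what the comment asks is to write each component with write.csv. This is only a sketch: the file names and the use of a temporary directory are hypothetical choices, and the matrix is rebuilt under the assumed name web.hits so that the block runs on its own.

```r
# Rebuild the data (assumed layout: rows are days, columns are weeks).
web.hits <- matrix(
  c(22, 23, 17, 26, 24, 28, 14,
    12, 15, 26, 13, 27, 17, 11,
    17, 27, 21, 18, 28, 17, 13,
    15, 17, 14, 18, 13, 19, 13),
  nrow = 7,
  dimnames = list(
    c("Sunday", "Monday", "Tuesday", "Wednesday",
      "Thursday", "Friday", "Saturday"),
    paste("Week", 1:4)
  )
)

fit <- medpolish(web.hits, trace.iter = FALSE)

# Hypothetical output location and file names; change these as needed.
out.dir    <- tempdir()
resid.path <- file.path(out.dir, "medpolish_residuals.csv")
row.path   <- file.path(out.dir, "medpolish_row_effects.csv")
col.path   <- file.path(out.dir, "medpolish_col_effects.csv")

# The residuals are a matrix, so they go out as a table with row names.
write.csv(fit$residuals, resid.path)

# Row and column effects have different lengths, so each gets its own file.
write.csv(data.frame(effect = fit$row), row.path)
write.csv(data.frame(effect = fit$col), col.path)

# Read the residuals back in to confirm the round trip.
resid.back <- as.matrix(read.csv(resid.path, row.names = 1,
                                 check.names = FALSE))
```

For a tab-separated file, write.table with sep = "\t" could be used in place of write.csv.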