First, most of you did well on this homework. You did a good job describing the fit and the pattern in the residuals. You also made reasonable choices in reexpressing the x and/or the y variable so that the graph looked pretty straight.
But there were some things that caused you to lose points.
1. What are you looking for in a residual plot? In this assignment, the focus was on looking for nonlinear patterns. For example, is it appropriate to fit a line to the (year, population) data for the England and Wales dataset? We can answer this question by looking for a nonlinear pattern in the plot of residuals against year. There is strong curvature (roughly a quadratic pattern) in the residual plot, which tells you that the population growth is not linear.
By the way, some of you plotted log population against log year -- why did you take the log of year? It doesn't make any sense to me.
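To make the residual check in point 1 concrete, here is a minimal sketch in Python with NumPy. The numbers are made up (a quantity growing at a steady percentage rate); they are not the England and Wales figures, and the helper name `line_residuals` is my own. It fits a least-squares line to (year, population) and to (year, log population) and compares the residuals.

```python
import numpy as np

# Made-up data for illustration only: steady percentage growth.
# These are NOT the England and Wales values from the homework.
year = np.arange(1801, 1912, 10)                     # 1801, 1811, ..., 1911
population = 9.0 * np.exp(0.012 * (year - 1801))

def line_residuals(x, y):
    """Fit a least-squares line to (x, y) and return the residuals."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)

# Residuals from a line fit to the raw data.
raw_resid = line_residuals(year, population)

# Residuals after reexpressing population only (year is left alone).
log_resid = line_residuals(year, np.log(population))

# Sign pattern of the raw residuals: positive at both ends and
# negative in the middle, which is the curved pattern to look for.
print(np.sign(raw_resid))

# After taking logs the residuals are essentially zero for this
# idealized data, so the line is an adequate summary.
print(np.max(np.abs(log_resid)))
```

With real data the log residuals will not vanish, but the idea is the same: a curved band of residuals says the line is the wrong shape, and a structureless band says the reexpression worked.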
2. Always talk about the fit and the residuals in the context of the data. Someone plotted x against y without telling me the variables. The fun part of statistics is that you can always talk about the application.
3. Remember that funny problem where the scatterplot shows two groups of points with a clear separation? This is one type of nonlinear pattern that you won't be able to straighten with a single choice of power transformation. But since the graph clearly divides into two parts, it makes sense to treat this as two independent problems and try to straighten each part.
4. Should one fit a line by least-squares or by a resistant line? In many situations, it won't make a difference -- either fit will work. But least-squares can give you relatively poor fits when there are outliers.
How can you tell if least squares isn't the best fit? Look at the residual plot. If you still see an increasing or decreasing pattern, then this tells you that least-squares hasn't explained all of the "tilt" pattern in the graph.
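Here is a small sketch of point 4 in Python with NumPy, again on made-up data. The function `resistant_line` is my own simplified, one-pass version of a three-group resistant line (slope from the medians of the left and right thirds, intercept from the median of the leftover values); it skips the iterative polishing that a full implementation would do, so treat it as an illustration rather than the exact procedure used in class.

```python
import numpy as np

def resistant_line(x, y):
    """Simplified one-pass three-group resistant line (no iteration)."""
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    third = len(x) // 3
    # Summary points: medians of the left and right thirds.
    x_left, y_left = np.median(x[:third]), np.median(y[:third])
    x_right, y_right = np.median(x[-third:]), np.median(y[-third:])
    slope = (y_right - y_left) / (x_right - x_left)
    # Intercept: median of what is left after removing the slope.
    intercept = np.median(y - slope * x)
    return slope, intercept

# Made-up data: exactly y = 2x + 1, except for one wild value.
x = np.arange(1.0, 13.0)          # 1, 2, ..., 12
y = 2.0 * x + 1.0
y[-1] = 100.0                     # outlier (the line would give 25)

ls_slope, ls_intercept = np.polyfit(x, y, 1)
rl_slope, rl_intercept = resistant_line(x, y)

ls_resid = y - (ls_intercept + ls_slope * x)
rl_resid = y - (rl_intercept + rl_slope * x)

print("least squares:", ls_slope, ls_intercept)
print("resistant:    ", rl_slope, rl_intercept)
# Least-squares residuals for the eleven ordinary points drift steadily
# downward (leftover tilt); the resistant residuals for them are zero.
print(ls_resid[:-1])
print(rl_resid[:-1])
```

The single outlier pulls the least-squares slope well above 2, so the residuals for the ordinary points show a decreasing pattern. The resistant line recovers the slope of 2 and leaves only the outlier standing out in the residuals.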