Thankyou very much Marc for that nifty little script. When I use it on my real dataset though, the lines are fat in the middle and thinner towards the ends. I guess it's because "lines" draw one fitted line for each x, and if you have hundreds of x, this turns into a line that is thicker that it should be (due to rounding errors perhaps). I got the tip to use "segments", and draw one line from min(x),min(y) to max(x),max(y) but with real data with a bunch of "na.rm" and "na.action"s this becomes very long and bulky. For a regression with two data swarms and two trend lines it becomes this long mess: <>plot(TCgonad[Period=="1"], ABelly[Period=="1"],xlim=c(0,20),ylim=c(130,160), col=?blue?) points(TCgonad[Period=="2"], ABelly[Period=="2"],col=?red?) segments( min(TCgonad[Period=="1"],na.rm=T), min(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"], na.action=na.exclude)),na.rm=T), max(TCgonad[Period=="1"],na.rm=T), max(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"], na.action=na.exclude)),na.rm=T),col=?blue?) <>segments( min(TCgonad[Period=="2"],na.rm=T), min(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"], na.action=na.exclude)),na.rm=T), max(TCgonad[Period=="2"],na.rm=T), max(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"], na.action=na.exclude)),na.rm=T),col=?red?) I just think it's strange that abline has as a nonadjustable default to extrapolate the line to outside the data - a mortal sin in my field. Cheers Andreas
Marc Schwartz (via MN)
2006-May-24 20:32 UTC
[R] Regression line limited by the range of values
On Wed, 2006-05-24 at 21:53 +0200, Andreas Svensson wrote:> Thankyou very much Marc for that nifty little script. > > When I use it on my real dataset though, the lines are fat in the middle > and thinner towards the ends. I guess it's because "lines" draw one > fitted line for each x, and if you have hundreds of x, this turns into a > line that is thicker that it should be (due to rounding errors perhaps). > > I got the tip to use "segments", and draw one line from min(x),min(y) to > max(x),max(y) but with real data with a bunch of "na.rm" and > "na.action"s this becomes very long and bulky. > > For a regression with two data swarms and two trend lines it becomes > this long mess: > > <>plot(TCgonad[Period=="1"], > ABelly[Period=="1"],xlim=c(0,20),ylim=c(130,160), col=?blue?) > points(TCgonad[Period=="2"], ABelly[Period=="2"],col=?red?) > > segments( > min(TCgonad[Period=="1"],na.rm=T), > min(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"], > na.action=na.exclude)),na.rm=T), > max(TCgonad[Period=="1"],na.rm=T), > max(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"], > na.action=na.exclude)),na.rm=T),col=?blue?) > > <>segments( > min(TCgonad[Period=="2"],na.rm=T), > min(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"], > na.action=na.exclude)),na.rm=T), > max(TCgonad[Period=="2"],na.rm=T), > max(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"], > na.action=na.exclude)),na.rm=T),col=?red?) > > > I just think it's strange that abline has as a nonadjustable default to > extrapolate the line to outside the data - a mortal sin in my field. > > Cheers > AndreasAndreas, What is likely happening is that your x values are not sorted in increasing order as they were in the dummy data set, thus there is probably some movement of the lines back and forth from one x value to the next in the order that they appear in the data set. Here is a more generic approach that should work. It is based upon the examples in ?predict.lm. We create the two models and then use range(x?) for the new min,max x values to be used for the prediction of the fitted y values. Thus: # Create model1 using random data set.seed(1) x1 <- rnorm(15) y1 <- x1 + rnorm(15) model1 <- lm(y1 ~ x1) # Create model2 using random data set.seed(2) x2 <- rnorm(15) y2 <- x2 + rnorm(15) model2 <- lm(y2 ~ x2) # Plot the first set of points # set the axis ranges based upon the common ranges of the # two sets of data plot(x1, y1, col = "red", xlim = range(x1, x2), ylim = range(y1, y2)) # Add the second set of points points(x2, y2, col = "blue") # Create the first fitted line # Create two x values for the prediction based upon the # range of x1 lines(range(x1), predict(model1, data.frame(x1 = range(x1))), col = "red") # Create the second fitted line # Same here for the x2 values lines(range(x2), predict(model2, data.frame(x2 = range(x2))), col = "blue") Note that in both cases for the fitted lines, the lines will be constrained by the range of the actual x? data values. This should be easier to apply in general. HTH, Marc Schwartz