thr3ads.net - R help - [R] Regression line limited by the range of values [May 2006]

If this information is useful, please help other people find it:
Share via:

Andreas Svensson

2006-May-24 19:53 UTC

[R] Regression line limited by the range of values

Thankyou very much Marc for that nifty little script.

When I use it on my real dataset though, the lines are fat in the middle 
and thinner towards the ends. I guess it's because "lines" draw
one
fitted line for each x, and if you have hundreds of x, this turns into a 
line that is thicker that it should be (due to rounding errors perhaps).

I got the tip to use "segments", and draw one line from min(x),min(y)
to
max(x),max(y) but with real data with a bunch of "na.rm" and 
"na.action"s this becomes very long and bulky.

For a regression with two data swarms and two trend lines it becomes 
this long mess:

<>plot(TCgonad[Period=="1"], 
ABelly[Period=="1"],xlim=c(0,20),ylim=c(130,160), col=?blue?)
points(TCgonad[Period=="2"], ABelly[Period=="2"],col=?red?)

segments(
min(TCgonad[Period=="1"],na.rm=T), 
min(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"],
na.action=na.exclude)),na.rm=T),
max(TCgonad[Period=="1"],na.rm=T), 
max(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"],
na.action=na.exclude)),na.rm=T),col=?blue?)

<>segments(
min(TCgonad[Period=="2"],na.rm=T), 
min(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"],
na.action=na.exclude)),na.rm=T),
max(TCgonad[Period=="2"],na.rm=T), 
max(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"],
na.action=na.exclude)),na.rm=T),col=?red?)


I just think it's strange that abline has as a nonadjustable default to 
extrapolate the line to outside the data - a mortal sin in my field.

Cheers
Andreas

Marc Schwartz (via MN)

2006-May-24 20:32 UTC

head link

[R] Regression line limited by the range of values

On Wed, 2006-05-24 at 21:53 +0200, Andreas Svensson
wrote:> Thankyou very much Marc for that nifty little script.
> 
> When I use it on my real dataset though, the lines are fat in the middle 
> and thinner towards the ends. I guess it's because "lines"
draw one
> fitted line for each x, and if you have hundreds of x, this turns into a 
> line that is thicker that it should be (due to rounding errors perhaps).
> 
> I got the tip to use "segments", and draw one line from
min(x),min(y) to
> max(x),max(y) but with real data with a bunch of "na.rm" and 
> "na.action"s this becomes very long and bulky.
> 
> For a regression with two data swarms and two trend lines it becomes 
> this long mess:
> 
> <>plot(TCgonad[Period=="1"], 
> ABelly[Period=="1"],xlim=c(0,20),ylim=c(130,160), col=?blue?)
> points(TCgonad[Period=="2"],
ABelly[Period=="2"],col=?red?)
> 
> segments(
> min(TCgonad[Period=="1"],na.rm=T), 
> min(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"],
> na.action=na.exclude)),na.rm=T),
> max(TCgonad[Period=="1"],na.rm=T), 
> max(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"],
> na.action=na.exclude)),na.rm=T),col=?blue?)
> 
> <>segments(
> min(TCgonad[Period=="2"],na.rm=T), 
> min(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"],
> na.action=na.exclude)),na.rm=T),
> max(TCgonad[Period=="2"],na.rm=T), 
> max(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"],
> na.action=na.exclude)),na.rm=T),col=?red?)
> 
> 
> I just think it's strange that abline has as a nonadjustable default to
> extrapolate the line to outside the data - a mortal sin in my field.
> 
> Cheers
> Andreas

Andreas,

What is likely happening is that your x values are not sorted in
increasing order as they were in the dummy data set, thus there is
probably some movement of the lines back and forth from one x value to
the next in the order that they appear in the data set.

Here is a more generic approach that should work. It is based upon the
examples in ?predict.lm. We create the two models and then use range(x?)
for the new min,max x values to be used for the prediction of the fitted
y values.

Thus:

# Create model1 using random data
set.seed(1)
x1 <- rnorm(15)
y1 <- x1 + rnorm(15)
model1 <- lm(y1 ~ x1)

# Create model2 using random data
set.seed(2)
x2 <- rnorm(15)
y2 <- x2 + rnorm(15)
model2 <- lm(y2 ~ x2)


# Plot the first set of points
# set the axis ranges based upon the common ranges of the 
# two sets of data
plot(x1, y1, col = "red", xlim = range(x1, x2),
     ylim = range(y1, y2))

# Add the second set of points
points(x2, y2, col = "blue")

# Create the first fitted line
# Create two x values for the prediction based upon the 
# range of x1
lines(range(x1), predict(model1, data.frame(x1 = range(x1))), 
      col = "red")

# Create the second fitted line
# Same here for the x2 values
lines(range(x2), predict(model2, data.frame(x2 = range(x2))), 
      col = "blue")


Note that in both cases for the fitted lines, the lines will be
constrained by the range of the actual x? data values.

This should be easier to apply in general.

HTH,

Marc Schwartz

Apparently Analagous Threads

Search for more possibly parallel threads

R help - May 2006 - Regression line limited by the range of values

[R] Regression line limited by the range of values

[R] Regression line limited by the range of values

Apparently Analagous Threads