thr3ads.net - R help - [R] Regression lines for differently-sized groups on the same plot [Jul 2005]

If this information is useful, please help other people find it:
Share via:

Laura M Marx

2005-Jul-19 23:53 UTC

[R] Regression lines for differently-sized groups on the same plot

Hi there,
  I've looked through the very helpful advice about adding fitted lines to 
plots in the r-help archive, and can't find a post where someone has offered
a solution for my specific problem.  I need to plot logistic regression fits 
from three differently-sized data subsets on a plot of the entire dataset.  
A description and code are below:
  I have an unbalanced dataset consisting of three different species (hem, 
yb, and sm), with unequal numbers of wood pieces in each species group.  I 
am trying to generate a plot that will show the size of the wood piece on 
the X axis, the probability of it having tree seedlings growing on it on the 
Y (a binomial yes or no variable), and three fitted curves showing how the 
probability of having tree seedlings changes with increasing wood piece size 
for each species.
  I have no problem generating fits using GLM, and no problem creating the 
plot.  However, if I try to add a fitted curve based only on the hem data 
subset to a plot that shows the entire dataset, I get an error message that 
the lengths of those data sets differ. "Error in xy.coords(x,y) : x and y 
lengths differ".  I could see R's point -- you can't plot a
regression line
of babies born as a function of stork abundance on a graph of cherries 
produced (Y) versus rainfall (X), which for all the program knows, I'm 
trying to do.  As a temporary fix, I added NAs to the end of the hem, yb, 
and sm subsets to make them the same length as the entire dataset.  I can 
now add my fitted curves to the plot, but the lines are not connected.  That 
is, if the hem group only contains wood pieces that are 1, 4, and 10 meters 
long, the plot has an X axis that ranges from 1 to 10, but line segments for 
the hem group regression line only appear above 1, 4, and 10.  How can I fix 
this?  An ideal solution would not require me to make the hem subset of my 
data the same length as the full dataset, either (although the summaries of 
regressions with the NAs (or zeroes) added and taken away are identical).  
I'd also settle for a work-around that would have R connect the pieces of 
the curve so that I get a solid line rather than small dots and dashes where 
actual data exist.  Thanks so much for your help!
  Laura Marx
  Michigan State University, Dept. of Forestry 

#Note: hemdata has all the rows that are not hemlock species replaced with 
#"NA"s.
hemhem=glm(hempresence~logarea, family=binomial(logit), data=hemdata)
hemyb=glm(hempresence~logarea, family=binomial(logit), data=birchdata)
hemsm=glm(hempresence~logarea, family=binomial(logit), data=mapledata) 

attach(logreg) #logreg is the full dataset
plot(logarea, hempresence, xlab = "Surface area of log (m2)", 
ylab="Probability of hemlock seedling presence", type="n",
font.lab=2,
cex.lab=1.5, axes=TRUE)
lines(logarea,fitted(hemhem), lty=1, lwd=2)
lines(logarea,fitted(hemyb), lty="dashed", lwd=2)
lines(logarea,fitted(hemsm), lty="dotted", lwd=2)

Sundar Dorai-Raj

2005-Jul-20 02:56 UTC

head link

[R] Regression lines for differently-sized groups on the same plot

Laura M Marx wrote:> Hi there,
>   I've looked through the very helpful advice about adding fitted lines
to
> plots in the r-help archive, and can't find a post where someone has
offered
> a solution for my specific problem.  I need to plot logistic regression
fits
> from three differently-sized data subsets on a plot of the entire dataset.
> A description and code are below:
>   I have an unbalanced dataset consisting of three different species (hem, 
> yb, and sm), with unequal numbers of wood pieces in each species group.  I 
> am trying to generate a plot that will show the size of the wood piece on 
> the X axis, the probability of it having tree seedlings growing on it on
the
> Y (a binomial yes or no variable), and three fitted curves showing how the 
> probability of having tree seedlings changes with increasing wood piece
size
> for each species.
>   I have no problem generating fits using GLM, and no problem creating the 
> plot.  However, if I try to add a fitted curve based only on the hem data 
> subset to a plot that shows the entire dataset, I get an error message that
> the lengths of those data sets differ. "Error in xy.coords(x,y) : x
and y
> lengths differ".  I could see R's point -- you can't plot a
regression line
> of babies born as a function of stork abundance on a graph of cherries 
> produced (Y) versus rainfall (X), which for all the program knows, I'm 
> trying to do.  As a temporary fix, I added NAs to the end of the hem, yb, 
> and sm subsets to make them the same length as the entire dataset.  I can 
> now add my fitted curves to the plot, but the lines are not connected. 
That
> is, if the hem group only contains wood pieces that are 1, 4, and 10 meters
> long, the plot has an X axis that ranges from 1 to 10, but line segments
for
> the hem group regression line only appear above 1, 4, and 10.  How can I
fix
> this?  An ideal solution would not require me to make the hem subset of my 
> data the same length as the full dataset, either (although the summaries of
> regressions with the NAs (or zeroes) added and taken away are identical).  
> I'd also settle for a work-around that would have R connect the pieces
of
> the curve so that I get a solid line rather than small dots and dashes
where
> actual data exist.  Thanks so much for your help!
>   Laura Marx
>   Michigan State University, Dept. of Forestry 
> 
> #Note: hemdata has all the rows that are not hemlock species replaced with 
> #"NA"s.
> hemhem=glm(hempresence~logarea, family=binomial(logit), data=hemdata)
> hemyb=glm(hempresence~logarea, family=binomial(logit), data=birchdata)
> hemsm=glm(hempresence~logarea, family=binomial(logit), data=mapledata) 
> 
> attach(logreg) #logreg is the full dataset
> plot(logarea, hempresence, xlab = "Surface area of log (m2)", 
> ylab="Probability of hemlock seedling presence",
type="n", font.lab=2,
> cex.lab=1.5, axes=TRUE)
> lines(logarea,fitted(hemhem), lty=1, lwd=2)
> lines(logarea,fitted(hemyb), lty="dashed", lwd=2)
> lines(logarea,fitted(hemsm), lty="dotted", lwd=2)
> 
Hi, Laura,

Would ?predict.glm be better?

plot(logarea, hempresence,
      xlab = "Surface area of log (m2)",
      ylab="Probability of hemlock seedling presence",
      type="n", font.lab=2, cex.lab=1.5, axes=TRUE)
lines(logarea, predict(hemhem, logreg, "response"), lty=1, lwd=2)
lines(logarea, predict(hemyb, logreg, "response"),
lty="dashed", lwd=2)
lines(logarea, predict(hemsm, logreg, "response"),
lty="dotted", lwd=2)

Without seeing more description of your data, this is still a guess.

--sundar

Laura M Marx

2005-Jul-20 17:58 UTC

head link

[R] Regression lines for differently-sized groups on the same plot

Sundar Dorai-Raj writes: 
> Hi, Laura, 
> 
> Would ?predict.glm be better? 
> 
> plot(logarea, hempresence,
>      xlab = "Surface area of log (m2)",
>      ylab="Probability of hemlock seedling presence",
>      type="n", font.lab=2, cex.lab=1.5, axes=TRUE)
> lines(logarea, predict(hemhem, logreg, "response"), lty=1, lwd=2)
> lines(logarea, predict(hemyb, logreg, "response"),
lty="dashed", lwd=2)
> lines(logarea, predict(hemsm, logreg, "response"),
lty="dotted", lwd=2)
> 
> Without seeing more description of your data, this is still a guess. 
> 
> --sundar 
> 
YES!  Thank you.  That solves all of the problems I was having.  Thanks also 
to Tom Mulholland and Bill Venables for their replies.
     Laura
The final code is now: 

logreg <- read.table("C:/Documents and
Settings/Laura/Desktop/logreg.txt",
header=TRUE, sep=",", na.strings="NA", dec=".",
strip.white=TRUE)
#This is the full dataset.  A hemlock row followed by a birch row might look 
#like:
#Hemybsm     logarea    hempresence
#1           0.054      0
#2           1.370      1 

hemhem=glm(hempresence~logarea, family=binomial(logit), subset=Hemybsm<2)
#This pulls out only the hemlocks from the full dataset and calculates the 
#regression.
hemyb=glm(hempresence~logarea, family=binomial(logit), subset=Hemybsm==2)
#Same code, with only the birches from the full dataset.
hemsm=glm(hempresence~logarea, family=binomial(logit), subset=Hemybsm>2) 

plot(logarea, hempresence, xlab = "Surface area of log (m2)",
    ylab="Probability of hemlock seedling presence",
    type="n", font.lab=2, cex.lab=1.5, axes=TRUE)
lines(logarea, predict(hemhem, logreg, "response"), lty=1, lwd=2)
lines(logarea, predict(hemyb, logreg, "response"),
lty="dashed", lwd=2)
lines(logarea, predict(hemsm, logreg, "response"),
lty="dotted", lwd=2)

Reasonably Related Threads

Search for more reasonably related threads

R help - Jul 2005 - Regression lines for differently-sized groups on the same plot

[R] Regression lines for differently-sized groups on the same plot

[R] Regression lines for differently-sized groups on the same plot

[R] Regression lines for differently-sized groups on the same plot

Reasonably Related Threads