Hi all
I have run into a case where I don't understand why predict.lrm and
predict.glm don't yield the same results. My data look like this:
set.seed(1)
library(Design); ilogit <- function(x) { 1/(1+exp(-x)) }
ORDER <- factor(sample(c("mc-sc", "sc-mc"), 403, TRUE))
CONJ <- factor(sample(c("als", "bevor",
"nachdem", "weil"), 403, TRUE))
LENGTH_DIFF <- sample(-32:25, 403, TRUE)
I fit two models:
model.01.lrm <- lrm(ORDER ~ CONJ*LENGTH_DIFF)
model.01.glm <- glm(ORDER ~ CONJ*LENGTH_DIFF, family=binomial)
Then I wanted to plot for each level of CONJ the predicted
probabilities of ORDER="sc-mc" across a large range of LENGTH_DIFF
values. Thus I use this (yes, I know about type="response" etc.):
# range of LENGTH_DIFF for CONJ="als"
x1 <- factor(rep("als", 201)); x2 <- seq(-100, 100, 1)
y.als.lrm <- ilogit(predict(model.01.lrm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
y.als.glm <- ilogit(predict(model.01.glm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
This works perfectly and the two vectors y.als.lrm and y.als.glm are
the same. But then:
# range of LENGTH_DIFF for CONJ="bevor"
x1 <- factor(rep("bevor", 201)); x2 <- seq(-100, 100, 1)
y.bevor.lrm <- ilogit(predict(model.01.lrm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
y.bevor.glm <- ilogit(predict(model.01.glm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
Here, y.bevor.lrm is the same as y.als.lrm, but not the same as y.bevor.glm
# range of LENGTH_DIFF for CONJ="nachdem"
x1 <- factor(rep("nachdem", 201)); x2 <- seq(-100, 100, 1)
y.nachdem.lrm <- ilogit(predict(model.01.lrm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
y.nachdem.glm <- ilogit(predict(model.01.glm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
Here, y.nachdem.lrm is the same as y.als.lrm, but not the same as y.nachdem.glm
# range of LENGTH_DIFF for CONJ="weil"
x1 <- factor(rep("weil", 201)); x2 <- seq(-100, 100, 1)
y.weil.lrm <- ilogit(predict(model.01.lrm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
y.weil.glm <- ilogit(predict(model.01.glm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
Here, y.weil.lrm is the same as y.als.lrm, but not the same as y.weil.glm
As a result, in this plot, the left panel is useless, but the right
has the four different curves:
par(mfrow=c(1,2))
plot(x2, y.als.lrm, pch="a", xlim=c(-100, 100), ylim=c(0, 1),
main="with predict.lrm", xlab="Main cl. length - subord. cl.
length
(in words)", ylab="Predicted probability of 'sc-mc'");
grid()
points(x2, y.bevor.lrm, pch="b")
points(x2, y.nachdem.lrm, pch="n")
points(x2, y.weil.lrm, pch="w")
plot(x2, y.als.glm, pch="a", xlim=c(-100, 100), ylim=c(0, 1),
main="with predict.glm", xlab="Main cl. length - subord. cl.
length
(in words)", ylab="Predicted probability of 'sc-mc'");
grid()
points(x2, y.bevor.glm, pch="b")
points(x2, y.nachdem.glm, pch="n")
points(x2, y.weil.glm, pch="w")
par(mfrow=c(1,1))
What am I doing wrong with predict.lrm?
Thanks in advance for any input you may have,
STG
--
Stefan Th. Gries
-----------------------------------------------
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries
Please follow the posting guide. Please provide the smallest possible self-contained example and note that conveys the problem. Note that the rms package has now replaced the Design package. Details at http://biostat.mc.vanderbilt.edu/Rrms Frank ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/predict-lrm-vs-predict-glm-with-newdata-tp3163671p3164020.html Sent from the R help mailing list archive at Nabble.com.
Hi all
Trying again with this question. (If it's really that stupid a
question, I'd be happy to just receive a link or one or two words to
Google that I haven't thought of before):
I have run into a case where I don't understand why predict.lrm and
predict.glm don't yield the same results.
# My data look like this:
set.seed(1)
library(Design); ilogit <- function(x) { 1/(1+exp(-x)) }
ORDER <- factor(sample(c("mc-sc", "sc-mc"), 403, TRUE))
CONJ <- factor(sample(c("als", "bevor",
"nachdem", "weil"), 403, TRUE))
LENGTH_DIFF <- sample(-32:25, 403, TRUE)
# I fit two models:
model.01.lrm <- lrm(ORDER ~ CONJ*LENGTH_DIFF)
model.01.glm <- glm(ORDER ~ CONJ*LENGTH_DIFF, family=binomial)
# Then I wanted to plot for each level of CONJ the predicted
probabilities of ORDER="sc-mc" across a large range of LENGTH_DIFF
values. Thus I use this (yes, I know about type="response" etc.):
# range of LENGTH_DIFF for CONJ="als"
x1 <- factor(rep("als", 201)); x2 <- seq(-100, 100, 1)
y.als.lrm <- ilogit(predict(model.01.lrm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
y.als.glm <- ilogit(predict(model.01.glm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
# This works perfectly and the two vectors y.als.lrm and y.als.glm are
the same. But then:
# range of LENGTH_DIFF for CONJ="bevor"
x1 <- factor(rep("bevor", 201)); x2 <- seq(-100, 100, 1)
y.bevor.lrm <- ilogit(predict(model.01.lrm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
y.bevor.glm <- ilogit(predict(model.01.glm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
# Here, y.bevor.lrm is the same as y.als.lrm, but *not* the same as y.bevor.glm
# range of LENGTH_DIFF for CONJ="nachdem"
x1 <- factor(rep("nachdem", 201)); x2 <- seq(-100, 100, 1)
y.nachdem.lrm <- ilogit(predict(model.01.lrm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
y.nachdem.glm <- ilogit(predict(model.01.glm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
# Here, y.nachdem.lrm is the same as y.als.lrm, but *not* the same as
y.nachdem.glm
# range of LENGTH_DIFF for CONJ="weil"
x1 <- factor(rep("weil", 201)); x2 <- seq(-100, 100, 1)
y.weil.lrm <- ilogit(predict(model.01.lrm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
y.weil.glm <- ilogit(predict(model.01.glm, newdata=list(CONJ=x1,
LENGTH_DIFF=x2)))
# Here, y.weil.lrm is the same as y.als.lrm, but *not* the same as y.weil.glm
#As a result, in this plot, the left panel is useless, but the right
has the four different curves:
par(mfrow=c(1,2))
plot(x2, y.als.lrm, pch="a", xlim=c(-100, 100), ylim=c(0, 1),
main="with predict.lrm", xlab="Main cl. length - subord. cl.
length
(in words)", ylab="Predicted probability of 'sc-mc'");
grid()
points(x2, y.bevor.lrm, pch="b")
points(x2, y.nachdem.lrm, pch="n")
points(x2, y.weil.lrm, pch="w")
plot(x2, y.als.glm, pch="a", xlim=c(-100, 100), ylim=c(0, 1),
main="with predict.glm", xlab="Main cl. length - subord. cl.
length
(in words)", ylab="Predicted probability of 'sc-mc'");
grid()
points(x2, y.bevor.glm, pch="b")
points(x2, y.nachdem.glm, pch="n")
points(x2, y.weil.glm, pch="w")
par(mfrow=c(1,1))
What am I doing wrong with predict.lrm?
Thanks in advance for any input you may have,
STG
--
Stefan Th. Gries
-----------------------------------------------
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries