Maybe a useful addition to the predict functions would be to return the
values of the predictor variables. Unless there are problems, it requires
just one extra line. I have inserted an example below.

"predict.glm" <-
function (object, newdata = NULL, type = c("link", "response", "terms"),
          se.fit = FALSE, dispersion = NULL, terms = NULL,
          na.action = na.pass, ...)
{
    type <- match.arg(type)
    na.act <- object$na.action
    object$na.action <- NULL
    if (!se.fit) {
        if (missing(newdata)) {
            pred <- switch(type, link = object$linear.predictors,
                           response = object$fitted,
                           terms = predict.lm(object, se.fit = se.fit,
                               scale = 1, type = "terms", terms = terms))
            if (!is.null(na.act))
                pred <- napredict(na.act, pred)
        }
        else {
            pred <- predict.lm(object, newdata, se.fit, scale = 1,
                               type = ifelse(type == "link", "response", type),
                               terms = terms, na.action = na.action)
            switch(type, response = {
                pred <- family(object)$linkinv(pred)
            }, link = , terms = )
        }
    }
    else {
        if (inherits(object, "survreg"))
            dispersion <- 1
        if (is.null(dispersion) || dispersion == 0)
            dispersion <- summary(object, dispersion = dispersion)$dispersion
        residual.scale <- as.vector(sqrt(dispersion))
        pred <- predict.lm(object, newdata, se.fit, scale = residual.scale,
                           type = ifelse(type == "link", "response", type),
                           terms = terms, na.action = na.action)
        fit <- pred$fit
        se.fit <- pred$se.fit
        switch(type, response = {
            se.fit <- se.fit * abs(family(object)$mu.eta(fit))
            fit <- family(object)$linkinv(fit)
        }, link = , terms = )
        if (missing(newdata) && !is.null(na.act)) {
            fit <- napredict(na.act, fit)
            se.fit <- napredict(na.act, se.fit)
        }
        ## the proposed extra line (se.fit = TRUE branch only):
        ## keep the predictor values alongside the predictions
        predictors <- if (missing(newdata)) model.matrix(object) else newdata
        pred <- list(predictors = predictors, fit = fit, se.fit = se.fit,
                     residual.scale = residual.scale)
    }
    pred
}
#______________________ end of R code

Ross Darnell
--
School of Health and Rehabilitation Sciences
University of Queensland, Brisbane QLD 4072 AUSTRALIA
Email: <r.darnell at uq.edu.au>
Phone: +61 7 3365 6087   Fax: +61 7 3365 4754
Room:822, Therapies Bldg.
http://www.shrs.uq.edu.au/shrs/school_staff/ross_darnell.html
I must respectfully disagree. Why carry extra copies of the data around? This
is probably OK for small to medium-sized data, but definitely not for large
data.

Besides, your example may do different things depending on whether
newdata is supplied: model.matrix is not necessarily the same as the
original data frame. You need a bit more work to get the right model.matrix
that corresponds to the newdata. It's not clear to me whether you want to
return the model matrix or the model frame, but in either case it's not
sufficient to just use `newdata'.

Andy

> From: Ross Darnell
>
> Maybe a useful addition to the predict functions would be to return the
> values of the predictor variables. It just (unless there are problems)
> requires an extra line. I have inserted an example below.
>
> "predict.glm" <-
> function (object, newdata = NULL, type = c("link", "response",
[snip]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
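A minimal sketch of the point about model.matrix versus newdata. The built-in warpbreaks data and the Poisson model are assumptions chosen only for illustration; neither appears in the thread. It shows that the model matrix is not the original data frame, and that the design matrix for new data has to be rebuilt from the fitted terms rather than taken from newdata directly.

```r
# Assumption: built-in warpbreaks data and a Poisson model, for illustration only.
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# The model matrix holds an intercept and dummy-coded factor columns,
# so it is not the data frame the model was fitted to.
colnames(model.matrix(fit))
names(warpbreaks)

# New data supplied as a plain data frame of predictors.
nd <- warpbreaks[c(3, 30, 54), c("wool", "tension")]

# Rebuild the design matrix for nd from the fitted terms, reusing the
# factor levels and contrasts stored in the fit.
tt <- delete.response(terms(fit))
mf <- model.frame(tt, nd, xlev = fit$xlevels)
X  <- model.matrix(tt, mf, contrasts.arg = fit$contrasts)

dim(nd)   # 3 rows, 2 columns
dim(X)    # 3 rows, 4 columns

# The rebuilt matrix reproduces the linear predictor from predict().
all.equal(drop(X %*% coef(fit)), predict(fit, newdata = nd, type = "link"))
```

Returning newdata in one case and model.matrix(object) in the other would therefore hand back objects with different columns and different meanings.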
> From: Ross Darnell
>
> Liaw, Andy wrote:
> > I must respectfully disagree. Why carry extra copies of data around?
[snip]
>
> A good point, but what is the value of storing a large set of predicted
> values when the values of the explanatory variables are lost (predicted
> values of what?). I thought the purpose of objects was that they were
> self-explanatory (pardon the pun).
>
> Maybe we could make it optional.

If what you are looking for is a way to track the observations, I'd suggest
simply adding the row names of newdata as names of the predicted values.
Storing names is much cheaper than storing the entire data frame of
predictors. (And in R, data frames _must_ have unique row names.)

Cheers,
Andy

> Ross Darnell
> --
> Email: <r.darnell at uq.edu.au>
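One way to picture the "optional" idea together with the row-name suggestion is a small user-side wrapper. This is only a sketch: predict_keep and its keep.predictors argument are hypothetical names, not part of R, and the warpbreaks model is an assumption made for illustration. It assumes se.fit = FALSE and a type that returns a vector.

```r
# Hypothetical helper (not part of R): predictions come back as a named
# vector by default and are bound to the predictors only when asked.
# Assumes se.fit = FALSE and type = "link" or "response".
predict_keep <- function(object, newdata, keep.predictors = FALSE, ...) {
  pred <- predict(object, newdata = newdata, ...)
  names(pred) <- rownames(newdata)   # cheap labels for tracking observations
  if (!keep.predictors) return(pred)
  out <- newdata
  out$fit <- unname(pred)
  out
}

# Assumption: built-in warpbreaks data, for illustration only.
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
nd  <- warpbreaks[c(3, 30, 54), c("wool", "tension")]

p   <- predict_keep(fit, nd, type = "response")
out <- predict_keep(fit, nd, keep.predictors = TRUE, type = "response")

# The names alone are enough to recover the predictors later.
nd[names(p), ]

# Compare the memory cost of the two return values.
object.size(p)
object.size(out)
```

Keeping this outside predict() leaves the default return value unchanged, so the cost of carrying the predictors is paid only by callers who ask for it.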
> From: Liaw, Andy
>
> If what you are looking for is a way to track the observations, I'd suggest
> simply adding rownames of newdata as names of the predicted values. Storing
> names is much cheaper than the entire data frame of predictors. (And in R,
> data frames _must_ have unique row names.)

And as a matter of fact, predict.lm() and predict.glm() (and probably most
other predict() methods) already do that.

Andy
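A quick check of that behaviour, as a sketch. The warpbreaks data and both models are assumptions for illustration only; the subset rows are chosen so that the row names are not simply 1, 2, 3.

```r
# Assumption: built-in warpbreaks data, for illustration only.
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
nd  <- warpbreaks[c(3, 30, 54), c("wool", "tension")]

# predict.glm() labels each prediction with the row name of newdata.
p <- predict(fit, newdata = nd, type = "response")
names(p)                            # "3" "30" "54"
identical(names(p), rownames(nd))   # TRUE

# Same check for a linear model.
fit_lm <- lm(breaks ~ wool + tension, data = warpbreaks)
identical(names(predict(fit_lm, newdata = nd)), rownames(nd))   # TRUE
```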