Hi,

I would like to tabulate the likelihood for an affection. For this, I retrieve the indices of affected people and controls in my data set and proceed as follows:

    flags <- c(rep(1, length(patient_indices)), rep(0, length(control_indices)))
    # dataset is a data.frame and param the parameter to be analysed:
    data1 <- dataset[, param][c(patient_indices, control_indices)]
    fit1 <- glm(flags ~ data1, family = binomial)
    new.data <- seq(0, 300, 10)
    new.p <- predict(fit1, data.frame(newdata = new.data), type = "response")

This then gives values that do not depend on new.data, along with a warning that reads:

    Warning message:
    'newdata' had 31 rows but variable(s) found have 306 rows

In a similar script, new.p held values indexed from 1 to 31, with the cumulative likelihood associated with each. Now new.p looks a bit like random numbers assigned to a list indexed from 1 to 306. (306 is the number of data points in data1.) Unfortunately, I am unable to spot the difference between the two scripts.

I would appreciate any pointer to my mistake (and hope that my problem was understandable).

TIA
Christian
Hi:

The data frame you submit as newdata = to predict() has to have the same variables as the right-hand side of the model formula. For example, if the model has covariates x1, x2, x3, then the data frame you create as newdata has to consist of columns named x1, x2, x3.

Another point: if you intend to use the predict() method, it is better to combine all the variables into a data frame and fit the model from it, something like

    mdata <- data.frame(flags, data1)
    fit1 <- glm(flags ~ ., data = mdata, family = binomial)

The data frame passed as newdata then has to use the same variable names as the predictor columns of mdata (here, data1).

HTH,
Dennis

On Mon, Jul 11, 2011 at 8:51 AM, Meesters, Christian <meesters at aesku.com> wrote:
> flags <- c(rep(1, length(patient_indices)), rep(0, length(control_indices)))
> # dataset is a data.frame and param the parameter to be analysed:
> data1 <- dataset[,param][c(patient_indices, control_indices)]
> fit1 <- glm(flags ~ data1, family = binomial)
> new.data <- seq(0, 300, 10)
> new.p <- predict(fit1, data.frame(newdata = new.data), type = "response")
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
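A minimal sketch of the data-frame approach described above, for readers who want something runnable. The data are simulated stand-ins (the original dataset is not available), and the names sim and grid are made up for this illustration; only glm() and predict() come from the thread.

```r
# Simulated stand-in data (assumption): one predictor, one binary outcome.
set.seed(42)
sim <- data.frame(data1 = runif(200, 0, 300))
sim$flags <- rbinom(200, 1, plogis(-3 + 0.02 * sim$data1))

# Fit from the data frame, so the model refers to columns by name.
fit <- glm(flags ~ data1, data = sim, family = binomial)

# newdata must contain a column named like the predictor, here "data1".
grid <- data.frame(data1 = seq(0, 300, 10))
grid$p <- predict(fit, newdata = grid, type = "response")

head(grid)  # one predicted probability per grid value
```

Storing the predictions next to the grid values keeps each probability paired with the predictor value it belongs to, which is convenient for tabulating.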
On Jul 11, 2011, at 11:51 AM, Meesters, Christian wrote:

> new.data <- seq(0, 300, 10)
> new.p <- predict(fit1, data.frame(newdata = new.data), type = "response")

Should (probably) have been the following; the names of the RHS variables need to be an exact match:

    new.p <- predict(fit1, newdata = data.frame(data1 = new.data), type = "response")

(Obviously untested.)

David Winsemius, MD
West Hartford, CT
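The following sketch reproduces the symptom and the suggested fix side by side. It assumes simulated data with 306 observations in place of the real dataset, and the names p.wrong and p.right are invented for the comparison. When the column in newdata is not called data1, predict() does not find the predictor there and picks up the original 306-element vector instead, which is where the warning and the 306 values come from.

```r
# Simulated stand-in for the poster's data: 306 observations (assumption).
set.seed(1)
data1 <- runif(306, 0, 300)
flags <- rbinom(306, 1, plogis(-3 + 0.02 * data1))

fit1 <- glm(flags ~ data1, family = binomial)
new.data <- seq(0, 300, 10)

# Column is named "newdata": no "data1" in the data frame, so the original
# vector is used and a warning is issued (suppressed here).
p.wrong <- suppressWarnings(
  predict(fit1, data.frame(newdata = new.data), type = "response")
)
length(p.wrong)  # 306

# Column is named "data1", matching the formula: one prediction per new value.
p.right <- predict(fit1, newdata = data.frame(data1 = new.data),
                   type = "response")
length(p.right)  # 31
```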