thr3ads.net - R help - [R] problem with 'predict' [Jul 2011]

Meesters, Christian

2011-Jul-11 15:51 UTC

[R] problem with 'predict'

Hi,

I would like to tabulate the likelihood of an affection (disease). For this, I retrieve
the indices of affected people and controls in my data set and proceed as follows:

flags <- c(rep(1, length(patient_indices)), rep(0, length(control_indices)))
# dataset is a data.frame and param the parameter to be analysed:
data1  <- dataset[,param][c(patient_indices, control_indices)] 
fit1 <- glm(flags ~ data1, family = binomial)
new.data    <- seq(0, 300, 10)
new.p   <- predict(fit1, data.frame(newdata = new.data), type = "response")

This then gives values that do not depend on new.data, along with a warning that reads:
"Warning message:
'newdata' had 31 rows but variable(s) found have 306 rows"

In a similar script, new.p held 31 values, numbered 1 to 31, each with its
associated cumulative likelihood. Now new.p looks a bit like random numbers
assigned to entries numbered 1 to 306 (306 is the number of data points in
data1). Unfortunately, I am unable to spot the difference between the two scripts.

I would appreciate any pointer on my mistake (and hope that my problem was
understandable).

TIA
Christian

Dennis Murphy

2011-Jul-11 17:22 UTC

Hi:

The data frame you pass as newdata = to predict() has to contain the
same variables as the right-hand side of the model formula. For
example, if the model has covariates x1, x2, x3, then the data frame
you pass as newdata has to consist of columns named x1, x2, x3.
It is also better to combine all the variables into a data frame and
fit the model with data = if you intend to use the predict() method,
something like
mdata <- data.frame(flags, data1)
fit1 <- glm(flags ~ ., data = mdata, family = binomial)

The prediction data frame for newdata then has to have a column with
the same name as the predictor in mdata, here data1.
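
As a rough sketch of the approach described above (not the original poster's data): the values below are simulated, and the group sizes, means and seed are assumptions chosen only to give 306 observations as in the question. The model is fitted from a data frame, and the prediction grid is passed in a data frame whose column carries the predictor's name, data1.

```r
# Simulated stand-in for the poster's data (assumption: 150 patients, 156 controls)
set.seed(1)
flags <- c(rep(1, 150), rep(0, 156))
data1 <- c(rnorm(150, mean = 180, sd = 50),   # hypothetical patient values
           rnorm(156, mean = 120, sd = 50))   # hypothetical control values

# Keep response and predictor together in one data frame and fit with data =
mdata <- data.frame(flags, data1)
fit1  <- glm(flags ~ ., data = mdata, family = binomial)

# The grid must sit in a column named exactly like the predictor: data1
new.data <- seq(0, 300, 10)
new.p    <- predict(fit1, newdata = data.frame(data1 = new.data),
                    type = "response")

# Tabulate the grid value next to its predicted probability
result <- data.frame(data1 = new.data, p = new.p)
head(result)
```

Here result should have one row per grid value (31 rows), which is the kind of table the original question asks for.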

HTH,
Dennis

David Winsemius

2011-Jul-11 17:50 UTC

On Jul 11, 2011, at 11:51 AM, Meesters, Christian wrote:
> new.p   <- predict(fit1, data.frame(newdata = new.data), type =  
> "response")
That should (probably) have been the following; the names of the RHS
variables need to match exactly:

new.p   <- predict(fit1, newdata = data.frame(data1 = new.data), type = "response")

(Obviously untested.)
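
To illustrate why the column name matters, here is a small sketch on simulated data (the group sizes, means and seed are assumptions, not the poster's data). When newdata has no column called data1, predict() appears to fall back on the data1 vector in the workspace, so it returns one value per original observation instead of one per grid point.

```r
# Simulated stand-in for the poster's data (assumption: 306 observations in total)
set.seed(42)
flags <- c(rep(1, 150), rep(0, 156))
data1 <- c(rnorm(150, mean = 180, sd = 50),
           rnorm(156, mean = 120, sd = 50))
fit1  <- glm(flags ~ data1, family = binomial)

new.data <- seq(0, 300, 10)   # 31 grid points

# Column is named 'newdata', so 'data1' is looked up outside the data frame;
# the row-count warning from the question is suppressed here
bad <- suppressWarnings(
  predict(fit1, data.frame(newdata = new.data), type = "response")
)

# Column is named 'data1', matching the right-hand side of the formula
good <- predict(fit1, newdata = data.frame(data1 = new.data),
                type = "response")

length(bad)    # one value per original observation
length(good)   # one value per grid point
```

In this sketch bad is simply the fitted values of the model, which would explain why the output in the question looked unrelated to new.data.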
David Winsemius, MD
West Hartford, CT
