Hello,
Why mail a question just to me? Post it to the list and you are more likely
to get more (and better) answers.
As for your question, the problem is in the call to glm. You don't need
the prefix 'train$' in the formula; the 'data' argument takes care of that.
When predicting, R looks for columns with the names used in the formula,
and it cannot find columns called train$Outcome and train$Weight in
the new data.frame 'test'. It therefore falls back on the 'train' object
itself, which is why you get predictions for the 85 training rows. Corrected:
mylogit <- glm(Outcome ~ Weight, data = train, family = binomial("logit"))
predictions <- predict(mylogit, newdata = test, type = "response")
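If you want to see the difference for yourself, here is a minimal sketch with simulated data. The values are made up (they are not your data), and the names 'bad', 'good', 'p_bad' and 'p_good' are only for illustration; the row counts match your 85 and 55.

```r
# Simulated stand-ins for 'train' and 'test' (hypothetical values,
# not the data from the original question).
set.seed(1)
train <- data.frame(Weight = runif(85, 0.05, 1.5))
train$Outcome <- rbinom(85, 1, plogis(-2 + 3 * train$Weight))
test <- data.frame(Weight = runif(55, 0.05, 1.5))

# Formula with 'train$' prefixes: the terms refer to the 'train' object itself.
bad <- glm(train$Outcome ~ train$Weight, data = train,
           family = binomial("logit"))
# predict() finds no column named 'train$Weight' in 'test', so the term is
# evaluated against 'train' again and the row-count warning is issued.
# suppressWarnings() is used here only to keep the output quiet.
p_bad <- suppressWarnings(predict(bad, newdata = test, type = "response"))

# Formula with bare column names: predict() looks up 'Weight' in 'newdata'.
good <- glm(Outcome ~ Weight, data = train, family = binomial("logit"))
p_good <- predict(good, newdata = test, type = "response")

length(p_bad)   # one value per training row
length(p_good)  # one value per test row
```

Comparing the two lengths is a quick way to check that the predictions really come from the new data.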
Hope this helps,
Rui Barradas
On 26-11-2012 01:42, somnath bandyopadhyay wrote:
> Hi,
> I am trying some basic logistic regression analysis using glm. I just have
> one dependent variable (Outcome) which is binary in nature and one independent
> variable (Weight). I fit a model using a training data set (train) which has 85
> observations and try to apply it on an independent dataset (test) which has 55
> observations. When I apply the predict function on the fitted model for the new
> dataset, I get the following warning "Warning message: 'newdata'
> had 55 rows but variable(s) found have 85 rows" and the predict works on
> the training observations and not on the test observations.
>
> Following is the session info, code and the training and test datasets I am
> using.
>
> What am I doing wrong? Any help would be greatly appreciated.
>
> Thanks,
> S.
>
>> train <- read.table("train_data.txt", header=T, row.names=1, sep="\t")
>> test <- read.table("test_data.txt", header=T, row.names=1, sep="\t")
>> mylogit <- glm(train$Outcome ~ train$Weight, data=train, family = binomial("logit"))
>> predictions <- predict(mylogit, newdata = test, type = "response")
> Warning message:
> 'newdata' had 55 rows but variable(s) found have 85 rows
>
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>> train
> Outcome Weight
> AB256939_21 0 0.331
> AB257076_21 0 0.308
> AB257079_21 0 0.453
> AB415508_21 0 0.303
> AB700497_21 0 0.354
> AB904508_21 0 0.336
> AC048719_21 0 0.420
> AC185939_21 0 0.249
> AC185940_21 0 1.525
> AC445840_21 0 0.261
> E7490523_21 0 0.269
> E7490524_21 0 0.213
> E7659579_21 0 0.360
> E7661528_21 0 0.271
> E7781094_21 0 0.156
> E7781095_21 0 0.221
> E7781096_21 0 0.098
> E7969081_21 0 0.430
> E8117594_21 0 0.321
> E8133295_21 0 0.166
> E8161578_22 0 0.269
> E8483037_21 0 0.162
> E8559720_21 0 0.226
> L1065550_18 0 0.396
> L1065607_17 0 0.541
> L1065944_24 0 0.131
> L1066017_20 0 0.421
> L1069261_12 0 0.357
> L1069262_14 0 0.309
> L1069263_27 0 0.283
> L1069297_24 0 0.620
> L1081528_21 0 0.561
> L1084066_21 0 0.564
> L1086090_21 0 0.649
> L1104280_17 0 0.181
> L1111362_22 0 0.199
> L1118063_15 0 0.369
> L1133550_21 0 0.302
> L1144201_14 0 0.249
> L1155023_7 0 0.257
> L1158386_21 0 0.470
> L1163051_4 0 0.446
> ...........................
> ...........................
> ...........................
>
>
>> test
> Weight
> AB256870_21 0.364
> AB256873_21 0.329
> AB415518_21 0.219
> AB460669_21 0.481
> AB609036_21 0.313
> AB609038_21 0.196
> AB700495_21 0.402
> AB700498_21 0.343
> AC112834_21 0.372
> AC185937_21 0.270
> AC269527_21 0.285
> E7352023_21 0.358
> E7661554_21 0.471
> E7750502_21 0.437
> E7845183_21 0.232
> E7854155_21 0.474
> E7854156_21 0.121
> E7924877_21 0.312
> E7969079_21 0.423
> E8139256_21 0.329
> E8161577_22 1.060
> E8161580_21 0.157
> E8364473_21 0.227
> E8364474_21 0.069
> L1065940_14 0.256
> L1065946_10 0.184
> L1066018_25 0.282
> L1069260_15 1.094
> ................................
> ................................
>