thr3ads.net - R help - [R] Training a model using glm [Sep 2014]


Mohan Radhakrishnan

2014-Sep-17 06:15 UTC

[R] Training a model using glm

I answered this question, which was part of an online course, correctly,
but only by running some commands and guessing.

My R code works, but I did not really understand the approach.

I have a training dataset and a test dataset.
> nrow(training)
[1] 251
> nrow(testing)
[1] 82
> head(training1)
   diagnosis    IL_11    IL_13    IL_16   IL_17E IL_1alpha      IL_3     IL_4
6   Impaired 6.103215 1.282549 2.671032 3.637051 -8.180721 -3.863233 1.208960
10  Impaired 4.593226 1.269463 3.476091 3.637051 -7.369791 -4.017384 1.808289
11  Impaired 6.919778 1.274133 2.154845 4.749337 -7.849364 -4.509860 1.568616
12  Impaired 3.218759 1.286356 3.593860 3.867347 -8.047190 -3.575551 1.916923
13  Impaired 4.102821 1.274133 2.876338 5.731246 -7.849364 -4.509860 1.808289
16  Impaired 4.360856 1.278484 2.776394 5.170380 -7.662778 -4.017384 1.547563
         IL_5       IL_6 IL_6_Receptor     IL_7     IL_8
6  -0.4004776  0.1856864   -0.51727788 2.776394 1.708270
10  0.1823216 -1.5342758    0.09668586 2.154845 1.701858
11  0.1823216 -1.0965412    0.35404039 2.924466 1.719944
12  0.3364722 -0.3987186    0.09668586 2.924466 1.675557
13  0.0000000  0.4223589   -0.53219115 1.564217 1.691393
16  0.2623643  0.4223589    0.18739989 1.269636 1.705116

The testing dataset has the same 13 columns; only the number of rows differs.


training1 <- training[,grepl("^IL|^diagnosis",names(training))]

test1 <- testing[,grepl("^IL|^diagnosis",names(testing))]

modelFit <- train(training1$diagnosis ~ training1$IL_11 + training1$IL_13 +
training1$IL_16 + training1$IL_17E + training1$IL_1alpha + training1$IL_3 +
training1$IL_4 + training1$IL_5 + training1$IL_6 + training1$IL_6_Receptor
+ training1$IL_7 + training1$IL_8,method="glm",data=training1)

confusionMatrix(test1$diagnosis,predict(modelFit, test1))

I get this error when I run the above command to get the confusion matrix:

'newdata' had 82 rows but variables found have 251 rows

I thought this was simple: train a model on the training dataset, predict
on the test dataset, and get the accuracy.

Am I missing something obvious here?

Thanks,

Mohan
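
A note on the error above: a likely cause is the `training1$` prefix on every term of the formula. With that prefix, the model terms refer to the 251-row training vectors themselves, so `predict()` cannot match them to columns in the new data. The sketch below reproduces the effect with base R's `glm()` on synthetic data and then fits the same model with bare column names. The names `make_data`, `train_df`, `test_df`, `x1` and `x2` are made up for illustration; this is not the course data, and the 0.5 cutoff is an assumption.

```r
# Synthetic stand-in data (hypothetical columns x1, x2; not the course data).
set.seed(1)
make_data <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  diagnosis <- factor(ifelse(x1 + rnorm(n) > 0, "Impaired", "Control"))
  data.frame(diagnosis, x1, x2)
}
train_df <- make_data(251)
test_df  <- make_data(82)

# Problem case: terms written as train_df$... are tied to the training vectors.
# predict() warns about the row mismatch and returns training-size output.
bad <- glm(train_df$diagnosis ~ train_df$x1 + train_df$x2,
           family = binomial, data = train_df)
bad_pred <- suppressWarnings(predict(bad, newdata = test_df))
length(bad_pred)

# Bare column names: variables are looked up in `data` when fitting
# and in `newdata` when predicting.
fit  <- glm(diagnosis ~ x1 + x2, family = binomial, data = train_df)
prob <- predict(fit, newdata = test_df, type = "response")
length(prob)

# A binomial glm models the probability of the second factor level.
lv   <- levels(train_df$diagnosis)
pred <- factor(ifelse(prob > 0.5, lv[2], lv[1]), levels = lv)
table(predicted = pred, actual = test_df$diagnosis)
```

With caret, the equivalent change would presumably be `train(diagnosis ~ ., data = training1, method = "glm")`, which uses every other column of `training1` as a predictor. Note also that `confusionMatrix()` expects the predictions as its first argument and the reference labels as its second.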

