I answered this question, which was part of an online course, correctly by
running some commands and guessing.
But I didn't grasp the approach, even though my R code gave the right answer.
I have a training and a testing dataset.
> nrow(training)
[1] 251
> nrow(testing)
[1] 82
> head(training1)
   diagnosis    IL_11    IL_13    IL_16   IL_17E IL_1alpha      IL_3     IL_4
6   Impaired 6.103215 1.282549 2.671032 3.637051 -8.180721 -3.863233 1.208960
10  Impaired 4.593226 1.269463 3.476091 3.637051 -7.369791 -4.017384 1.808289
11  Impaired 6.919778 1.274133 2.154845 4.749337 -7.849364 -4.509860 1.568616
12  Impaired 3.218759 1.286356 3.593860 3.867347 -8.047190 -3.575551 1.916923
13  Impaired 4.102821 1.274133 2.876338 5.731246 -7.849364 -4.509860 1.808289
16  Impaired 4.360856 1.278484 2.776394 5.170380 -7.662778 -4.017384 1.547563
         IL_5       IL_6 IL_6_Receptor     IL_7     IL_8
6  -0.4004776  0.1856864   -0.51727788 2.776394 1.708270
10  0.1823216 -1.5342758    0.09668586 2.154845 1.701858
11  0.1823216 -1.0965412    0.35404039 2.924466 1.719944
12  0.3364722 -0.3987186    0.09668586 2.924466 1.675557
13  0.0000000  0.4223589   -0.53219115 1.564217 1.691393
16  0.2623643  0.4223589    0.18739989 1.269636 1.705116
The testing dataset has the same 13 columns; only the number of rows differs.
training1 <- training[,grepl("^IL|^diagnosis",names(training))]
test1 <- testing[,grepl("^IL|^diagnosis",names(testing))]
modelFit <- train(training1$diagnosis ~ training1$IL_11 + training1$IL_13 +
training1$IL_16 + training1$IL_17E + training1$IL_1alpha + training1$IL_3 +
training1$IL_4 + training1$IL_5 + training1$IL_6 + training1$IL_6_Receptor
+ training1$IL_7 + training1$IL_8,method="glm",data=training1)
confusionMatrix(test1$diagnosis,predict(modelFit, test1))
I get this error when I run the above command to compute the confusion matrix:
'newdata' had 82 rows but variables found have 251 rows
I thought this was simple: I train a model on the training dataset, predict
on the test dataset, and read off the accuracy.
Am I missing the obvious here?
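A likely explanation is the `training1$` prefix on every term of the formula: it ties the model to the training columns themselves, so predict() cannot swap in the columns of the test set. The sketch below reproduces the effect with base R's glm() rather than caret's train(), so that it runs without extra packages. The data and the names `make_data`, `train_df` and `test_df` are made up for illustration, and in base R the message appears as a warning rather than an error.

```r
# Hypothetical toy data standing in for the real training/testing sets.
set.seed(1)
make_data <- function(n) {
  x <- rnorm(n)
  data.frame(
    diagnosis = factor(ifelse(x + rnorm(n) > 0, "Impaired", "Control")),
    IL_11 = x
  )
}
train_df <- make_data(251)
test_df  <- make_data(82)

# Problematic form: the `train_df$` prefix hard-wires the training columns
# into the formula, so predict() cannot substitute the columns of newdata.
bad_fit  <- glm(train_df$diagnosis ~ train_df$IL_11,
                family = binomial, data = train_df)
# The "'newdata' had 82 rows ..." message is raised here as a warning.
bad_pred <- suppressWarnings(predict(bad_fit, newdata = test_df))
length(bad_pred)   # one value per training row, not per test row

# Suggested form: bare column names, resolved through `data` / `newdata`.
good_fit  <- glm(diagnosis ~ IL_11, family = binomial, data = train_df)
good_pred <- predict(good_fit, newdata = test_df, type = "response")
length(good_pred)  # one value per test row
```

If the same reasoning carries over to caret, the call would presumably become `train(diagnosis ~ ., data = training1, method = "glm")`, since `training1` already holds only the diagnosis and IL columns; I have not verified this against the course data.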
Thanks,
Mohan