Ricardo Antunes
2009-Nov-27 23:24 UTC
[R] Questions about use of multinomial for discrimination.
Dear All,

I am looking at discriminating among several individuals based on a few sets of variables (I think some variables do not make sense unless they are entered together, so I "force" them into the models together, hence the sets). I have done so with linear discriminant analysis (LDA) using "MASS::lda", with acceptable results. However, one of my collaborators suggested I use multinomial regression instead. I think his suggestion is mainly concerned with the choice of which variables (sets) best describe the data. I have used a stepwise approach (klaR::stepclass), using the proportion of correct classifications to choose among the sets of variables. However, it has been suggested that I use a method that yields an AIC instead, which will "penalize" the use of more variables.

I have never done multinomial regression, and am uncertain about some details. I am looking into using R for this, and the function multinom from the nnet package (distributed alongside MASS) in particular.

In my previous analysis with LDA I measured the proportion of correct classifications using a jackknife procedure (i.e. leaving each datum out of the LDA in turn, and using the resulting discriminant functions to classify it). I am thinking about doing the same with the multinomial regression. I would appreciate any thoughts on whether this would be a bad idea for some reason.

Also, with the LDA I looked at how much better the discriminant functions are than random assignment of individual identity. To do this I randomly shuffle the categories, then run the LDA and measure the proportion of correct classifications using the jackknife procedure described above. I repeat this for many iterations and compare the distribution of the proportion of correct classifications obtained under random assignment with the proportion I obtained initially. Again, I thought about repeating this with multinom. Is this unnecessary because another way of looking at this is already included in the multinom function?
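To make the procedure concrete, the jackknife and label-shuffling steps described above might be sketched in R roughly as follows. This is only an illustration under assumptions: it uses nnet::multinom, the built-in iris data as a stand-in for the real data, and the helper names loo_accuracy and permuted_accuracy are hypothetical. The number of permutations is kept small so that it runs quickly.

```r
library(nnet)  # provides multinom()

# Leave-one-out ("jackknife") proportion of correct classifications:
# refit the model without row i, then classify row i with that fit.
loo_accuracy <- function(formula, data) {
  response <- all.vars(formula)[1]
  hits <- vapply(seq_len(nrow(data)), function(i) {
    fit <- multinom(formula, data = data[-i, , drop = FALSE], trace = FALSE)
    pred <- predict(fit, newdata = data[i, , drop = FALSE], type = "class")
    as.character(pred) == as.character(data[[response]][i])
  }, logical(1))
  mean(hits)
}

# Distribution of that proportion when the category labels are shuffled.
permuted_accuracy <- function(formula, data, n_perm = 100) {
  response <- all.vars(formula)[1]
  replicate(n_perm, {
    shuffled <- data
    shuffled[[response]] <- sample(shuffled[[response]])
    loo_accuracy(formula, shuffled)
  })
}

# Illustration on iris (a stand-in, not the data from this post).
set.seed(1)
f_small <- Species ~ Sepal.Length
f_large <- Species ~ Sepal.Length + Sepal.Width

observed <- loo_accuracy(f_large, iris)
null_acc <- permuted_accuracy(f_large, iris, n_perm = 10)

# Share of shuffled runs doing at least as well as the real labels
# (with the usual +1 correction so the value is never exactly zero).
p_value <- (sum(null_acc >= observed) + 1) / (length(null_acc) + 1)

# AIC for two candidate variable sets (lower is preferred).
fit_small <- multinom(f_small, data = iris, trace = FALSE)
fit_large <- multinom(f_large, data = iris, trace = FALSE)
aic <- c(small = AIC(fit_small), large = AIC(fit_large))
```

Here observed plays the role of the jackknife proportion from the real labels, null_acc the distribution under random assignment, and aic the penalized comparison between variable sets.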
Perhaps this is more of a general statistics question than one about the use of R, but I would appreciate any helpful comments. Thank you in advance. Ricardo Antunes