Zhao Jin
2013-May-09 15:10 UTC
[R] questions for using randomForest/pamr to predict biological data
Dear list,

I am using randomForest and pamr to analyze some biological data. The data show how many of each bacterium (bug) are present in each soil sample (sample) at each location (loc) and from each plant genotype (gen). I want to use the bugs to predict the plant genotypes. Some sample data in dput format are in the attached PDF file (a long file; it can be saved as a plain text file and read with dget).

I ran randomForest with the following command:

    rf1=randomForest(gen~., mydata, ntree=1000, mtry=21, importance=T, proximity=T)

I got a very high OOB error rate (87%) and high classification errors for each genotype: even the lowest error rate was 40%. I realized my data were somewhat unbalanced, so I played with the mtry, sampsize, and strata parameters. However, the OOB error rates and classification errors were still high. I noticed that with only two genotypes, the OOB error rate and classification errors went down significantly, to 20%. I also tried the varSelRF package to select variables, but this did not lower the OOB errors much.

With pamr, no variable was left after the default 30 threshold values, and the FDR rates were all 1.

If I ordinate the data, I can see that there is no obvious cluster among the genotypes.

So my questions are:

1) Is random forest or pamr a valid approach for this?
2) Can I further improve the randomForest or pamr predictions?
3) Can I at least use the bugs to predict some genotypes confidently (e.g. gen18, with a classification error of 40% by randomForest), if not all?

Thanks a lot for any comment or suggestion,
Zhao

--
Zhao JIN
Ph.D. Candidate
Ruth Ley Lab
467 Biotech
Field of Microbiology, Cornell University
Lab: 607.255.4954
Cell: 412.889.3675

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample_data.pdf
Type: application/pdf
Size: 636048 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130509/8998b412/attachment.pdf>
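
For readers who want to reproduce the kind of balanced run described above, here is a minimal sketch of how the strata and sampsize arguments of randomForest can be set so that each tree draws the same number of samples from every genotype. The attached data are not reproduced here, so the example uses simulated counts. The genotype labels, the bug column names, and the objects gen, bugs, n_min, and rf_bal are all assumptions made for illustration, not the original analysis.

```r
library(randomForest)

set.seed(1)

# Simulated stand-in for the real data (an assumption): three genotypes with
# unequal numbers of samples and 20 bug-count columns of pure noise.
gen <- factor(rep(c("genA", "genB", "genC"), times = c(30, 12, 8)))
bugs <- as.data.frame(matrix(rpois(length(gen) * 20, lambda = 5), ncol = 20))
names(bugs) <- paste0("bug", seq_len(20))

# Size of the smallest genotype class; every class contributes this many
# samples to each tree's bootstrap sample.
n_min <- min(table(gen))

rf_bal <- randomForest(
  x = bugs, y = gen,
  ntree = 500,
  strata = gen,                          # stratify the sampling by genotype
  sampsize = rep(n_min, nlevels(gen)),   # equal draw from each stratum
  importance = TRUE
)

print(rf_bal$confusion)                      # per-genotype OOB class errors
print(rf_bal$err.rate[rf_bal$ntree, "OOB"])  # overall OOB error rate
```

Because the simulated counts carry no genotype signal, the errors should stay near chance level. The sketch only shows how the balancing arguments are wired, not that balancing will improve predictions on the real data.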