Dear List,

I have developed two models I want to use to predict a response: one with a binary response and one with an ordinal response. My original plan was to divide the data into test (300 entries) and training (1000 entries) sets and check the power of the model by looking at the percentage of correct predictions. However, I have been told by a colleague that 1300 entries is far too few to partition the data set, and that I should instead use the whole data set, determine the power of the model with scores such as the c-index and Brier score, and use bootstrapping. I understand how to bootstrap in R, but I have never used it on predicted values.

My questions are:

1. Using the boot() command, how do I use this to test the power of my predictive model?

2. Is it possible to bootstrap the Brier score, or is this not necessary?

3. (This is a separate point I am struggling with; I thought I would include it here instead of posting again!) I have selected the most likely model by AIC from a set of candidate GLMM models. However, as the GLMM fit has no predict method, I have taken the best model, dropped the random effects, refitted it as a glm, and used the predict function from that fit. Is this OK?

Thanks
Sam
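[A minimal sketch of how boot() could be pointed at the Brier score, assuming a data frame dat with a 0/1 outcome y and predictors x1 and x2 (all placeholder names, not from the original post). The model is refit on each resample and scored on the full data; rms::validate() performs the full optimism correction for you, so this only illustrates the mechanics.]

  library(boot)

  brier_fun <- function(data, indices) {
    d <- data[indices, ]                                  # bootstrap resample
    m <- glm(y ~ x1 + x2, family = binomial, data = d)    # refit on the resample
    p <- predict(m, newdata = data, type = "response")    # score on the original data
    mean((p - data$y)^2)                                  # Brier score
  }

  set.seed(1)
  b <- boot(dat, brier_fun, R = 200)
  b                           # bootstrap distribution of the Brier score
  boot.ci(b, type = "perc")   # percentile interval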
Split-sample validation is highly unstable with your sample size. The rms package can help with bootstrapping or cross-validation, assuming you have all modeling steps repeated for each resample.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
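[As an illustration of the rms workflow mentioned here (formula and variable names are hypothetical), a binary-response fit could be validated along these lines; for the ordinal response, lrm() with an ordered factor, or orm(), works the same way.]

  library(rms)

  f <- lrm(y ~ x1 + x2, data = dat, x = TRUE, y = TRUE)   # x, y needed by validate()/calibrate()
  validate(f, method = "boot", B = 200)    # optimism-corrected Dxy, Brier score (B), etc.
  calibrate(f, method = "boot", B = 200)   # bootstrap-corrected calibration curve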
Thanks for this. I had used

  validate(model0, method = "boot", B = 200)

to get an index.corrected Brier score. However, I also want to bootstrap the predicted probabilities output from predict(model1, type = "response") to get an idea of confidence, or am I best just using se.fit = TRUE and then calculating the 95% CI? Does what I want to do make sense?

Thanks
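[If the se.fit route is taken, a common approach is to get the standard errors on the link scale and back-transform, rather than working on the response scale directly. A sketch, assuming model1 is a binomial glm with the default logit link and newdat is the data to predict for (both placeholder names):]

  pr <- predict(model1, newdata = newdat, type = "link", se.fit = TRUE)

  est   <- plogis(pr$fit)                      # predicted probability
  lower <- plogis(pr$fit - 1.96 * pr$se.fit)   # approximate 95% CI, back-transformed
  upper <- plogis(pr$fit + 1.96 * pr$se.fit)

[Note that these intervals reflect parameter uncertainty in a single fitted model; bootstrapping the predictions by refitting on each resample would additionally capture the variability of the model-fitting process.]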
It all depends on the ultimate use of the results.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University