I have a logistic regression model I'm trying to do k-fold cross-validation on.

There are approximately 550 observations, with an event rate of about 30%.

Does anyone have a recommendation for a B value to use for this data set?

--
View this message in context: http://r.789695.n4.nabble.com/recommendation-on-B-for-validate-lrm-tp3486200p3486200.html
Sent from the R help mailing list archive at Nabble.com.
For this case B=200 should work well if using the bootstrap. For cross-validation, you can use 10-fold cross-validation (B=10) and repeat the process 100 times for adequate precision, averaging over the 100 repeats, as done in http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/logistic.val.pdf (note that this used the Design package, and there may be subtle changes with the rms package).

Frank

viostorm wrote:
> I have a logistic regression model I'm trying to do k-fold cross
> validation on.
>
> The number of observations is approximately 550 and an event rate of about
> 30%
>
> Does anyone have a recommendation for a B value to use for this data set?

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
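The repeated 10-fold procedure described above can be sketched roughly as follows. This is plain Python for illustration, not R, and it is not how `validate` in rms is implemented. The simulated outcome vector (550 observations, 30% events), the stand-in "model" (which only predicts the event rate of the training folds), the choice of Brier score as the metric, and all function names are assumptions made for the example.

```python
import random
from statistics import mean


def kfold_indices(n, k, rng):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def brier_cv(y, k, rng):
    """One pass of k-fold CV for a stand-in model (hypothetical):
    'fit' on k-1 folds by taking their event rate, score on the held-out fold."""
    fold_scores = []
    for held_out in kfold_indices(len(y), k, rng):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        p_hat = mean(train)  # stand-in for fitting a real model
        fold_scores.append(mean((y[i] - p_hat) ** 2 for i in held_out))
    return mean(fold_scores)


def repeated_cv(y, k=10, repeats=100, seed=1):
    """Repeat k-fold CV with a fresh random partition each time and average."""
    rng = random.Random(seed)
    per_repeat = [brier_cv(y, k, rng) for _ in range(repeats)]
    return mean(per_repeat), per_repeat


# Simulated outcome (assumption): 550 observations, 165 events (30%).
y = [1] * 165 + [0] * 385
estimate, per_repeat = repeated_cv(y)
```

Comparing `estimate` with `min(per_repeat)` and `max(per_repeat)` shows how far a single 10-fold pass can move with the random partition, which is the motivation for averaging over repeats.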
Thanks so much for the reply; it was exceptionally helpful!

A couple of questions:

1. I was under the impression that k-fold with B=10 would train on 9/10, validate on 1/10, and repeat 10 times, once for each different 1/10th. Is this how the procedure works in R?

2. Is the reason you recommend repeating k-fold 100 times that the partitioning is random (i.e., not the 1st 10th, the 2nd 10th, et cetera), so you might obtain slightly different results?