Disclaimer first: I only heard about R fairly recently, so I apologize if this is either a simple or impossible request, but R looked like it might be a good framework for this sort of thing...

Is it possible to write a script to run stepwise multinomial regressions on many *dependent* variables, and then compare results to a validation data set (e.g., Chow test)? Essentially, automate the process of finding best predictive model using a host of dependent and independent variables.

I have a fairly short timeframe to work on this, so if someone is willing to help me in the next couple of days, I would be most appreciative. (And there might even be a hefty sum of cash involved!)

Thanks,
Daniel

Gabor Grothendieck
2004-Jul-28 05:06 UTC
[R] automating sequence of multinomial regressions
Daniel <spamiam <at> aroint.org> writes:

> Disclaimer first: I only heard about R fairly recently, so I apologize if
> this is either a simple or impossible request, but R looked like it
> might be a good framework for this sort of thing...
>
> Is it possible to write a script to run stepwise multinomial regressions
> on many *dependent* variables, and then compare results to a validation
> data set (e.g., Chow test)? Essentially, automate the process of finding
> best predictive model using a host of dependent and independent variables.
>
> I have a fairly short timeframe to work on this, so if someone is
> willing to help me in the next couple of days, I would be most
> appreciative. (And there might even be a hefty sum of cash involved!)

Setting aside the basic overfitting problems, the following does a stepwise regression on each of 10 dependent variables using the first 100 rows of birthwt. For the result of each of these 10 it then calculates the number of correct predictions using the remaining rows.

require(nnet)
require(MASS)

# use birthwt data set and generate random matrix whose 10 cols are dep vars
data(birthwt)
set.seed(1)
dep <- matrix(sample(2, 189*10, replace = TRUE), 189) - 1

# run one stepwise procedure for each dep variable using rows 1 to 100
# and store result in z so that z[[i]] has output from ith dep variable
z <- apply(dep[1:100,], 2, function(d)
  step(multinom(formula = d ~ ., data = birthwt[1:100,-1])))

# calculate number of correct predictions for each model using rows 101 to 189,
# comparing each model's predictions with the held-out values of its own dep column
sapply(seq_along(z), function(i)
  sum(predict(z[[i]], birthwt[101:189,-1]) == dep[101:189,i]))
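
The same fit-then-validate loop can also be sketched as a small helper that reports hold-out accuracy as a proportion rather than a count. This is only a sketch under the same setup as above: fit_and_score, predictors, train_rows and test_rows are illustrative names, not part of nnet or MASS, and the dependent variables are still random, so there is no real signal for the models to find.

```r
library(nnet)   # multinom()
library(MASS)   # birthwt data set

data(birthwt)
set.seed(1)

# Illustrative setup: 10 random 0/1 dependent variables, one per column
dep <- matrix(sample(0:1, 189 * 10, replace = TRUE), nrow = 189)
predictors <- birthwt[, -1]   # drop `low`; the remaining columns are predictors
train_rows <- 1:100
test_rows  <- 101:189

# fit_and_score is a hypothetical helper: it runs stepwise selection for the
# i-th dependent variable on the training rows and returns the proportion of
# correct predictions on the validation rows
fit_and_score <- function(i) {
  train <- data.frame(y = factor(dep[train_rows, i]), predictors[train_rows, ])
  fit   <- step(multinom(y ~ ., data = train, trace = FALSE), trace = 0)
  pred  <- predict(fit, newdata = predictors[test_rows, ])
  mean(as.character(pred) == as.character(dep[test_rows, i]))
}

accuracy <- sapply(seq_len(ncol(dep)), fit_and_score)
round(accuracy, 2)
```

With real dependent variables, one would replace dep with the actual response columns and might compare each accuracy against the share of the most common class in the validation rows.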