Hey,

I've got a data set (e.g. named Data) which contains a lot of variables, for example: s1, s2, ..., s50.

My first question: it is possible to do this:

    Data$s1

But is it also possible to do something like this:

    Data$s1:s50

(I've tried a lot of versions of this without success.)

My second question: I want to do a stepwise logistic regression. For this purpose I use the following procedure:

    result <- glm(...)
    step(result, direction="forward")

The problem is that I have to include all 50 of my variables (s1-s50), but I don't want to write them all out like y~s1+s2+s3+s4... (furthermore, this has to run inside a loop, so I really need it). I've tried to store the 50 variables in a list (e.g. list[[1]]) and tried this:

    result <- glm(y ~ list[[1]], ...)

This works! But if I try to do it stepwise

    result2 <- step(result)

I always get the same results as from glm without a stepwise approach. So apparently R can't handle this if you put a list in. How can I make this work?

Thanks in advance,
Anna

--
View this message in context: http://www.nabble.com/Handle-lot-of-variables---Regression-tp25889056p25889056.html
Sent from the R help mailing list archive at Nabble.com.

anna0102 wrote:
> I've got a data set (e.g. named Data) which contains a lot of variables,
> for example: s1, s2, ..., s50
>
> My first question is:
> It is possible to do this: Data$s1
> But is it also possible to do something like this: Data$s1:s50 (I've tried
> a lot of versions of those without a result)

Use the [] notation. For example

    Data[, c("s1","s2","s3")]

or, even better,

    Data[, grep("^s", names(Data), value=TRUE)]

(The pattern "^s" matches names that start with s; a bare "s.*" would also match any name that merely contains an s.)

anna0102 wrote:
> I want to do a stepwise logistic regression. For this purpose I use the
> following procedures:
> result<-glm(...)
> step(result, direction="forward")
>
> Now the problem I have, is, that I have to include all my 50 variables
> (s1-s50), but I don't want to write them all down like y~s1+s2+s3+s4...
> (furthermore it has to be implemented in a loop, so I really need it).

Construct the formula dynamically. But please start with only 3 or 4 variables and check that it works. Sometimes things can go wrong deep inside functions with this method, requiring Ripley's-game-like workarounds. See http://finzi.psych.upenn.edu/R/Rhelp02a/archive/16599.html

    a <- data.frame(s=1:10, s2=1:10, s4=1:10)
    vars <- grep("^s", names(a), value=TRUE)
    form <- as.formula(paste("z ~", paste(vars, collapse=" + ")))
    glm(form, ....)

And be aware of the nonsense you can (read: almost certainly will) get with stepwise regression and so many parameters. If I were to be treated with a cure developed by stepwise regression, I would prefer voodoo. Search for "Harrell stepwise" and read Frank's well-justified soapboxes.

Dieter
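
A minimal sketch of the two suggestions above, run on simulated data. The data frame `Data`, the response `y`, the pattern `^s[0-9]+$`, and the use of only five predictors are assumptions made for illustration; they are not taken from the original poster's data. `reformulate()` is used here as one way to build the formula from column names, and `step()` is given an explicit `scope` so that forward selection has terms it is allowed to add.

```r
set.seed(1)

# Simulated stand-in for the poster's data: predictors s1..s5 and a binary y.
n <- 200
Data <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(Data) <- paste0("s", 1:5)
Data$y <- rbinom(n, 1, plogis(0.8 * Data$s1 - 0.6 * Data$s3))

# Question 1: select the s-columns by name instead of Data$s1:s5.
vars <- grep("^s[0-9]+$", names(Data), value = TRUE)
X <- Data[, vars]

# Question 2: build y ~ s1 + s2 + ... without typing it out.
full <- reformulate(vars, response = "y")

# Forward selection starts from the intercept-only model; the scope
# tells step() which terms it may add.
null_fit <- glm(y ~ 1, data = Data, family = binomial)
fwd <- step(null_fit,
            scope = list(lower = ~ 1, upper = full),
            direction = "forward", trace = 0)

# Terms that forward selection kept (depends on the simulated data).
selected <- attr(terms(fwd), "term.labels")
```

A likely reason the `y ~ list[[1]]` attempt showed no change: the whole block of variables enters the model as a single term, so `step()` has no individual variables to add or drop. Inside a loop, the same pattern can be repeated with a different `vars` vector on each pass.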

anna0102 wrote:
> Hey,
>
> I've got a data set (e.g. named Data) which contains a lot of variables, for
> example: s1, s2, ..., s50
>
> My first question is:
> It is possible to do this: Data$s1
> But is it also possible to do something like this: Data$s1:s50 (I've tried a
> lot of versions of those without a result)
>
> My second question:
> I want to do a stepwise logistic regression. For this purpose I use the
> following procedures:
> result<-glm(...)
> step(result, direction="forward")
>
> Now the problem I have, is, that I have to include all my 50 variables
> (s1-s50), but I don't want to write them all down like y~s1+s2+s3+s4...
> (furthermore it has to be implemented in a loop, so I really need it).
> I've tried to store the 50 variables in a list (e.g. list[[1]]) and tried
> this:
> result<-glm(y ~ list[[1]], ...)
> This works! But if I try to do it stepwise
> result2<-step(result)
> I always get the same results as from glm without a stepwise approach. So
> obviously R can't handle this if you put a list in.
> How can I make this work?
>
> Thanks in advance,
> Anna

Anna,

You might as well just take a random sample of your candidate predictors. Stepwise regression isn't much better than that. Note that if you don't have enough events (say, 15 per candidate predictor, i.e. 15 x 50 = 750) to fit the full model, then you don't have enough events to do stepwise regression without appropriate penalization.

Frank

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
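
The events arithmetic in the reply above (say 15 times 50) can be checked before fitting anything. This is only a sketch: `enough_events` is a hypothetical helper, and counting "events" as the size of the rarer outcome class is an assumption, as is the simulated `y`.

```r
# Hypothetical helper: compares the number of events (taken here as the
# rarer outcome class, an assumption) with a rule-of-thumb minimum of
# 15 events per candidate predictor.
enough_events <- function(y, n_candidates, per_predictor = 15) {
  events <- min(sum(y == 1), sum(y == 0))
  required <- per_predictor * n_candidates
  list(events = events, required = required, ok = events >= required)
}

# Made-up outcome vector: 300 events among 3000 observations.
y <- rep(c(1, 0), times = c(300, 2700))

# 50 candidate predictors call for about 15 * 50 = 750 events.
chk <- enough_events(y, n_candidates = 50)
```

With these made-up numbers `chk$ok` is `FALSE`, which is the situation the reply warns about: too few events for the full model, and therefore too few for unpenalized stepwise selection.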