Hi!

I have a dataset with some 300+ variables and 2000+ records. I'd like to grind through a bunch of analyses on the variables using a script, but can't figure out how to refer to variable names properly. For some of the simpler stuff I use various "apply" functions, but for others (like t-tests etc.) I need to run commands one variable at a time. I've tried various flavors of "for(var in names(Dataset)){...}", but this does not work consistently. Actually, "for(var in names(Dataset)){print(var)}" seems to work perfectly, giving a list of variable names, but "for(var in names(Dataset)){mean(var, na.rm=T)}" and "for(var in names(Dataset)){glm(var~var1+var2+var3....)}" do not.

Any suggestions about how best to go about this?

Thanks

Jon
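Why the mean() loop misbehaves can be seen on a tiny made-up data frame (the name Dataset and its columns a and b are stand-ins, not Jon's data). Inside the loop, var is only a length-one character string, so mean(var) averages the name rather than the column, while Dataset[[var]] retrieves the column itself:

```r
# Hypothetical stand-in for Jon's "Dataset" (made-up values)
Dataset <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

wrong <- numeric(0)
right <- numeric(0)
for (var in names(Dataset)) {
  # 'var' holds only the column *name*, e.g. "a", so this averages a string:
  # mean() warns "argument is not numeric or logical" and returns NA
  wrong[var] <- suppressWarnings(mean(var, na.rm = TRUE))
  # Using the name to index the data frame retrieves the column itself
  right[var] <- mean(Dataset[[var]], na.rm = TRUE)
}
print(wrong)  # NA for both columns
print(right)  # a = 2.333333, b = 25
```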
Hi,

perhaps this is what you want?

> df <- as.data.frame(matrix(runif(100),ncol=10))
> form <- as.formula(paste(names(df)[length(df)], "~ ."))
> lm(form,data=df)

Call:
lm(formula = form, data = df)

Coefficients:
(Intercept)           V1           V2           V3           V4
     -1.367       -2.920        3.631       -7.259       -3.704
         V5           V6           V7           V8           V9
      4.225        3.049        4.522        2.496       -0.578

> form <- as.formula(paste(names(df)[length(df)], "~",paste(names(df)[3],names(df)[4],sep="+")))
> lm(form,data=df)

Call:
lm(formula = form, data = df)

Coefficients:
(Intercept)           V3           V4
      0.652        0.360       -0.448

Regards,
Christian

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
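Christian's paste()/as.formula() construction extends naturally to a loop over many response variables. A minimal sketch, assuming invented data and a hypothetical set of predictor columns (V1 to V3), that fits one lm() per remaining column and collects the coefficients:

```r
set.seed(1)  # reproducible made-up data; the values themselves mean nothing
df <- as.data.frame(matrix(runif(200), ncol = 10))  # 20 rows, columns V1..V10

predictors <- c("V1", "V2", "V3")             # hypothetical covariates
responses  <- setdiff(names(df), predictors)  # every other column is a response

fits <- list()
for (resp in responses) {
  # Build text such as "V4 ~ V1 + V2 + V3", then convert it to a formula
  form <- as.formula(paste(resp, "~", paste(predictors, collapse = " + ")))
  fits[[resp]] <- lm(form, data = df)
}

# One row of coefficients per response variable
coefs <- t(sapply(fits, coef))
print(round(coefs, 3))
```

stats::reformulate(predictors, response = resp) builds the same formula without pasting strings together; which one to use is mostly a matter of taste.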
For the mean() example, I believe this should work (untested):

for (var in names(Dataset)) print( mean( Dataset[[var]] , na.rm=TRUE ) )

or

for (var in names(Dataset)) print( mean( Dataset[,var] , na.rm=TRUE ) )

Inside the loop, var is just the column name (a character string), so mean(var) tries to average the string itself; using it to index Dataset, as above, gets you the actual column.

But it's harder with lm, glm, and friends. For them I think you can do it by constructing a formula object; see ?formula. Maybe something like

tmpf <- as.formula( paste( var, ' ~var1 + var2 + var3') )
glm( tmpf , data=Dataset )

inside the loop, but I haven't done this and am not an expert. Note that data= has to be named for glm(), whose second argument is family; lm() takes data as its second argument, so the unnamed form works there. Here's a quick example:

> foo <- data.frame( x=1:10, y=rnorm(10) )
> ick <- as.formula( ' x ~ y')
> lm(ick,foo)

Call:
lm(formula = ick, data = foo)

Coefficients:
(Intercept)            y
      5.895        2.158

## compare with:
> lm(x~y,foo)

Call:
lm(formula = x ~ y, data = foo)

Coefficients:
(Intercept)            y
      5.895        2.158

But it's a question that comes up periodically on r-help, so I'd also suggest searching the archives.

-Don

At 10:50 AM -0400 6/18/09, Jon Erik Ween wrote:
>Any suggestions about how best to go about this?

-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
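Pulling the suggestions together, a minimal sketch, assuming a made-up Dataset with a two-level grouping column (group) and hypothetical covariate and outcome names (var1, var2, out1, out2). It uses sapply() for the simple summaries and a loop that stores t-tests and glm fits in named lists, so the results can be inspected afterwards instead of only printed:

```r
set.seed(42)  # reproducible made-up data
# Hypothetical stand-in for Jon's Dataset: a two-level grouping factor,
# two covariates and two outcome columns (all values invented)
Dataset <- data.frame(
  group = factor(rep(c("ctl", "trt"), each = 10)),
  var1  = rnorm(20),
  var2  = rnorm(20),
  out1  = rnorm(20),
  out2  = rnorm(20)
)
Dataset$out1[3] <- NA  # one missing value, so na.rm actually matters

outcomes <- c("out1", "out2")  # hypothetical names of the columns to analyse

# Simple summaries: apply over the column *names*, indexing with [[ ]]
means <- sapply(outcomes, function(v) mean(Dataset[[v]], na.rm = TRUE))

ttests <- list()
glms   <- list()
for (var in outcomes) {
  # Two-sample t-test of this column between the groups, e.g. out1 ~ group
  tf <- as.formula(paste(var, "~ group"))
  ttests[[var]] <- t.test(tf, data = Dataset)
  # glm()'s second positional argument is 'family', so name data= explicitly
  tmpf <- as.formula(paste(var, "~ var1 + var2"))
  glms[[var]] <- glm(tmpf, data = Dataset)
}

print(means)
print(sapply(ttests, function(tt) tt$p.value))
print(t(sapply(glms, coef)))
```

With 300+ columns, outcomes would typically be built from names(Dataset) (for example with setdiff() to drop the grouping and covariate columns) rather than typed by hand.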