Greg Blevins
2004-Apr-03 03:03 UTC
[R] Seeking help for outomating regression (over columns) and storing selected output
Hello,

I have spent considerable time trying to figure out what I am about to describe. This included searching Help, consulting my various R books, and trial and (always) error. I have been assuming I would need to use a loop (looping over columns), but perhaps an apply function would do the trick. I have unsuccessfully tried both.

A scaled-down version of my situation is as follows. I have a dataframe with these columns:

ID Y x1 x2 x3 usergroup

Y is a continuous criterion, x1-x3 are continuous predictors, and usergroup is coded 1, 2 or 3 to indicate user status.

My end goal is a dataframe or matrix with just the regression coefficient from each of 12 runs (each x regressed separately on Y for the total sample and for each usergroup). I envision the output as a three-column by four-row dataframe or matrix:

                Y and x1   Y and x2   Y and x3
Total sample:
usergroup 1:
usergroup 2:    (regression coefs fill the matrix)
usergroup 3:

Using R 1.8.1 on Windows 2000 and XP.

Help would be most appreciated.

Greg Blevins, Partner
The Market Solutions Group
Liaw, Andy
2004-Apr-03 13:21 UTC
[R] Seeking help for outomating regression (over columns) and storing selected output
I'm quite sure there are better ways, but this works for me:

> dat <- data.frame(y=rnorm(30), x1=runif(30), x2=runif(30), x3=runif(30),
+                   group=factor(rep(1:3, each=10)))
>
> getCoef <- function(dat) {
+     apply(dat[,c("x1","x2","x3")], 2,
+           function(x) lm.fit(cbind(1, x), dat$y)$coefficients[2])
+ }
> clist <- by(dat[,c("y","x1","x2","x3")], dat$group, getCoef)
> cmat <- do.call("rbind", clist)
> cmat
          x1         x2         x3
1 -1.8646962  0.6182181 -1.7859563
2 -1.5031314 -1.0639626 -0.2982066
3 -0.8302013  0.8111539 -1.0372803

HTH,
Andy
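The by() approach above returns one row per group, while the original question also asked for a total-sample row. A minimal sketch of one way to add it follows. It is an illustration rather than part of the reply: it swaps lm.fit() for lm(), uses simulated data with an arbitrary seed, and the names xvars and the split()/lapply() idiom are choices made here.

```r
# Simulated data with the same shape as in the reply (values are arbitrary).
set.seed(1)
dat <- data.frame(y = rnorm(30), x1 = runif(30), x2 = runif(30), x3 = runif(30),
                  group = factor(rep(1:3, each = 10)))

xvars <- c("x1", "x2", "x3")

# Slope of y on each predictor: one simple regression per column.
getCoef <- function(d) {
  sapply(xvars, function(v) unname(coef(lm(d$y ~ d[[v]]))[2]))
}

# First row: total sample. Remaining rows: one per level of group.
cmat <- rbind(Total = getCoef(dat),
              do.call("rbind", lapply(split(dat, dat$group), getCoef)))
cmat
```

The result is a 4 x 3 matrix whose rows are Total, 1, 2, 3 and whose columns are x1, x2, x3, which matches the layout requested in the question.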
Robert W. Baer, Ph.D.
2004-Apr-03 13:30 UTC
[R] Seeking help for outomating regression (over columns) and storing selected output
Here's one simplistic solution; perhaps there are better ones:

# Make some test data and place it in a dataframe
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- rnorm(20)
x4 <- as.factor(sample(c("G1","G2","G3"), 20, replace=TRUE))
y1 <- 2*x1 + 4*x2 + 0.5*x3 + as.numeric(x4) + rnorm(20)
df <- data.frame(y1, x1, x2, x3, x4)

# Now create the output dataframe described.
# coef() returns, in order: intercept, slope of x, then the offsets of the
# second and third group levels relative to the first.
out <- data.frame(result=c("Intercept", "Slope", paste(levels(df$x4)[-1], "offset")))
out$X1 <- as.numeric(coef(lm(df$y1 ~ df$x1 + df$x4)))
out$X2 <- as.numeric(coef(lm(df$y1 ~ df$x2 + df$x4)))
out$X3 <- as.numeric(coef(lm(df$y1 ~ df$x3 + df$x4)))

# Look at it
df
out
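One point worth spelling out: a model of the form y ~ x + group estimates a single common slope plus group shifts, not a separate slope per group. The sketch below is an illustration added here, not part of the reply. On simulated data with hypothetical names (d, fit_nested, slopes_split) it shows that the nested formula group/x gives the same per-group slopes as one regression per subset.

```r
# Simulated data; 20 observations in each of three groups.
set.seed(42)
x1 <- rnorm(60)
x4 <- factor(rep(c("G1", "G2", "G3"), each = 20))
y1 <- 2 * x1 + as.numeric(x4) + rnorm(60)
d  <- data.frame(y1, x1, x4)

# Additive model: one common x1 slope plus group offsets.
coef(lm(y1 ~ x1 + x4, data = d))

# Nested model: a separate intercept and x1 slope within each group.
fit_nested    <- lm(y1 ~ x4 / x1, data = d)
slopes_nested <- coef(fit_nested)[grep(":x1$", names(coef(fit_nested)))]

# The same slopes from one regression per group.
slopes_split <- sapply(split(d, d$x4),
                       function(g) coef(lm(y1 ~ x1, data = g))[["x1"]])

rbind(nested = unname(slopes_nested), split = unname(slopes_split))
```

The two rows agree to numerical precision, so either form can be used when only the per-group slopes are needed.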
Gabor Grothendieck
2004-Apr-03 16:45 UTC
[R] Seeking help for outomating regression (over columns) and storing selected output
Note that there is a QUESTION at the end regarding random effects.

Suppose your data frame is df and has components y, x1, x2, x3 and u, where u is a factor.

1. A problem about doing repeated regressions was posted last month (search for "Operating on windows of data") that has similarities to this one. Making use of those ideas, the first sapply below loops over the y~xi regressions and the next two (nested) sapply calls loop over the usergroup-specific regressions. We just rbind them all together:

   xvars <- c("x1", "x2", "x3")
   rbind(
     sapply(xvars, function(xi) coef(lm(y ~ df[,xi], data=df))[[2]]),
     sapply(xvars, function(xi)
       sapply(levels(df$u), function(ulev)
         coef(lm(y ~ df[,xi], subset=u==ulev, data=df))[[2]]))
   )

2. Another possibility is to create a giant regression that does all the usergroup-specific regressions at once, and then repeat it without the usergroup variable to get the rest. df2 is a new data frame that strings out all the x variables into a single long column and adds a new factor i that identifies which x variable it is. y and u are repeated three times to bring them into line with x.

   xvars <- c("x1", "x2", "x3")
   xm <- as.matrix(df[,xvars])
   df2 <- data.frame(y=rep(df$y,3), x=c(xm), i=factor(c(col(xm))), u=rep(df$u,3))

   # We could alternatively have used reshape like this:
   # df2 <- reshape(df, timevar="i", times=factor(1:3),
   #                varying=list(xvars), direction="long", v.names="x")

   # The slopes by usergroup and across usergroups are:
   coef.u <- coef(lm(y ~ i/u/x, data=df2))
   coef.all <- coef(lm(y ~ i/x, data=df2))

   # Pick off the slopes (they are the second half of each coef vector) and
   # reform. Within each vector i varies fastest, so fill the matrix by row:
   z <- matrix(c(matrix(coef.all, ncol=2)[,2],
                 matrix(coef.u, ncol=2)[,2]), ncol=3, byrow=TRUE)
   colnames(z) <- xvars
   rownames(z) <- c("All", levels(df$u))

3. Note that the giant regression approach works as long as you are only interested in the coefficients. If you were interested in the variances, it would not work, since each of the two regressions uses a pooled estimate of variance.
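Point 3 can be checked numerically. The sketch below is an illustration added here, not part of the original post. It uses the built-in state data with a single predictor (Income) to keep things short, and the names d, pooled and separate are hypothetical.

```r
# Life expectancy regressed on income, by region (datasets package).
d <- data.frame(y = state.x77[, "Life Exp"],
                x = state.x77[, "Income"],
                u = factor(state.region))

# One pooled fit: separate intercept and slope per region,
# but a single residual variance shared by all regions.
pooled <- summary(lm(y ~ u / x, data = d))$coefficients
pooled <- pooled[grep(":x$", rownames(pooled)), c("Estimate", "Std. Error")]

# Separate fits: each region gets its own residual variance.
separate <- t(sapply(split(d, d$u), function(g)
  summary(lm(y ~ x, data = g))$coefficients["x", c("Estimate", "Std. Error")]))

# Estimates match column for column; standard errors do not.
cbind(pooled, separate)
```

The slope estimates agree between the two approaches, while the standard errors differ because the pooled fit shares one variance estimate across regions.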
QUESTION: As a matter of interest, would someone who is familiar with random effects models show what the corresponding giant model is, with separate variances for each regression?

P.S. I tried the above out on the following, which is similar to the original problem except that there are 4 levels in u:

   data(state)
   x <- state.x77[,1:3]
   u <- state.region
   y <- state.x77[,4]
   df <- data.frame(y=y, x1=x[,1], x2=x[,2], x3=x[,3], u=factor(u))