Michael Wither
2012-Jan-26 06:08 UTC
[R] R extracting regression coefficients from multiple regressions using lapply command
Hi, I have a question about running multiple regressions in R and then storing the coefficients. I have a large dataset with several variables, one of which is a state variable, coded 1-50 for each state. I'd like to run a regression of 28 select variables on the remaining 27 variables of the dataset (there are 55 variables total), separately for each state, i.e. run a regression of variable1 on covariate1, covariate2, ..., covariate27 for observations where state==1. I'd then like to repeat this for variable1 for states 2-50, and then repeat the whole process for variable2, variable3, ..., variable28.

I think I've written the correct R code to do this, but the next thing I'd like to do is extract the coefficients, ideally into a coefficient matrix. Could someone please help me with this? Here's the code I've written so far; I'm not sure if this is the best way to do this. Please help me.

for (num in 1:50) {

  #PUF is the data set I'm using

  #Subset the data by states
  PUFnum <- subset(PUF, state==num)

  #Attach data set with state specific data
  attach(PUFnum)

  #Run our prediction regression
  #the variables class1 through e19700 are the 27 covariates I want to use
  regression <- lapply(PUFnum, function(z) lm(z ~
    class1+class2+class3+class4+class5+class6+class7+xtot+e00200+e00300
    +e00600+e00900+e01000+p04470+e04800+e09600+e07180+e07220+e07260
    +e06500+e10300+e59720+e11900+e18425+e18450+e18500+e19700))

  Beta <- lapply(regression, function(d) d<- coef(regression$d))

  detach(PUFnum)

}

Thanks,
Mike
Jean V Adams
2012-Jan-26 14:42 UTC
[R] R extracting regression coefficients from multiple regressions using lapply command
This should help you get started.
# You don't provide any sample data, so I made some up myself
nstates <- 5
nobs <- 30
nys <- 3
nxs <- 4
PUF <- data.frame(matrix(rnorm(nstates*nobs*(nys+nxs)), nrow=nstates*nobs,
  dimnames=list(NULL, c(paste("Y", 1:nys, sep=""), paste("X", 1:nxs, sep="")))))
PUF$state <- rep(1:nstates, nobs)
head(PUF)

# create a character vector of all your covariate names
# separated by a plus sign
# this will serve as the right half of your regression equations
covariates <- paste(names(PUF)[nys + (1:nxs)], collapse=" + ")

# create an empty array to be filled with coefficients
coefs <- array(NA, dim=c(nstates, nys, nxs+1))

# fill the array with coefficients
# this will work for you if the first 28 columns of your PUF
# data frame are the response variables
for(i in 1:nstates) {
  for(j in 1:nys) {
    coefs[i, j, ] <- lm(formula(paste(names(PUF)[j], covariates, sep=" ~ ")),
      data=PUF[PUF$state==i, ])$coef
  }
}
coefs

Jean
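Since the subject line asks about lapply, the same fits could probably also be written without explicit loops. What follows is only a sketch on made-up data shaped like the example above; the names `responses`, `by_state` and `coef_list` are illustrative and not from the original posts. It assumes the response and covariate columns can be listed by name, and it uses `split()` to separate the states and `reformulate()` to build each formula.

```r
# Simulated data in the same shape as the example above (all names are made up)
set.seed(1)
nstates <- 5; nobs <- 30; nys <- 3; nxs <- 4
PUF <- data.frame(matrix(rnorm(nstates * nobs * (nys + nxs)),
                         nrow = nstates * nobs,
                         dimnames = list(NULL, c(paste0("Y", 1:nys),
                                                 paste0("X", 1:nxs)))))
PUF$state <- rep(1:nstates, nobs)

responses  <- paste0("Y", 1:nys)   # hypothetical response column names
covariates <- paste0("X", 1:nxs)   # hypothetical covariate column names

# Split the data once by state, then fit every response within every state.
# Each list element is an (nxs + 1) x nys matrix: one column per response,
# one row per coefficient (intercept first).
by_state <- split(PUF, PUF$state)
coef_list <- lapply(by_state, function(d) {
  sapply(responses, function(y) {
    coef(lm(reformulate(covariates, response = y), data = d))
  })
})

coef_list[["1"]]   # coefficient matrix for state 1
```

Passing `data = d` to `lm()` means nothing has to be attached or detached, and iterating over the response names (rather than over every column of the data frame) keeps the covariates and the state column from being used as responses.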
Jean V Adams
2012-Feb-01 12:40 UTC
[R] R extracting regression coefficients from multiple regressions using lapply command
Michael Wither wrote on 01/28/2012 03:18:29 PM:

> Hi, this code is actually great (much better than any other response
> I've gotten or seen). But I have one more question. This puts the
> regression coefficients for state 1 in row 1, which is what I do
> need, but then it puts coef1 from the first regression in column 1,
> then coef1 from the second regression in column 2, then coef1 from
> the third regression in column 3, ... What I need is coef1 from the
> first regression in column 1, coef2 from regression 1 in column 2,
> coef3 from regression 1 in column 3, ... And then after the first 28
> columns are filled in (27 covariates plus the constant term), I'd
> like coef1 from the 2nd regression to go in column 29, coef2 from
> the 2nd regression to go in column 30, ...
> Does this make sense? Do you know how to do this?
> Thank you again so much for your help,
> Michael

Adjust the dimensions of your array, and fill in the data accordingly ...

coefs <- array(NA, dim=c(nstates, nxs+1, nys))
for(i in 1:nstates) {
  for(k in 1:nys) {
    coefs[i, , k] <- lm(formula(paste(names(PUF)[k], covariates, sep=" ~ ")),
      data=PUF[PUF$state==i, ])$coef
  }
}

Jean

> On Fri, Jan 27, 2012 at 1:05 AM, Michael Wither wrote:
> Thanks, this does help a bit. I'll keep on trying to figure it out.
> Thanks,
> Michael
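The layout Michael describes (one row per state, all coefficients of regression 1 first, then those of regression 2, and so on) is a flat two-dimensional matrix, whereas the array above has three dimensions. One way to get from one to the other, sketched here on made-up data, relies on R storing arrays in column-major order: with dimensions (state, coefficient, response), collapsing the last two dimensions leaves the coefficients of each regression in adjacent columns. The names `coefmat` and the column labels are illustrative assumptions, not part of the original replies.

```r
# Simulated data in the same shape as the earlier example (names are made up)
set.seed(1)
nstates <- 5; nobs <- 30; nys <- 3; nxs <- 4
PUF <- data.frame(matrix(rnorm(nstates * nobs * (nys + nxs)),
                         nrow = nstates * nobs,
                         dimnames = list(NULL, c(paste0("Y", 1:nys),
                                                 paste0("X", 1:nxs)))))
PUF$state <- rep(1:nstates, nobs)
covariates <- paste(names(PUF)[nys + (1:nxs)], collapse = " + ")

# Array indexed as [state, coefficient, response]
coefs <- array(NA, dim = c(nstates, nxs + 1, nys))
for (i in 1:nstates) {
  for (k in 1:nys) {
    fit <- lm(formula(paste(names(PUF)[k], covariates, sep = " ~ ")),
              data = PUF[PUF$state == i, ])
    coefs[i, , k] <- coef(fit)
  }
}

# Collapse the coefficient and response dimensions into columns.
# Column-major storage puts all coefficients of response 1 first,
# then all coefficients of response 2, and so on.
coefmat <- matrix(as.vector(coefs), nrow = nstates)

# Hypothetical column labels of the form "<response>.<coefficient>"
colnames(coefmat) <- paste(
  rep(names(PUF)[1:nys], each = nxs + 1),
  rep(c("Intercept", names(PUF)[nys + (1:nxs)]), times = nys),
  sep = ".")

dim(coefmat)   # nstates rows, (nxs + 1) * nys columns
```

With 27 covariates and 28 responses, the same reshaping would give 28 columns per regression, so the second regression's coefficients would start in column 29, as requested in the question.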