Dear R users,

I have a large data set which includes data from 300 cities. I want to run a bivariate regression for each city and record the coefficient and the adjusted R square.

For example, in the following, I have 10 cities represented by numbers from 1 to 10:

x = cumsum(c(0, runif(999, -1, +1)))
y = cumsum(c(0, runif(999, -1, +1)))
city = rep(1:10,each=100)
data<-data.frame(cbind(x,y,city))

I can manually run regressions for each city:

fit_city1 <- lm(y ~ x,data=subset(data,data$city==1))
summary(fit_city1)

Obviously, it is very tedious to run 300 regressions. I wonder if there is a quicker way to do this. Use a for loop? What I want to see is something like this:

City    Coefficient    Adjusted R square
1       -0.05          0.36
2       -0.12          0.20
3       -0.05          0.32
.....

Any advice is appreciated!

Gary

[[alternative HTML version deleted]]
Hi,

Try:

# Split the data by city, fit one regression per piece, and stack the one-row results
res <- do.call(rbind, lapply(split(data, data$city), function(z) {
  fit_city <- lm(y ~ x, data = z)
  data.frame(City = unique(z$city),
             Coefficient = coef(fit_city)[2],   # slope on x
             Adjusted_R_square = summary(fit_city)$adj.r.squared)
}))

A.K.

On Monday, November 25, 2013 6:37 PM, Gary Dong <pdxgary163 at gmail.com> wrote:
> [...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
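The original post asks whether a for loop would do. As a minimal sketch of that route (not code from any poster in this thread), the loop below fills a preallocated data frame with one row per city. The `set.seed()` call, the name `res_loop`, and the use of `data.frame(x, y, city)` instead of `cbind()` are assumptions added here for illustration.

```r
# Example data as in the original post; the seed is added only for reproducibility
set.seed(1)
x <- cumsum(c(0, runif(999, -1, +1)))
y <- cumsum(c(0, runif(999, -1, +1)))
city <- rep(1:10, each = 100)
data <- data.frame(x, y, city)

cities <- unique(data$city)

# Preallocate one row per city so the loop only fills in values
res_loop <- data.frame(City = cities,
                       Coefficient = NA_real_,
                       Adjusted_R_square = NA_real_)

for (i in seq_along(cities)) {
  # Fit the bivariate regression on the rows belonging to this city
  fit <- lm(y ~ x, data = data[data$city == cities[i], ])
  res_loop$Coefficient[i] <- coef(fit)[["x"]]                  # slope on x
  res_loop$Adjusted_R_square[i] <- summary(fit)$adj.r.squared
}

head(res_loop, 3)
```

With 300 cities the loop and the split/lapply version should both finish quickly; the choice between them is mostly a matter of style.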
On Nov 25, 2013, at 3:35 PM, Gary Dong wrote:

> I have a large data set which includes data from 300 cities. I want to run
> a bivariate regression for each city and record the coefficient and the
> adjusted R square.
> [...]

The way to get the most rapid response from this list is to post a dataset that represents the complexity of the problem. Presumably this large dataset is either a data frame with a column of city entries or a list of data frames. Why not post the output of dput() applied to an extract of three of the cities, with enough rows to allow a regression?

> [[alternative HTML version deleted]]

This is a plain text list.

--
David Winsemius
Alameda, CA, USA
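A rough sketch of what such a dput() extract could look like, using the simulated data from the original post as a stand-in for the real 300-city data set. The seed, the name `extract`, and the choice of five rows per city are arbitrary assumptions made here.

```r
# Simulated data as in the original post; the seed is added only for reproducibility
set.seed(1)
x <- cumsum(c(0, runif(999, -1, +1)))
y <- cumsum(c(0, runif(999, -1, +1)))
city <- rep(1:10, each = 100)
data <- data.frame(x, y, city)

# Keep the first five rows of each of the first three cities
extract <- do.call(rbind, lapply(split(data, data$city)[1:3], head, n = 5))
rownames(extract) <- NULL

# dput() prints R code that recreates the object; paste that output into the email
dput(extract)
```

Readers of the list can then assign the pasted `structure(...)` text to a name and work with the same data frame, without needing any attachment.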
Hi,

This is a job for the split/lapply (or sapply) approach.

# Fit y ~ x on one city's data; return intercept, slope and adjusted R squared
ff <- function(data) {
  ss <- lm(y ~ x, data)
  c(coef(ss), summary(ss)$adj.r.squared)
}
lapply(split(data[, 1:2], data$city), ff)

Regards
Petr

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Gary Dong
> Sent: Tuesday, November 26, 2013 12:36 AM
> To: r-help at r-project.org
> Subject: [R] summary many regressions
>
> [...]
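The lapply() call above returns a list with one numeric vector per city. One possible way to turn that into the table layout requested in the original post is sketched below with vapply(). This is a variation on Petr's function, not his code: it keeps only the slope, and the seed and the names `ff_named`, `m` and `res_tab` are assumptions introduced here.

```r
# Example data as in the original post; the seed is added only for reproducibility
set.seed(1)
x <- cumsum(c(0, runif(999, -1, +1)))
y <- cumsum(c(0, runif(999, -1, +1)))
city <- rep(1:10, each = 100)
data <- data.frame(x, y, city)

# Variant of ff() that returns a named vector: slope and adjusted R squared only
ff_named <- function(d) {
  ss <- lm(y ~ x, data = d)
  c(Coefficient = coef(ss)[["x"]],
    Adjusted_R_square = summary(ss)$adj.r.squared)
}

# vapply() gives a 2 x (number of cities) matrix; t() puts one city per row
m <- t(vapply(split(data[, c("x", "y")], data$city), ff_named, numeric(2)))

# The row names of m are the city labels produced by split()
res_tab <- data.frame(City = rownames(m), m, row.names = NULL)
head(res_tab, 3)
```

vapply() is used instead of sapply() here only because it checks that every city returns exactly two numbers; sapply() followed by t() should give the same matrix.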