CHEN, Cheng
2013-May-19 12:31 UTC
[R] How to run lm for each subset of the data frame, and then aggregate the result?
Hi gurus,

I have a big data frame df, with columns named: age, income, country.

What I want to do is actually very simple: run

fitFunc <- function(thisCountry) {
  subframe <- df[which(country == thisCountry), ]
  fit <- lm(income ~ 0 + age, data = subframe)
  return(coef(fit))
}

for each individual country, then aggregate the results into a new data frame that looks like:

countryname  coeffname1
USA          1.22
GB           1.03
France       1.1

I tried:

do.call("rbind", lapply(countries, fitFunc))

but this only gives something like:

     age
[1,] 2.540879
[2,] 2.428830
[3,] 2.369560

How should I proceed? Can anyone help?

--
CHEN, Cheng
arun
2013-May-19 16:10 UTC
[R] How to run lm for each subset of the data frame, and then aggregate the result?
Hi,

Maybe this helps:

set.seed(24)
dat1 <- data.frame(age = sample(30:70, 120, replace = TRUE),
                   income = sample(40000:80000, 120, replace = FALSE),
                   country = rep(c("USA", "GB", "France"), each = 40),
                   stringsAsFactors = FALSE)

library(plyr)
ldply(dlply(dat1, .(country), lm, formula = income ~ 0 + age), function(x) coef(x))
#   country      age
# 1  France 1127.192
# 2      GB 1194.586
# 3     USA 1161.795

# or
do.call(rbind, lapply(split(dat1, dat1$country), function(x) coef(with(x, lm(income ~ 0 + age)))))
#             age
# France 1127.192
# GB     1194.586
# USA    1161.795

# or
do.call(rbind, lapply(unique(dat1$country), function(x) {
  subframe <- dat1[which(dat1$country == x), ]
  fit <- lm(income ~ 0 + age, data = subframe)
  Coef1 <- data.frame(age = coef(fit))
  row.names(Coef1) <- x
  Coef1
}))
#             age
# USA    1161.795
# GB     1194.586
# France 1127.192

A.K.
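The base R variants above return the country as a row name, whereas the question asks for a data frame with an explicit country column. A minimal sketch of that last step, assuming a small made-up data set (`dat`, `fits`, `coefs` and `result` are names chosen here for illustration, and the values are not from the thread):

```r
# Hypothetical data: income is exactly proportional to age within each
# country, so the no-intercept slope of each fit is known in advance.
dat <- data.frame(
  age     = rep(c(30, 40, 50, 60), times = 3),
  country = rep(c("USA", "GB", "France"), each = 4),
  stringsAsFactors = FALSE
)
dat$income <- dat$age * rep(c(1000, 1100, 1200), each = 4)

# One coefficient vector per country; split() orders groups alphabetically.
fits <- lapply(split(dat, dat$country),
               function(d) coef(lm(income ~ 0 + age, data = d)))

# Stack into a matrix (one row per country), then move the row names
# into an ordinary column.
coefs  <- do.call(rbind, fits)
result <- data.frame(country = rownames(coefs), coefs,
                     stringsAsFactors = FALSE)
rownames(result) <- NULL
result
```

Because `coefs` keeps one column per coefficient, the same pattern should carry over to formulas with more than one term.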
David Winsemius
2013-May-19 16:19 UTC
[R] How to run lm for each subset of the data frame, and then aggregate the result?
On May 19, 2013, at 5:31 AM, CHEN, Cheng wrote:

> Hi gurus,
>
> I have a big data frame df, with columns named as :
>
> age, income, country
>
> what I want to do is very simple actually, do
>
> fitFunc<-function(thisCountry){
>     subframe<-df[which(country==thisCountry),];
>     fit<-lm(income~0+age, data=subframe);
>     return(coef(fit));}
>
> for each individual country. Then aggregate the result into a new data
> frame looks like :
>
> countryname, coeffname1 USA 1.22 GB
> 1.03 France 1.1
>
> I tried to do :
> do.call("rbind", lapply(countries, fitFunc))

This suggests you have used 'attach' on df. Not a safe practice.

> but this only gives something like:
>
>      age
> [1,] 2.540879
> [2,] 2.428830
> [3,] 2.369560
> How should I proceed?

That is exactly the sort of result I would have expected from your procedure. We cannot tell what you want that is different. For one thing, you are posting in HTML, so the "aggregate" result above is mangled. I'm guessing it might have been:

countryname, coeffname1
USA 1.22
GB 1.03
France 1.1

So perhaps the only thing that is missing is the row names?

# 'countries' is the vector of country names used in the original call
res <- do.call("rbind", lapply(countries, fitFunc))
rownames(res) <- as.character(countries)
res

If you had wanted a data frame to be returned, you could do this with the 'by' function, or return a list with countries instead of a numeric vector from your 'fitFunc' calls. rbind-ing a list of lists may give you something that can easily be coerced to a data.frame. (But no data to test these theories.)

> [[alternative HTML version deleted]]
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> and provide commented, minimal, self-contained, reproducible code.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

--
David Winsemius
Alameda, CA, USA
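The suggestions in this reply were not tested against data, so here is one possible sketch of them. It assumes a made-up stand-in for the poster's `df` (the values are invented) and adds a hypothetical `data` argument to `fitFunc` so the function no longer depends on `attach(df)`. The `sapply()` over the result of `by()` follows the idiom shown in the `by` help page.

```r
# Hypothetical stand-in for the poster's data frame (values are made up).
df <- data.frame(
  age     = rep(c(30, 40, 50, 60), times = 3),
  country = rep(c("USA", "GB", "France"), each = 4),
  stringsAsFactors = FALSE
)
df$income <- df$age * rep(c(1000, 1100, 1200), each = 4)

# Refer to the country column through the data frame instead of relying on
# attach(df), and return a one-row data frame rather than a bare vector.
fitFunc <- function(thisCountry, data = df) {
  subframe <- data[data$country == thisCountry, ]
  fit <- lm(income ~ 0 + age, data = subframe)
  data.frame(countryname = thisCountry, t(coef(fit)),
             stringsAsFactors = FALSE)
}

countries <- unique(df$country)
res <- do.call("rbind", lapply(countries, fitFunc))
res

# The same fits via by(), which does the per-country splitting itself.
by_fits <- by(df, df$country, function(d) lm(income ~ 0 + age, data = d))
res_by  <- sapply(by_fits, function(f) coef(f)[["age"]])
res_by
```

Returning a one-row data frame from each call means `rbind` yields a data frame directly, with the country as a regular column; the `by()` route gives one slope per country, ordered alphabetically by country.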