I have two data sets about lots of companies' stock and fiscal data. One is monthly data with about 144,000 lines, and the other is quaterly with about 56,000. Each data set takes different company code. I need to merge these two together. I read both ask cvs. And the other file with corresponding firm code. Now I have three data sets. return$PERMNO, account$GVKEY. id is the data frames of the corresponding relation and has both id$PERMNO and id$GVKEY. Also, I need to convert the return's month into quarter and finally merge two data frames(return and account). I end up write a short program for this, but it runs very slow. 15+ minutes. Is there quick way to do it. Here is my original codes. id$fy=rep(0,length(id$PERMNO)) for (i in 1:length(id$PERMNO)) id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]] return$GVKEY=rep(0,length(return$PERMNO)) return$fyy=rep(0,length(return$PERMNO)) return$fyq=rep(0,length(return$PERMNO)) for (i in i:length(return$PERMNO)) { temp<-id$PERMNO==return$PERMNO[[i]]; tempmon<-id$fy[temp][[1]]; if (return$month[[i]]<-tempmon) { return$fyy[[i]]<-return$year[[i]]; return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3; } else{ return$fyy[[i]]<-return$year[[i]]+1; return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3; } return$GVKEY[[i]]<-id$GVKEY[temp][[1]]; } returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by.y<-c("GVKEY","fyy","fyq"))
Some functions that may be of help: ?aggregate.ts ?cbind ?merge and in the zoo package ?as.yearmon ?as.yearqtr ?aggregate.zoo ?merge.zoo On 5/1/06, Guojun Zhu <shmilylemon at yahoo.com> wrote:> I have two data sets about lots of companies' stock > and fiscal data. One is monthly data with about > 144,000 lines, and the other is quaterly with about > 56,000. Each data set takes different company code. > I need to merge these two together. I read both ask > cvs. And the other file with corresponding firm code. > Now I have three data sets. return$PERMNO, > account$GVKEY. id is the data frames of the > corresponding relation and has both id$PERMNO and > id$GVKEY. Also, I need to convert the return's month > into quarter and finally merge two data frames(return > and account). I end up write a short program for > this, but it runs very slow. 15+ minutes. Is there > quick way to do it. Here is my original codes. > > > > id$fy=rep(0,length(id$PERMNO)) > for (i in 1:length(id$PERMNO)) > > id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]] > > return$GVKEY=rep(0,length(return$PERMNO)) > return$fyy=rep(0,length(return$PERMNO)) > return$fyq=rep(0,length(return$PERMNO)) > for (i in i:length(return$PERMNO)) { > temp<-id$PERMNO==return$PERMNO[[i]]; > tempmon<-id$fy[temp][[1]]; > if (return$month[[i]]<-tempmon) { > return$fyy[[i]]<-return$year[[i]]; > return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3; > } > else{ > return$fyy[[i]]<-return$year[[i]]+1; > return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3; > } > return$GVKEY[[i]]<-id$GVKEY[temp][[1]]; > } > > returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by.y<-c("GVKEY","fyy","fyq")) > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >
I'm sure you'll get ingenious responses to help you optimize your R code. I deal with similar investment data in even larger numbers (e.g. 10 years of daily return data for each stock in the Russell 3000), and prefer reading and consolidating the data in Python using dictionaries and lists, then either piping the data to R in a read statement (read.table("pipe python...")) or using Rpy to write R data frames directly from Python. Python is more facile with these basic data manipulations for hundreds of thousands or even millions of records, and performance is generally considerably better. Steve Miller -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Guojun Zhu Sent: Monday, May 01, 2006 2:35 AM To: r-help at stat.math.ethz.ch Subject: [R] efficiency in merging two data frames I have two data sets about lots of companies' stock and fiscal data. One is monthly data with about 144,000 lines, and the other is quaterly with about 56,000. Each data set takes different company code. I need to merge these two together. I read both ask cvs. And the other file with corresponding firm code. Now I have three data sets. return$PERMNO, account$GVKEY. id is the data frames of the corresponding relation and has both id$PERMNO and id$GVKEY. Also, I need to convert the return's month into quarter and finally merge two data frames(return and account). I end up write a short program for this, but it runs very slow. 15+ minutes. Is there quick way to do it. Here is my original codes. id$fy=rep(0,length(id$PERMNO)) for (i in 1:length(id$PERMNO)) id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]] return$GVKEY=rep(0,length(return$PERMNO)) return$fyy=rep(0,length(return$PERMNO)) return$fyq=rep(0,length(return$PERMNO)) for (i in i:length(return$PERMNO)) { temp<-id$PERMNO==return$PERMNO[[i]]; tempmon<-id$fy[temp][[1]]; if (return$month[[i]]<-tempmon) { return$fyy[[i]]<-return$year[[i]]; return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3; } else{ return$fyy[[i]]<-return$year[[i]]+1; return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3; } return$GVKEY[[i]]<-id$GVKEY[temp][[1]]; } returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by.y<-c("GVKEY", "fyy","fyq")) ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Another good option is SQL, the fastest and most scalable solution. If you decide to give it a try pay close attention to indexes.> -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Steve Miller > Sent: Monday, May 01, 2006 8:55 AM > To: 'Guojun Zhu'; r-help at stat.math.ethz.ch > Subject: Re: [R] efficiency in merging two data frames > > I'm sure you'll get ingenious responses to help you optimize > your R code. I > deal with similar investment data in even larger numbers > (e.g. 10 years of > daily return data for each stock in the Russell 3000), and > prefer reading > and consolidating the data in Python using dictionaries and > lists, then > either piping the data to R in a read statement (read.table("pipe > python...")) or using Rpy to write R data frames directly from Python. > Python is more facile with these basic data manipulations for > hundreds of > thousands or even millions of records, and performance is generally > considerably better. > > Steve Miller > > -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Guojun Zhu > Sent: Monday, May 01, 2006 2:35 AM > To: r-help at stat.math.ethz.ch > Subject: [R] efficiency in merging two data frames > > I have two data sets about lots of companies' stock > and fiscal data. One is monthly data with about > 144,000 lines, and the other is quaterly with about > 56,000. Each data set takes different company code. > I need to merge these two together. I read both ask > cvs. And the other file with corresponding firm code. > Now I have three data sets. return$PERMNO, > account$GVKEY. id is the data frames of the > corresponding relation and has both id$PERMNO and > id$GVKEY. Also, I need to convert the return's month > into quarter and finally merge two data frames(return > and account). I end up write a short program for > this, but it runs very slow. 15+ minutes. Is there > quick way to do it. Here is my original codes. > > > > id$fy=rep(0,length(id$PERMNO)) > for (i in 1:length(id$PERMNO)) > > id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]] > > return$GVKEY=rep(0,length(return$PERMNO)) > return$fyy=rep(0,length(return$PERMNO)) > return$fyq=rep(0,length(return$PERMNO)) > for (i in i:length(return$PERMNO)) { > temp<-id$PERMNO==return$PERMNO[[i]]; > tempmon<-id$fy[temp][[1]]; > if (return$month[[i]]<-tempmon) { > return$fyy[[i]]<-return$year[[i]]; > return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3; > } > else{ > return$fyy[[i]]<-return$year[[i]]+1; > return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3; > } > return$GVKEY[[i]]<-id$GVKEY[temp][[1]]; > } > > returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by > .y<-c("GVKEY", > "fyy","fyq")) > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html >
Hi and you would better to give up thinking in "for cycles" as a good tool for data manipulation and calculation. R has usually functions for direct manipulation and calculation with vectors or matrices, vhich are far more effective than calculation in cycle. I also recommend you to go through Paul Johnsons Rtips which is set of many good ideas how to do some things. HTH Petr On 1 May 2006 at 7:54, Steve Miller wrote: From: "Steve Miller" <steve.miller at jhu.edu> To: "'Guojun Zhu'" <shmilylemon at yahoo.com>, <r-help at stat.math.ethz.ch> Date sent: Mon, 1 May 2006 07:54:54 -0500 Subject: Re: [R] efficiency in merging two data frames> I'm sure you'll get ingenious responses to help you optimize your R > code. I deal with similar investment data in even larger numbers (e.g. > 10 years of daily return data for each stock in the Russell 3000), and > prefer reading and consolidating the data in Python using dictionaries > and lists, then either piping the data to R in a read statement > (read.table("pipe python...")) or using Rpy to write R data frames > directly from Python. Python is more facile with these basic data > manipulations for hundreds of thousands or even millions of records, > and performance is generally considerably better. > > Steve Miller > > -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Guojun Zhu > Sent: Monday, May 01, 2006 2:35 AM To: r-help at stat.math.ethz.ch > Subject: [R] efficiency in merging two data frames > > I have two data sets about lots of companies' stock > and fiscal data. One is monthly data with about > 144,000 lines, and the other is quaterly with about > 56,000. Each data set takes different company code. > I need to merge these two together. I read both ask > cvs. And the other file with corresponding firm code. > Now I have three data sets. return$PERMNO, > account$GVKEY. id is the data frames of the > corresponding relation and has both id$PERMNO and > id$GVKEY. Also, I need to convert the return's month > into quarter and finally merge two data frames(return > and account). I end up write a short program for > this, but it runs very slow. 15+ minutes. Is there > quick way to do it. Here is my original codes. > > > > id$fy=rep(0,length(id$PERMNO)) > for (i in 1:length(id$PERMNO)) > > id$fy[[i]]<-account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]] > > return$GVKEY=rep(0,length(return$PERMNO)) > return$fyy=rep(0,length(return$PERMNO)) > return$fyq=rep(0,length(return$PERMNO)) > for (i in i:length(return$PERMNO)) { > temp<-id$PERMNO==return$PERMNO[[i]]; > tempmon<-id$fy[temp][[1]]; > if (return$month[[i]]<-tempmon) { > return$fyy[[i]]<-return$year[[i]]; > return$fyq[[i]]<-4-(tempmon-return$month[[i]])%/%3; > } > else{ > return$fyy[[i]]<-return$year[[i]]+1; > return$fyq[[i]]<-(return$month[[i]]-tempmon-1)%/%3; > } > return$GVKEY[[i]]<-id$GVKEY[temp][[1]]; > } > > returnnew=merge(return,account,by.x<-c("GVKEY","fyy","fyq"),by.y<-c("G > VKEY", "fyy","fyq")) > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.htmlPetr Pikal petr.pikal at precheza.cz