Hi everyone!

I have a question about data processing efficiency.

My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows.

My goal is this: for every quarter since issue for each IPO, I need to find a "matched" firm in the same industry that is close in market cap. So, e.g., for firm X, which had an IPO, I need to find a matched non-issuing firm in quarter 1 since the IPO, then a (possibly different) non-issuing firm in quarter 2 since the IPO, etc. Repeat for each issuing firm (there are about 8300 of these).

Thus it seems to me that I need to be doing a lot of data selection, subsetting, and looping (yikes!), but the result appears to be highly inefficient and takes ages (well, many hours). What I am doing, in pseudocode, is this:

1. For each quarter of data, get all the IPOs and all the eligible non-issuing firms.
2. For each IPO in a quarter, grab all the non-issuers in the same industry, sort them by size, and grab the matching firm closest in size (the exact procedure is to take the closest bigger firm if one exists, and otherwise the biggest available firm if all are smaller).
3. Assign the matched firm-observation the same "quarters since issue" as the IPO being matched.
4. rbind them all into the "matching" dataset.

The function I currently have is pasted below, for your reference. Is there any way to make it produce the same result but much faster? Specifically, I am guessing that eliminating some loops would help a lot, but I don't see how, since I need to do some fancy footwork for each IPO in each quarter to find the matching firm. I'll be doing a few things similar to this, so it's somewhat important to improve the efficiency of this. Maybe some of you R-fu masters can clue me in? :)

I would appreciate any help, tips, tricks, tweaks, you name it! :)

========== my function below ==========

fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata, quarters_since_issue=40) {

    result = matrix(nrow=0, ncol=ncol(tfdata))  # rbind for matrix is cheaper, so typecast the result to matrix
    colnames = names(tfdata)
    quarterends = sort(unique(tfdata$DATE))

    for (aquarter in quarterends) {
        tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]

        tfdata_quarter_fitting_nonissuers = tfdata_quarter[ (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) & (tfdata_quarter$IPO.Flag == 0), ]
        tfdata_quarter_ipoissuers = tfdata_quarter[ tfdata_quarter$IPO.Flag == 1, ]

        for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
            arow = tfdata_quarter_ipoissuers[i, ]
            industrypeers = tfdata_quarter_fitting_nonissuers[ tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
            industrypeers = industrypeers[ order(industrypeers$Market.Cap.13f), ]
            if ( nrow(industrypeers) > 0 ) {
                if ( nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0 ) {
                    bestpeer = industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1, ]
                }
                else {
                    bestpeer = industrypeers[nrow(industrypeers), ]
                }
                bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
                #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
                result = rbind(result, as.matrix(bestpeer))
            }
        }
        #result = rbind(result, tfdata_quarter)
        print(aquarter)
    }

    result = as.data.frame(result)
    names(result) = colnames
    return(result)
}

========= end of my function =============
Maybe you should provide a minimal, working example with data, so that we can all give it a try. In the meantime, take a look at the Rprof function to see where your code can be improved; a quick sketch of how to call it is below.
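Something along these lines is usually enough to see where the time goes (the output file name here is arbitrary, and you might want to profile a run on just a subset of the data to keep it short):

    Rprof("match_prof.out")        # start the profiler, writing samples to this file
    res <- fcn_create_nonissuing_match_by_quarterssinceissue(tfdata, 40)
    Rprof(NULL)                    # stop profiling
    summaryRprof("match_prof.out")$by.self    # time spent inside each function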
Good luck
Bart

Daniel Folkinshteyn-2 wrote:
> Hi everyone!
>
> I have a question about data processing efficiency.
> [...]
Anybody have any thoughts on this? Please? :)

on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
> Hi everyone!
>
> I have a question about data processing efficiency.
> [...]
On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow at imail.org> wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Patrick Burns
>> Sent: Friday, June 06, 2008 12:04 PM
>> To: Daniel Folkinshteyn
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Improving data processing efficiency
>>
>> That is going to be situation dependent, but if you have a
>> reasonable upper bound, then that will be much easier and not
>> far from optimal.
>>
>> If you pick the possibly too small route, then increasing the
>> size in largish junks is much better than adding a row at a time.
>
> Pat,
>
> I am unfamiliar with the use of the word "junk" as a unit of measure for data objects. I figure there are a few different possibilities:
>
> 1. You are using the term intentionally, meaning that you suggest he increases the size in terms of old cars and broken pianos rather than used-up pens and broken pencils.
>
> 2. This was a Freudian slip based on your opinion of some datasets you have seen.
>
> 3. Somewhere between your mind and the final product "jumps/chunks" became "junks" (possibly a Microsoft "correction", or just typing too fast combined with number 2).
>
> 4. "junks" is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above).

5. Chinese sailing vessel.
http://en.wikipedia.org/wiki/Junk_(ship)
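Setting the vocabulary aside, the advice being quoted (preallocate, and grow the result in largish chunks rather than one row at a time) might look roughly like this; the chunk size, column count, and loop body here are made up purely for illustration:

    chunk  <- 5000                                  # made-up "largish chunk" size
    result <- matrix(NA_real_, nrow = chunk, ncol = 3)
    n_used <- 0
    for (i in 1:12000) {                            # stand-in for the matching loop
        newrow <- c(i, i^2, sqrt(i))                # stand-in for the matched row
        if (n_used == nrow(result)) {               # out of room: add another chunk of empty rows
            result <- rbind(result, matrix(NA_real_, nrow = chunk, ncol = ncol(result)))
        }
        n_used <- n_used + 1
        result[n_used, ] <- newrow
    }
    result <- result[seq_len(n_used), , drop = FALSE]   # finally, drop the unused rows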
My guess is that number 2 is closest to the mark. Typing too fast is unfortunately not one of my habitual attributes.

Gabor Grothendieck wrote:
> On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow at imail.org> wrote:
>> 2. This was a Freudian slip based on your opinion of some datasets you have seen.
> [...]
Greg Snow
2008-Jun-06 19:14 UTC
[R] New vocabulary on a Friday afternoon. Was: Improving data processing efficiency
I still like the number 4 option, so I think we need to come up with a formal definition for a "junk" of data. I read somewhere that Tukey coined the word "bit" as it applies to computers; we can share the credit/blame for "junks" of data.

My proposal for a statistical/data definition of the word junk:

Junk (noun): A quantity of data just large enough to get the client excited about the "great" dataset they provided, but not large enough to support any useful conclusions.

Example sentence: We just received another junk of data from the boss; who gets to give him the bad news that it still does not prove his pet theory?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111

> -----Original Message-----
> From: Patrick Burns [mailto:pburns at pburns.seanet.com]
> Sent: Friday, June 06, 2008 12:58 PM
> To: Gabor Grothendieck
> Cc: Greg Snow; r-help at r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> My guess is that number 2 is closest to the mark.
> Typing too fast is unfortunately not one of my habitual attributes.
> [...]
In a case like this, if you can possibly work with matrices instead of data frames, you might get a significant speedup. (More accurately, I have had situations where I obtained a speedup by working with matrices instead of data frames.) Even if you have to code character columns as numeric, it can be worth it. Data frames have overhead that matrices do not. (Here's where profiling might have given a clue.) Granted, there has been recent work in reducing the overhead associated with data frames, but I think it's worth a try.

Carrying along extra columns and doing row subsetting, rbinding, etc., means a lot more happening in memory. So, for example, if all of your matching is based on just a few columns, extract those columns, convert them to a matrix, do all the matching, and then, based on some sort of row index, retrieve all of the associated columns. A rough sketch of that approach, using the column names from the original post, is below.
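Untested, and only meant to show the shape of the idea: the function name is made up, DATE and HSICIG are simply re-coded as numbers so that everything fits in one numeric matrix, and the tie-breaking mirrors the original if/else.

    fcn_match_via_matrix <- function(tfdata, quarters_since_issue = 40) {
        ## keep just the columns needed for matching, coded as numbers,
        ## plus the original row number so everything else can be retrieved later
        m <- cbind(row  = seq_len(nrow(tfdata)),
                   date = as.numeric(factor(tfdata$DATE)),
                   ind  = as.numeric(factor(tfdata$HSICIG)),
                   cap  = tfdata$Market.Cap.13f,
                   ipo  = tfdata$IPO.Flag,
                   qsli = tfdata$Quarters.Since.Latest.Issue)

        ipos  <- m[m[, "ipo"] == 1, , drop = FALSE]
        peers <- m[m[, "ipo"] == 0 & m[, "qsli"] > quarters_since_issue, , drop = FALSE]

        matched <- integer(nrow(ipos))        # matched row number in tfdata; 0 = no match
        for (i in seq_len(nrow(ipos))) {
            p <- peers[peers[, "date"] == ipos[i, "date"] &
                       peers[, "ind"]  == ipos[i, "ind"], , drop = FALSE]
            if (nrow(p) == 0) next
            bigger <- p[, "cap"] >= ipos[i, "cap"]
            matched[i] <- if (any(bigger)) {
                p[bigger, "row"][which.min(p[bigger, "cap"])]   # smallest peer at least as big
            } else {
                p[which.max(p[, "cap"]), "row"]                 # otherwise the biggest peer
            }
        }

        ## one big subset at the end instead of rbind() inside the loop
        keep <- matched > 0
        result <- tfdata[matched[keep], ]
        result$Quarters.Since.IPO.Issue <- tfdata$Quarters.Since.IPO.Issue[ipos[keep, "row"]]
        result
    }

The point is that all of the per-IPO work happens on a small numeric matrix, and the full data frame is touched only once at the very end.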
-Don

At 2:09 PM -0400 6/5/08, Daniel Folkinshteyn wrote:
> Hi everyone!
>
> I have a question about data processing efficiency.
> [...]

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
Esmail Bonakdarian wrote:
> hadley wickham wrote:
>
> Hi,
>
> I tried this suggestion as I am curious about bottlenecks in my own
> R code ...
>
>> Why not try profiling? The profr package provides an alternative
>> display that I find more helpful than the default tools:
>>
>> install.packages("profr")
>
> install.packages("profr")
> Warning message:
> package 'profr' is not available

I selected a different mirror in place of the Iowa one and it worked. Odd, I just assumed all the same packages are available on all mirrors.
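Once it installs, usage is roughly along these lines (a sketch based on the package documentation; the object name and the single test call are arbitrary):

    library(profr)
    p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(tfdata, 40))
    head(p)    # a data frame of calls with their timings
    plot(p)    # graphical overview of where the time goes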