Gregor Gorjanc
2006-Mar-15 21:39 UTC
[R] lapply vs. for (was: Incrementing a counter in lapply)
> From: Thomas Lumley
>>
>> On Tue, 14 Mar 2006, John McHenry wrote:
>>
>> > Thanks, Gabor & Thomas.
>> >
>> > Apologies, but I used an example that obfuscated the question that I
>> > wanted to ask.
>> >
>> > I really wanted to know how to have extra arguments in functions that
>> > would allow, per the example code, for something like a counter to be
>> > incremented. Thomas's suggestion of using mapply (reproduced below with
>> > corrections) is probably closest.
>>
>> It is probably worth pointing out here that the R documentation does not
>> specify the order in which lapply() does the computation.
>>
>> If you could work out how to increment a counter (and you could, with
>> sufficient effort), it would not necessarily work, because the 'i'th
>> evaluation would not necessarily be of the 'i'th element.
>>
>> [lapply() does in fact start at the beginning, go on until it gets to the
>> end, and then stop, but this isn't documented. Suppose R became
>> multithreaded, for example....]
>
> The corollary, it seems to me, is that sometimes it's better to leave the
> good old for loop alone. It's not always profitable to turn for loops into
> some *apply construct. The trick is learning to know when to do it and when
> not to.

Can someone share some of these tricks with me? Up to now I have always done things with for loops. Just recently I started to pay attention to *apply* constructs, and I already wanted to start using them instead of the good old for loop, but then this thread struck like a bolt of lightning. Based on Thomas's words, lapply should not be used for tasks where order is critical. Did I understand this correctly? Additionally, I have read some notes on R by Thomas (I lost the link, but I think they were posted on R-help) in which he mentioned that it is commonly assumed that *apply* (I do not remember which of the *apply* functions) is faster than a loop, but that this is not true. Any additional pointers to literature?
--
Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si
SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.
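Thomas's point about counters can be illustrated with a small sketch. Instead of incrementing a counter inside the function passed to lapply(), pass the position explicitly, so each result depends only on its own arguments and not on the order in which elements happen to be evaluated. The vector `x` and the labelling done with paste() are hypothetical examples, not code from the thread.

```r
# Hypothetical input; any vector or list would do.
x <- c("a", "b", "c")

# Fragile: relies on lapply() visiting the elements in order, and uses `<<-`
# to modify a counter that lives outside the function.
counter <- 0
fragile <- lapply(x, function(el) {
  counter <<- counter + 1
  paste(counter, el)
})

# Robust: the index travels with the element, so the result for element i
# does not depend on evaluation order.
robust_lapply <- lapply(seq_along(x), function(i) paste(i, x[[i]]))

# Same idea with mapply(): the index is simply another argument.
robust_mapply <- mapply(function(el, i) paste(i, el), x, seq_along(x),
                        USE.NAMES = FALSE)
```

In current R all three give the same labels; the counter version simply depends on the evaluation order that, as Thomas notes, is not documented.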
Philippe Grosjean
2006-Mar-15 22:02 UTC
The for() loop is very slow in S-PLUS. This is probably one of the motivations for developing the apply() family of functions (as well as the ugly For() loop) under that system. Now, for() loops are much faster in R. Also, if you look at the R code of apply(), you will see that there is a for() loop inside it! So, why would you prefer using apply() or the like?

1) If you write code to be run in both S-PLUS and R,
2) If you want more concise code (much of the "housekeeping" is done by apply() and co),
3) Because the apply() family is closer to the philosophy of vectorized calculation, which is the favored approach in the S language.

Note, however, that the optimal approach is not just to replace for() loops with apply() and co, but to *rethink* your algorithm completely in a vectorized way. This often ends up with a very different solution!

Best,

Philippe Grosjean
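A minimal sketch of Philippe's points, using a hypothetical task (square each element and add one). The explicit for() loop needs its own housekeeping (preallocating and filling the result), sapply() does that for you, and the vectorized version drops the per-element loop entirely because `^` and `+` already work element-wise on whole vectors. The timing lines are only a way to check the speed question on your own machine; no particular numbers are implied.

```r
x <- 1:10  # hypothetical input

# 1) Explicit for() loop: we preallocate and fill the result ourselves.
res_for <- numeric(length(x))
for (i in seq_along(x)) {
  res_for[i] <- x[i]^2 + 1
}

# 2) sapply(): allocation and collection of results are done for us,
#    but the function is still called once per element.
res_sapply <- sapply(x, function(v) v^2 + 1)

# 3) Vectorized: the computation is rethought to act on the whole vector.
res_vec <- x^2 + 1

# Rough timing comparison on a larger input; results depend on the
# machine and the R version.
big <- runif(1e5)
system.time({
  out <- numeric(length(big))
  for (i in seq_along(big)) out[i] <- big[i]^2 + 1
})
system.time(sapply(big, function(v) v^2 + 1))
system.time(big^2 + 1)
```

Typing `apply` at the R prompt prints its source, which shows the for() loop Philippe mentions.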
Patrick Burns
2006-Mar-15 22:04 UTC
In my opinion, the main issue in choosing between 'for' and an apply function is the simplicity of the code. If in a given situation it is simpler and more understandable to use 'lapply' than a 'for' loop, then use 'lapply'. If in a different situation the 'for' loop is simpler, then use the 'for' loop.

In modern-day R, whatever timing differences there may be are likely to be slight, and virtually certain not to be critical. The confusion comes from the olden days of S-PLUS, when the timing differences could be quite substantial in some cases. The hangover from that is that apply functions are too often recommended in R.

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
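One case where the for() loop is arguably the simpler choice is a recurrence in which each step depends on the previous one, which is exactly the kind of order-critical task Gregor asks about. The sketch below assumes a hypothetical first-order recursion y[i] = a * y[i - 1] + e[i]; the names `a`, `e` and `y0` are illustrative. The loop states the dependence directly. Reduce() with accumulate = TRUE is a functional alternative that folds from left to right, and stats::filter() with method = "recursive" is a vectorized rethink in Philippe's sense; neither was suggested in the thread.

```r
set.seed(1)
a  <- 0.5          # hypothetical coefficient
e  <- rnorm(20)    # hypothetical innovations
y0 <- 0            # hypothetical starting value

# for() loop: the dependence on the previous value is explicit.
y_for <- numeric(length(e))
prev <- y0
for (i in seq_along(e)) {
  y_for[i] <- a * prev + e[i]
  prev <- y_for[i]
}

# Reduce() folds from left to right; accumulate = TRUE keeps every
# intermediate value, and [-1] drops the starting value y0.
y_reduce <- Reduce(function(prev, ei) a * prev + ei, e,
                   accumulate = TRUE, init = y0)[-1]

# Vectorized: a recursive filter computes y[i] = e[i] + a * y[i - 1],
# starting from zero (which matches y0 here).
y_filter <- as.numeric(stats::filter(e, filter = a, method = "recursive"))
```

All three give the same series (up to floating-point tolerance); which one is "simpler" is the judgement call Patrick describes.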