Liaw, Andy
2006-Jan-25 01:01 UTC
[R] lazy evaluation (was RE: Number of replications of a term)
From: Thomas Lumley
>
> On Wed, 25 Jan 2006, Ray Brownrigg wrote:
>
> > There's an even faster one, which nobody seems to have mentioned yet:
> >
> > rep(l <- rle(ids)$lengths, l)
>
> I considered this but it wasn't clear to me from the initial post that
> each ID occupied a contiguous section of the vector.
>
> Also, lazy evaluation makes code like this
>     rep(l <- rle(ids)$lengths, l)
> a bit worrying. It relies on rep() using the first argument before it
> uses the second one. In this case, clearly, it works, but it is not a
> style I would encourage and it's easy to construct functions where it
> fails.

Indeed. Here's a trivial example:

    > f <- function(x, y) {
    +     print(y)
    +     x + y
    + }
    > f(a <- 3, a)
    Error in print(y) : object "a" not found

Without the print(), the function would work just fine.

Andy

> -thomas
>
> > Timing on my 2.8GHz NetBSD system shows:
> >
> > > length(ids)
> > [1] 45150
> > > # Gabor:
> > > system.time(for (i in 1:100) ave(as.numeric(factor(ids)), ids,
> > +     FUN = length))
> > [1] 3.45 0.06 3.54 0.00 0.00
> > > # Barry (and others I think):
> > > system.time(for (i in 1:100) table(ids)[ids])
> > [1] 2.13 0.05 2.20 0.00 0.00
> > > # Me:
> > > system.time(for (i in 1:100) rep(l <- rle(ids)$lengths, l))
> > [1] 1.60 0.00 1.62 0.00 0.00
> >
> > Of course the difference between 21 milliseconds and 16 milliseconds
> > is not great, unless you are doing this a lot.
> >
> > Ray Brownrigg
> >
> >> From: Gabor Grothendieck <ggrothendieck at gmail.com>
> >>
> >> Nice. I timed it and it's much faster than mine too.
> >>
> >> On 1/24/06, Barry Rowlingson <B.Rowlingson at lancaster.ac.uk> wrote:
> >>> Laetitia Marisa wrote:
> >>>> Hello,
> >>>>
> >>>> Is there a simple and fast function that returns a vector of the
> >>>> number of replications for each object of a vector?
> >>>> For example:
> >>>> I have a vector of IDs:
> >>>> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
> >>>>
> >>>> I want the function to return the following vector, where each
> >>>> term is the number of replicates for the given id:
> >>>> c( 1, 2, 2, 3,3,3,1 )
> >>>
> >>> One-liner:
> >>>
> >>> > table(ids)[ids]
> >>> ids
> >>> ID1 ID2 ID2 ID3 ID3 ID3 ID5
> >>>   1   2   2   3   3   3   1
> >>>
> >>> 'table(ids)' computes the counts, then the subscripting [ids] looks
> >>> it all up.
> >>>
> >>> Now try it on your 40,000-long vector!
> >>>
> >>> Barry
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlumley at u.washington.edu     University of Washington, Seattle
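
To make the evaluation-order concern in this thread concrete, here is a minimal R sketch. It is only an illustration under stated assumptions: the helper names x_first and y_first and the variables tmp_a, tmp_b, len and counts are hypothetical and do not come from the original posts. It relies on R evaluating each function argument only when it is first used, so an assignment written inside one argument exists only once that argument has been forced.

```r
# Hypothetical helper: uses x before y, so the assignment made inside the
# first argument has already happened by the time y is evaluated.
x_first <- function(x, y) {
  x + y
}

# Hypothetical helper: uses y before x, so the variable assigned inside the
# first argument does not exist yet when y is evaluated.
y_first <- function(x, y) {
  y + x
}

# Works: x is forced first, which creates tmp_a; y then finds it.
ok <- x_first(tmp_a <- 3, tmp_a)   # 6

# Fails: y is forced first and tmp_b has not been created yet.
failed <- tryCatch({
  y_first(tmp_b <- 3, tmp_b)
  FALSE
}, error = function(e) TRUE)       # TRUE

# Style that does not depend on argument order: assign first, then call.
# As noted in the thread, the rle() approach assumes each ID occupies a
# contiguous run of the vector.
ids <- c("ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID5")
len <- rle(ids)$lengths
counts <- rep(len, len)            # 1 2 2 3 3 3 1
```

Assigning len on its own line costs one extra statement but gives the same result without depending on the order in which rep() happens to touch its arguments.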