thr3ads.net - R help - [R] ddply from plyr package

AdamMarczak

2011-Aug-24 16:25 UTC

[R] ddply from plyr package - any alternatives?

Hello everyone,
I was asked to repost this, sorry for any inconvenience.

I'm looking for a replacement for the ddply function from the plyr package.
ddply applies a function by category, where the categories are stored in one
or more columns.

Regular loops or lapply calls slow down greatly because my data has more than
9000 unique combinations. Is there any available solution that lets me apply
a function by category?

Currently my code looks like the snippet below:

ddply(myData, c("country_name", "product_name"), myFunction)

Please note that I'm looking for a solution that performs decently.

Thanks in advance! 

With regards, 
Adam.


Tal Galili

2011-Aug-25 07:09 UTC

Hi Adam,
I don't think there is a faster alternative to plyr, without doing it in
nested for loops, with a lot of book-keeping of variables  (but if someone
here were to correct me, I'd be happy to know).

Two things to consider:
1) See if you can optimize your function (there is a lot of material on
R code optimization online).
2) plyr has a parallel processing backend.
Here is a post I wrote about how to use it for Windows users (like myself):
http://www.r-statistics.com/2010/09/using-the-plyr-1-2-package-parallel-processing-backend-with-windows/
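
As a rough sketch of point 2: plyr exposes its parallel backend through the `.parallel` argument of ddply, which hands the groups to whichever foreach backend has been registered. The example below assumes the doParallel package as that backend (the linked post may use a different one), and `myData` and `myFunction` here are made-up stand-ins for Adam's objects. Whether this helps depends on how expensive the function is per group; for cheap functions the parallel overhead can dominate.

```r
library(plyr)
library(doParallel)

# Register a foreach backend with two workers
# (assumption: the doParallel package is installed)
registerDoParallel(cores = 2)

# Made-up data standing in for the poster's myData
set.seed(42)
myData <- data.frame(
  country_name = rep(c("PL", "DE", "NL"), each = 20),
  product_name = rep(c("a", "b"), times = 30),
  value        = rnorm(60)
)

# Hypothetical per-group function: returns one summary row per group
myFunction <- function(df) data.frame(mean_value = mean(df$value))

# Same call as in the original post, with the parallel switch turned on
res <- ddply(myData, c("country_name", "product_name"), myFunction,
             .parallel = TRUE)

# Release the implicitly created workers
stopImplicitCluster()
```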

Good luck,
Tal




Paul Hiemstra

2011-Aug-25 07:27 UTC

Hi Adam,

A recent thread on R-help deals with exactly your problem. In one of the
responses I compare ddply to a number of alternative solutions (using
ave and data.table) [1]. The test in that e-mail shows that for a large
number of unique categories, ddply is quite slow. Hadley (Wickham,
author of ddply) remarked in reply to a question on the plyr mailing
list that this was due to how ddply was set up [2]. So in your case I
would definitely take a look at data.table, which is probably much
faster. If that does not work, take a look at ave, which is also quite a
bit faster for your problem.
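
To illustrate the ave route with base R only, here is a minimal sketch. The data frame and its `value` column are made up for the example; only the grouping column names come from Adam's post. Note that ave returns a vector as long as its input, so it fits when the per-group function yields a single value (or one value per row) rather than a whole data frame.

```r
# Made-up data using the grouping columns from the original post
myData <- data.frame(
  country_name = c("PL", "PL", "PL", "DE", "DE", "DE"),
  product_name = c("a",  "a",  "b",  "a",  "a",  "b"),
  value        = c(1,    3,    5,    2,    4,    6)
)

# ave() applies FUN within each country/product combination and
# writes the result back onto every row of that group
myData$group_mean <- ave(myData$value,
                         myData$country_name, myData$product_name,
                         FUN = mean)

# One row per group, if a summary table is what is needed
group_summary <- unique(myData[, c("country_name", "product_name", "group_mean")])
```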

cheers,
Paul

[1] http://www.mail-archive.com/r-help@r-project.org/msg142797.html
[2] http://groups.google.com/group/manipulatr/browse_thread/thread/5e8dfed85048df99


-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770

Paul Hiemstra

2011-Aug-26 11:19 UTC

On 08/26/2011 09:14 AM, AdamMarczak wrote:
> Thank you all for the suggestions, they were great and informative.
> I will surely use data.table in the future, once our server is upgraded; for
> now, this is the solution that I used. It performs exactly the same task
> and produces exactly the same results as ddply.
>
>   # Split the data into one piece per country/segment combination
>   s <- split(past, paste(past$CNTRY_NAME, past$SEG_NAME))
>   # For each piece, keep the group keys and the R-squared of a linear fit
>   R2 <- lapply(s, function(x) list(x$CNTRY_NAME[1], x$SEG_NAME[1],
>                                    summary(lm(VAL ~ fy, x))$r.squared))
>   R2 <- data.frame(do.call(rbind, R2))
>   # do.call(rbind, ...) leaves list columns, so flatten each one
>   R2[, 1] <- unlist(R2[, 1]); R2[, 2] <- unlist(R2[, 2]); R2[, 3] <- unlist(R2[, 3])
>   colnames(R2)[1:3] <- c("CNTRY_NAME", "SEG_NAME", "V1")
>   R2 <- R2[order(R2$CNTRY_NAME, R2$SEG_NAME), ]

Is it much faster than ddply? And why not use data.table? You do not
need a new server to benefit from the speed gain.

Paul

> The lines above produce exactly the same result as ddply, in exactly the
> same form, so they allow a quick replacement of ddply without any further
> rebuilding of the code (the sorting is just a precaution).
>
> Best regards,
> Adam.
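
On the data.table question raised in this message, the same per-group R-squared could be sketched as follows. This is only an illustration under assumptions: the data.table package is installed, the column names are taken from Adam's snippet, and the data are made up. No timing claim is implied; it would need to be benchmarked on the real data.

```r
library(data.table)

# Made-up data with the column names used in Adam's snippet
set.seed(1)
past <- data.table(
  CNTRY_NAME = rep(c("DE", "PL"), each = 10),
  SEG_NAME   = rep(rep(c("s1", "s2"), each = 5), times = 2),
  fy         = rep(2001:2005, times = 4),
  VAL        = rnorm(20)
)

# One linear fit per country/segment group; .SD holds that group's rows
R2 <- past[, list(V1 = summary(lm(VAL ~ fy, data = .SD))$r.squared),
           by = list(CNTRY_NAME, SEG_NAME)]

# Sort by the grouping columns, mirroring the order() call in the snippet
setkey(R2, CNTRY_NAME, SEG_NAME)
```

The result has the same three columns (CNTRY_NAME, SEG_NAME, V1) as the split/lapply version, so it could be dropped in at the same place in the surrounding code.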