Hello everyone,

I was asked to repost this; sorry for any inconvenience.

I'm looking for a replacement for the ddply function from the plyr package. ddply applies a function to each category defined by one or more columns.

Regular loops or lapply calls slow down greatly because I have more than 9000 unique combinations. Is there an available solution that lets me apply a function by category?

Currently my code looks like the snippet below:

ddply(myData, c("country_name", "product_name"), myFunction)

Please note that I'm looking for a solution that performs decently.

Thanks in advance!

With regards,
Adam.

--
View this message in context: http://r.789695.n4.nabble.com/ddply-from-plyr-package-any-alternatives-tp3765936p3765936.html
Sent from the R help mailing list archive at Nabble.com.
Hi Adam,

I don't think there is a faster alternative to plyr, short of doing it in nested for loops with a lot of book-keeping of variables (but if someone here can correct me, I'd be happy to know).

Two things to consider:
1) See if you can optimize your function (there is a lot of material on R code optimization online).
2) plyr has a parallel processing backend. Here is a post I wrote about how to use it, for Windows users (like myself):
http://www.r-statistics.com/2010/09/using-the-plyr-1-2-package-parallel-processing-backend-with-windows/

Good luck,
Tal

----------------Contact Details:----------------
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
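Tal's second point can be sketched roughly as follows. This is only a minimal sketch, assuming the doParallel package as the foreach backend (the backend used in the linked post may differ); the toy data and the body of myFunction are placeholders, not Adam's actual data or function.

```r
library(plyr)
library(doParallel)

# Assumption: two local worker processes; adjust to the machine at hand.
cl <- makeCluster(2)
registerDoParallel(cl)

# Hypothetical toy data standing in for myData.
myData <- data.frame(
  country_name = rep(c("A", "B"), each = 6),
  product_name = rep(c("x", "y"), times = 6),
  value        = 1:12
)

# Hypothetical per-group function; it receives one data frame per group.
myFunction <- function(d) data.frame(mean_value = mean(d$value))

# .parallel = TRUE hands the groups to the registered foreach backend.
res <- ddply(myData, c("country_name", "product_name"), myFunction,
             .parallel = TRUE)

stopCluster(cl)
print(res)
```

Whether this helps depends on how expensive myFunction is: with thousands of small groups, the cost of sending data to the workers can outweigh the gain.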
Hi Adam,

A recent thread on R-help deals with exactly your problem. In one of the responses I compare ddply to a number of alternative solutions (using ave and data.table) [1]. The test in that e-mail shows that ddply is quite slow when there are many unique categories. Hadley (Wickham, author of ddply) remarked, in reply to a question on the plyr mailing list, that this is due to how ddply is set up [2].

So in your case I would definitely take a look at data.table, which is probably much faster. If that does not work, take a look at ave, which is also quite a bit faster for your problem.

cheers,
Paul

[1] http://www.mail-archive.com/r-help@r-project.org/msg142797.html
[2] http://groups.google.com/group/manipulatr/browse_thread/thread/5e8dfed85048df99

--
Paul Hiemstra, Ph.D.
Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
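For readers unfamiliar with ave, here is a small base-R sketch of the idea Paul mentions. It assumes the per-group computation can be written as a function of a single numeric column (here a group mean); the toy data and column names are hypothetical, chosen to mirror the ddply call in the original question.

```r
# Hypothetical toy data standing in for myData.
myData <- data.frame(
  country_name = rep(c("A", "B"), each = 6),
  product_name = rep(c("x", "y"), times = 6),
  value        = 1:12
)

# ave() applies FUN within each combination of the grouping vectors and
# returns a vector as long as the input, aligned with the original rows.
myData$group_mean <- ave(myData$value,
                         myData$country_name, myData$product_name,
                         FUN = mean)

# Collapse to one row per group, similar in shape to ddply output.
per_group <- unique(myData[, c("country_name", "product_name", "group_mean")])
print(per_group)
```

Because ave works on one vector at a time, a function that needs several columns at once (a regression, for example) fits less naturally and is usually easier to express with split and lapply or with data.table.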
On 08/26/2011 09:14 AM, AdamMarczak wrote:
> Thank you all for the suggestions; they were great and informative.
> I will surely use data.table in the future, once our server is upgraded. For
> now, this is the solution I used. It performs exactly the same task and
> produces exactly the same results as ddply.
>
> # one data frame per country/segment combination
> s <- split(past, paste(past$CNTRY_NAME, past$SEG_NAME))
> # per group: the two keys plus the R-squared of a linear fit of VAL on fy
> R2 <- lapply(s, function(x) return(list(x$CNTRY_NAME[1], x$SEG_NAME[1],
>     summary(lm(VAL ~ fy, x))$r.squared)))
> # bind the per-group lists into a data frame and flatten the list columns
> R2 <- data.frame(do.call(rbind, R2))
> R2[,1] <- unlist(R2[,1]); R2[,2] <- unlist(R2[,2]); R2[,3] <- unlist(R2[,3])
> colnames(R2)[1:3] <- c("CNTRY_NAME", "SEG_NAME", "V1")
> R2 <- R2[order(R2$CNTRY_NAME, R2$SEG_NAME), ]

Is it much faster than ddply? And why not use data.table? You do not need a new server to benefit from the speed gain.

Paul

> The lines above produce exactly the same result as ddply, in the same
> form, so they can replace ddply quickly without any further rebuilding of
> the code (the sorting is just a precaution).
>
> Best regards,
> Adam.
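As a rough illustration of what Paul is suggesting, the same per-group R-squared could be written with data.table along these lines. This is a sketch under assumptions: the data below is randomly generated toy data using Adam's column names (CNTRY_NAME, SEG_NAME, fy, VAL), and no timing claim is made for it.

```r
library(data.table)

# Hypothetical toy data with the column names from Adam's code.
set.seed(42)
past <- data.frame(
  CNTRY_NAME = rep(c("Poland", "Norway"), each = 10),
  SEG_NAME   = rep(c("seg1", "seg2"), each = 5, times = 2),
  fy         = rep(2001:2005, times = 4),
  VAL        = rnorm(20)
)

DT <- as.data.table(past)

# One row per group; .SD is the subset of rows belonging to the current group.
R2_dt <- DT[, list(V1 = summary(lm(VAL ~ fy, data = .SD))$r.squared),
            by = list(CNTRY_NAME, SEG_NAME)]

# Sort by the grouping columns, mirroring the order() call in Adam's code.
setorder(R2_dt, CNTRY_NAME, SEG_NAME)
print(R2_dt)
```

The grouping is handled inside data.table rather than by building a list of data frames first, which is where most of the speed-up on many small groups is expected to come from.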