I think a more common idiom for the simpler case would be to use indexing

vals <- c(0.1, 0.15, 0.2)
mult <- vals[ASBclass]

(However, some people are on the move to enforce

mult <- vals[as.numeric(ASBclass)]

because people who confuse factors and character variables get even more confused about factor indexing being different from character indexing.)

For the more complex cases, I think Chuck's split/unsplit principle is the ticket. For one thing, you avoid silliness like

(X >= 0) * sqrt(X) + (X < 0) * -sqrt(-X)

coming out with warnings from calculating the non-selected alternative.

-pd

> On 16 Sep 2015, at 07:56 , Anthoni, Peter (IMK) <peter.anthoni at kit.edu> wrote:
>
> Hi,
>
> I guess this might work too and might be quite speedy:
>
> ASBclass = factor(c(1,2,2,3,2,1))
> Flow = c(1,1,1,1,1,1)
>
> mult = ((ASBclass==1) * 0.1 + (ASBclass==2) * 0.15 + (ASBclass==3) * 0.2)
> deviation = mult * Flow
>
> or with the more complex arithmetic:
>
> deviation = ((ASBclass==1) * (Flow*2) + (ASBclass==2) * (Flow+3) + (ASBclass==3) * sqrt(Flow))
>
> cheers
> Peter
>
>
>
>> On 16 Sep 2015, at 04:20, Charles C. Berry <ccberry at ucsd.edu> wrote:
>>
>> On Tue, 15 Sep 2015, Bert Gunter wrote:
>>
>>> Thanks to both Davids.
>>>
>>> I realize that these things are often a matter of aesthetics -- and
>>> hence have little rational justification -- but I agree with The Other
>>> David: eval(parse) seems to me to violate R's soul( it makes R a macro
>>> language instead of a functional one).
>>>
>>> However, mapply(... switch) effectively loops through the frame row by
>>> row. Aesthetically, I like it; but it seems inefficient. If there are
>>> e.g. 1e6 rows in say 10 categories, I think Jeff's approach should do
>>> much better. I'll try to generate some actual data to see unless
>>> someone else beats me to it.
>>
>> Use mapply like this on large problems:
>>
>> unsplit(
>>   mapply(
>>     function(x,z) eval( x, list( y=z )),
>>     expression( A=y*2, B=y+3, C=sqrt(y) ),
>>     split( dat$Flow, dat$ASB ),
>>     SIMPLIFY=FALSE),
>>   dat$ASB)
>>
>> Chuck
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
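The two points in the message above (factor indexing versus character indexing, and the masked arithmetic evaluating the non-selected branch) can be tried out with a small sketch. The vectors `named_vals`, `cls` and `X` are made-up illustration data, not taken from the thread; `vals` and `ASBclass` reuse the values quoted in it.

```r
# Values quoted in the thread.
vals <- c(0.1, 0.15, 0.2)
ASBclass <- factor(c(1, 2, 2, 3, 2, 1))

# Indexing with a factor uses its integer codes, so both forms agree here.
mult_factor  <- vals[ASBclass]
mult_numeric <- vals[as.numeric(ASBclass)]
same_result  <- identical(mult_factor, mult_numeric)

# Hypothetical named vector: here codes and labels give different answers.
named_vals <- c(low = 0.1, mid = 0.15, high = 0.2)
cls <- factor(c("mid", "low", "high"))   # levels sort as high, low, mid

by_label <- unname(named_vals[as.character(cls)])  # lookup by name
by_code  <- unname(named_vals[cls])                # lookup by codes 3, 2, 1

# Masked arithmetic computes both branches for every element.
X <- c(4, -9)                                      # hypothetical input
masked <- suppressWarnings((X >= 0) * sqrt(X) + (X < 0) * -sqrt(-X))

# Subset assignment only touches the selected elements.
signed_root <- numeric(length(X))
pos <- X >= 0
signed_root[pos]  <- sqrt(X[pos])
signed_root[!pos] <- -sqrt(-X[!pos])
```

With this input the masked form does more than warn: `FALSE * NaN` is `NaN`, so `masked` contains only `NaN`, while `signed_root` holds the intended values. That seems to support preferring subsetting or split/unsplit for the more complex cases.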
Yes! Chuck's use of mapply is exactly the split/combine strategy I was looking for. In retrospect, exactly how one should think about it.

Many thanks to all for a constructive discussion.

-- Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll

On Wed, Sep 16, 2015 at 12:40 AM, peter dalgaard <pdalgd at gmail.com> wrote:
> For the more complex cases, I think Chuck's split/unsplit principle is the ticket.
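For readers who want to see the split/combine strategy step by step, here is a minimal sketch. It swaps the `eval()` of expressions used in the thread for a named list of plain functions, which is an assumption made for readability and not Chuck's code; `dat`, `rules`, `pieces` and `results` are illustrative names and data.

```r
# Hypothetical data with the column names used in the thread.
dat <- data.frame(
  ASB  = factor(c("A", "B", "B", "C", "B", "A")),
  Flow = c(1, 4, 9, 16, 25, 36)
)

# One rule per class, in the same order as the factor levels.
rules <- list(
  A = function(y) y * 2,
  B = function(y) y + 3,
  C = function(y) sqrt(y)
)

# Split: one numeric vector per class.
pieces <- split(dat$Flow, dat$ASB)
stopifnot(identical(names(pieces), names(rules)))

# Apply: each rule sees only the rows of its own class.
results <- Map(function(f, y) f(y), rules, pieces)

# Combine: unsplit() puts the values back in the original row order.
deviation <- unsplit(results, dat$ASB)
```

Each rule is called once per class rather than once per row, and no rule is ever evaluated on values that belong to another class.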
On 09/16/2015 04:41 PM, Bert Gunter wrote:
> Yes! Chuck's use of mapply is exactly the split/combine strategy I was
> looking for. In retrospect, exactly how one should think about it.
> Many thanks to all for a constructive discussion .
>
> -- Bert
>
>
> Bert Gunter
>
>>>>
>>>> Use mapply like this on large problems:
>>>>
>>>> unsplit(
>>>>   mapply(
>>>>     function(x,z) eval( x, list( y=z )),
>>>>     expression( A=y*2, B=y+3, C=sqrt(y) ),
>>>>     split( dat$Flow, dat$ASB ),
>>>>     SIMPLIFY=FALSE),
>>>>   dat$ASB)
>>>>
>>>> Chuck
>>>>

Is there any reason not to use data.table for this purpose, especially if efficiency is of concern?

---

# load data.table and microbenchmark
library(data.table)
library(microbenchmark)
#
# prepare data
DF <- data.frame(
    ASB = rep_len(factor(LETTERS[1:3]), 3e5),
    Flow = rnorm(3e5)^2)
DT <- as.data.table(DF)
DT[, ASB := as.character(ASB)]
#
# define functions
#
# Chuck's version
fnSplit <- function(dat) {
    unsplit(
        mapply(
            function(x,z) eval( x, list( y=z )),
            expression( A=y*2, B=y+3, C=sqrt(y) ),
            split( dat$Flow, dat$ASB ),
            SIMPLIFY=FALSE),
        dat$ASB)
}
#
# data.table-way (IMHO, much easier to read)
# note: := adds the column 'result' to dat by reference
fnDataTable <- function(dat) {
    dat[,
        result :=
            if (.BY == "A") {
                2 * Flow
            } else if (.BY == "B") {
                3 + Flow
            } else if (.BY == "C") {
                sqrt(Flow)
            },
        by = ASB]
}
#
# benchmark
#
microbenchmark(fnSplit(DF), fnDataTable(DT))
identical(fnSplit(DF), fnDataTable(DT)[, result])

---

Actually, in Chuck's version the unsplit() part is slow. If the order is not of concern (e.g., DF is reordered before calling fnSplit), fnSplit is comparable to the DT-version.

Denes
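The closing remark about reordering can be illustrated in base R. This is only a sketch under stated assumptions: it uses a small made-up data frame, a hypothetical helper `apply_rules()`, and a list of functions in place of the `eval()` call from the thread. It checks that the two routes agree and makes no claim about timings.

```r
set.seed(1)

# Small hypothetical data set shaped like the benchmark data.
DF <- data.frame(
  ASB  = rep_len(factor(LETTERS[1:3]), 9),
  Flow = rnorm(9)^2
)

# Assumed stand-in for the expressions used in the thread.
rules <- list(
  A = function(y) y * 2,
  B = function(y) y + 3,
  C = function(y) sqrt(y)
)

# Hypothetical helper: apply each rule to the rows of its own class.
apply_rules <- function(dat) {
  Map(function(f, y) f(y), rules, split(dat$Flow, dat$ASB))
}

# Route 1: original row order, combined with unsplit().
with_unsplit <- unsplit(apply_rules(DF), DF$ASB)

# Route 2: sort by class once; the per-class results then already line up
# with the sorted rows, so a plain unlist() is enough.
ord <- order(DF$ASB)
sorted <- DF[ord, ]
sorted$result <- unlist(apply_rules(sorted), use.names = FALSE)

# Both routes give the same values once the row order is accounted for.
routes_agree <- identical(with_unsplit[ord], sorted$result)
```

Route 2 relies on `split()` returning the groups in factor-level order, which matches the order produced by sorting on `ASB`.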