Dear list, In a little numbers game, I've hit a performance snag and I'm not sure how to code this in C. The game is the following: how many 8-digit numbers have the sum of their digits equal to 17? The brute-force answer could be: maxi <- 9 # digits from 0 to 9 N <- 5 # 8 is too large test <- 17 # for example's sake sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi), N-1)))) == test) ## 3675 Now, if I make N = 8, R stalls for some time and finally gives up with: Error: cannot allocate vector of size 343.3 Mb I thought I could get around this using Reduce() to recursively apply rowSum to intermediate results, but it doesn't seem to help, a=list(1:maxi) b=rep(list(0:maxi), N-1) foo <- function(a, b, fun="sum", ...){ switch(fun, 'sum' = rowSums(do.call(expand.grid, c(a, b))), 'mean' = rowMeans(do.call(expand.grid, c(a, b))), apply(do.call(expand.grid, c(a, b)), 1, fun, ...)) # generic case } sum(Reduce(foo, list(b), init=a) == test) ## 3675 # OK Same problem with N=8. Now, if N was fixed I could write a little C code to do this calculation term-by-term, something along those lines, test = 0; for (i1=1, i1=9, i1++) { for (i2=0, i2=9, i2++) { [... other nested loops ] test = test + (i1 + i2 + [...] == 17); } [...] } but here the number of for loops, N, should remain a variable. In despair I coded this in R as a wicked eval(parse()) construct, and it does produce the expected result after an awfully long time. makeNestedLoops <- function(N=3){ startLoop <- function(ii, start=1, end=9){ paste("for (i", ii, " in seq(",start,", ",end,")) {\n", sep="") } first <- startLoop(1) inner.start <- lapply(seq(2, N), startLoop, start=0) calculation <- paste("test <- test + (", paste("i", seq(1, N), sep="", collapse="+"), "==17 )\n") end <- replicate(N, "}\n") code.to.run <- do.call(paste, c(list(first), inner.start, calculation, end)) cat(code.to.run) invisible(code.to.run) } test <- 0 eval(parse(text = makeNestedLoops(8)) ) ## 229713 I hope I have missed a better way to do this in R. Otherwise, I believe what I'm after is some kind of C or C++ macro expansion, because the number of loops should not be hard coded. Thanks for any tips you may have! Best regards, baptiste
On Dec 19, 2009, at 9:06 AM, baptiste auguie wrote:> Dear list, > > In a little numbers game, I've hit a performance snag and I'm not sure > how to code this in C. > > The game is the following: how many 8-digit numbers have the sum of > their digits equal to 17? > The brute-force answer could be: > > maxi <- 9 # digits from 0 to 9 > N <- 5 # 8 is too large > test <- 17 # for example's sake > > sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi), > N-1)))) == test) > ## 3675 > > Now, if I make N = 8, R stalls for some time and finally gives up > with: > Error: cannot allocate vector of size 343.3 Mb > > I thought I could get around this using Reduce() to recursively apply > rowSum to intermediate results, but it doesn't seem to help, > > > a=list(1:maxi) > b=rep(list(0:maxi), N-1) > > foo <- function(a, b, fun="sum", ...){ > switch(fun, > 'sum' = rowSums(do.call(expand.grid, c(a, b))), > 'mean' = rowMeans(do.call(expand.grid, c(a, b))), > apply(do.call(expand.grid, c(a, b)), 1, fun, ...)) # generic > case > } > > sum(Reduce(foo, list(b), init=a) == test) > ## 3675 # OKAnd are you considering the number "00000089" to be in the acceptable set? Or is the range of possible numbers in 10000079:98000000 ? -- David> > Same problem with N=8. > > Now, if N was fixed I could write a little C code to do this > calculation term-by-term, something along those lines, > > test = 0; > > for (i1=1, i1=9, i1++) { > for (i2=0, i2=9, i2++) { > > [... other nested loops ] > > test = test + (i1 + i2 + [...] == 17); > > } [...] > } > > but here the number of for loops, N, should remain a variable. > > In despair I coded this in R as a wicked eval(parse()) construct, and > it does produce the expected result after an awfully long time. > > makeNestedLoops <- function(N=3){ > > startLoop <- function(ii, start=1, end=9){ > paste("for (i", ii, " in seq(",start,", ",end,")) {\n", sep="") > } > > first <- startLoop(1) > inner.start <- lapply(seq(2, N), startLoop, start=0) > calculation <- paste("test <- test + (", paste("i", seq(1, N), > sep="", collapse="+"), "==17 )\n") > end <- replicate(N, "}\n") > code.to.run <- do.call(paste, c(list(first), inner.start, > calculation, end)) > cat(code.to.run) > invisible(code.to.run) > } > > test <- 0 > eval(parse(text = makeNestedLoops(8)) ) > ## 229713 > > I hope I have missed a better way to do this in R. Otherwise, I > believe what I'm after is some kind of C or C++ macro expansion, > because the number of loops should not be hard coded. > > Thanks for any tips you may have! > > Best regards, > > baptiste > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD Heritage Laboratories West Hartford, CT
> I hope I have missed a better way to do this in R. Otherwise, I > believe what I'm after is some kind of C or C++ macro expansion, > because the number of loops should not be hard coded.Why not generate the list of integers that sum to 17, and then mix with 0s as appropriate? Hadley -- http://had.co.nz/
This problem does yield some interesting and unexpected distributions. Here is another, the number of positive cases as a function of number of digits (8 in the original question) and of test value (17). maxi <- 9 N <- 5 test <- 17 foo <- function(N=2, test=1){ sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi), N-1)))) == test) } library(plyr) N <- seq(1, 6) maxi <- 9*N params <- data.frame(from=N, to=maxi+1) params2 <- mlply(params, seq) NN <- lapply(seq_along(params2), function(x) rep(x, length(params2[[x]]))) params3 <- data.frame(N=do.call(c, NN), test=do.call(c, params2)) results <- mdply(params3, foo) library(ggplot2) p <- qplot(test, V1, data=results, geom="path")+ facet_wrap(~N, scales="free") + scale_x_continuous(expand=c(0, 0))+ xlab("test") + ylab("number of match") p Best, baptiste [David: sorry for the duplicate, i initially sent an attachment that was too large for the list]> 2009/12/19 David Winsemius <dwinsemius at comcast.net>: >> >> On Dec 19, 2009, at 2:10 PM, David Winsemius wrote: >> >>> >>> On Dec 19, 2009, at 1:36 PM, baptiste auguie wrote: >>> >>>> 2009/12/19 David Winsemius <dwinsemius at comcast.net>: >>>>> >>>>> On Dec 19, 2009, at 9:06 AM, baptiste auguie wrote: >>>>> >>>>>> Dear list, >>>>>> >>>>>> In a little numbers game, I've hit a performance snag and I'm not sure >>>>>> how to code this in C. >>>>>> >>>>>> The game is the following: how many 8-digit numbers have the sum of >>>>>> their digits equal to 17? >>>>> >>>>> And are you considering the number "00000089" to be in the acceptable >>>>> set? >>>>> Or is the range of possible numbers in 10000079:98000000 ? >>>>> >>>> >>>> The latter, the first digit should not be 0. But if you have an >>>> interesting solution for the other case, let me know anyway. >>>> >>>> I should also stress that this is only for entertainment and curiosity's >>>> sake. >>> >>> The sequence of numbers whose digit sum is 13 is one of the ATT integer >>> sequences: >>> >>> http://www.research.att.com/~njas/sequences/A143164 >>> >>> No R implementations there, only Maple and Mathematica. Rather than doing >>> strsplit()'s, I thought that a mathematical approach would be faster for a >>> sumdigits function: >>> >>> sumdigits <- function(x) { s=0; for (i in 1:(1+floor(log(x, base=10))) ){ >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?s<-s+x%%10; x<-x%/%10} >>> ? ? ? ? ? ? ? ? ? ? ? ? ? return(s) } >>> >>> # what follows would be expected to take roughly 1/100th ?and 1/50th of >>> the time of a complete enumeration but is useful for estimating the size of >>> the result and the time of an exhaustive search: >>> >>> > system.time( for (i in 10000079:11000079) if (sumdigits(i)==17) >>> > {idx<-c(idx,i)}) >>> ?user ?system elapsed >>> 30.997 ? 3.516 ?34.403 >>> >>> > system.time( for (i in 10000079:12000079) if (sumdigits(i)==17) >>> > {idx<-c(idx,i)}) >>> ?user ?system elapsed >>> 55.975 ? 2.433 ?58.218 >>> >>> > head(idx) >>> [1] 10000079 10000088 10000097 10000169 10000178 10000187 >>> >>> > length(idx) >>> [1] 31572 >>> >>> So it looks as though an exhaustive strategy would take a bit under an >>> hour on my machine (a 2 year-old device) and be a vector around 1578600 >>> elements in length. (Takes very little memory, and would undoubtedly be >>> faster if I could use more than one core.) >> >> I was assuming that it would be a relatively constant density of numbers in >> that range which can be seen to be false by examining a density plot of >> roughly the first 5% of that range. >> >>> system.time( ?for (i in 10000079:14000079) if (sumdigits(i)==17) >>> {idx<-c(idx,i)}) >> ? user ?system elapsed >> 113.074 ? 5.764 118.391 >>> plot(density(idx)) >> >> Plot attached: >> >> >>> >>>> >>>> baptiste >> >> David Winsemius, MD >> Heritage Laboratories >> West Hartford, CT >> >> >> >
Hi library(partitions) jj <- blockparts(rep(9,8),17) dim(jj) gives 318648 HTH rksh baptiste auguie wrote:> Dear list, > > In a little numbers game, I've hit a performance snag and I'm not sure > how to code this in C. > > The game is the following: how many 8-digit numbers have the sum of > their digits equal to 17? > The brute-force answer could be: > > maxi <- 9 # digits from 0 to 9 > N <- 5 # 8 is too large > test <- 17 # for example's sake > > sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi), > N-1)))) == test) > ## 3675 > > Now, if I make N = 8, R stalls for some time and finally gives up with: > Error: cannot allocate vector of size 343.3 Mb > > I thought I could get around this using Reduce() to recursively apply > rowSum to intermediate results, but it doesn't seem to help, > > > a=list(1:maxi) > b=rep(list(0:maxi), N-1) > > foo <- function(a, b, fun="sum", ...){ > switch(fun, > 'sum' = rowSums(do.call(expand.grid, c(a, b))), > 'mean' = rowMeans(do.call(expand.grid, c(a, b))), > apply(do.call(expand.grid, c(a, b)), 1, fun, ...)) # generic case > } > > sum(Reduce(foo, list(b), init=a) == test) > ## 3675 # OK > > Same problem with N=8. > > Now, if N was fixed I could write a little C code to do this > calculation term-by-term, something along those lines, > > test = 0; > > for (i1=1, i1=9, i1++) { > for (i2=0, i2=9, i2++) { > > [... other nested loops ] > > test = test + (i1 + i2 + [...] == 17); > > } [...] > } > > but here the number of for loops, N, should remain a variable. > > In despair I coded this in R as a wicked eval(parse()) construct, and > it does produce the expected result after an awfully long time. > > makeNestedLoops <- function(N=3){ > > startLoop <- function(ii, start=1, end=9){ > paste("for (i", ii, " in seq(",start,", ",end,")) {\n", sep="") > } > > first <- startLoop(1) > inner.start <- lapply(seq(2, N), startLoop, start=0) > calculation <- paste("test <- test + (", paste("i", seq(1, N), > sep="", collapse="+"), "==17 )\n") > end <- replicate(N, "}\n") > code.to.run <- do.call(paste, c(list(first), inner.start, calculation, end)) > cat(code.to.run) > invisible(code.to.run) > } > > test <- 0 > eval(parse(text = makeNestedLoops(8)) ) > ## 229713 > > I hope I have missed a better way to do this in R. Otherwise, I > believe what I'm after is some kind of C or C++ macro expansion, > because the number of loops should not be hard coded. > > Thanks for any tips you may have! > > Best regards, > > baptiste > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Robin K. S. Hankin Uncertainty Analyst University of Cambridge 19 Silver Street Cambridge CB3 9EP 01223-764877