Dear list,
In a little numbers game, I've hit a performance snag and I'm not sure
how to code this in C.
The game is the following: how many 8-digit numbers have the sum of
their digits equal to 17?
The brute-force answer could be:
maxi <- 9 # digits from 0 to 9
N <- 5 # 8 is too large
test <- 17 # for example's sake
sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi),
N-1)))) == test)
## 3675
Now, if I make N = 8, R stalls for some time and finally gives up with:
Error: cannot allocate vector of size 343.3 Mb
I thought I could get around this using Reduce() to recursively apply
rowSum to intermediate results, but it doesn't seem to help,
a=list(1:maxi)
b=rep(list(0:maxi), N-1)
foo <- function(a, b, fun="sum", ...){
switch(fun,
'sum' = rowSums(do.call(expand.grid, c(a, b))),
'mean' = rowMeans(do.call(expand.grid, c(a, b))),
apply(do.call(expand.grid, c(a, b)), 1, fun, ...)) # generic case
}
sum(Reduce(foo, list(b), init=a) == test)
## 3675 # OK
Same problem with N=8.
Now, if N was fixed I could write a little C code to do this
calculation term-by-term, something along those lines,
test = 0;
for (i1=1, i1=9, i1++) {
for (i2=0, i2=9, i2++) {
[... other nested loops ]
test = test + (i1 + i2 + [...] == 17);
} [...]
}
but here the number of for loops, N, should remain a variable.
In despair I coded this in R as a wicked eval(parse()) construct, and
it does produce the expected result after an awfully long time.
makeNestedLoops <- function(N=3){
startLoop <- function(ii, start=1, end=9){
paste("for (i", ii, " in seq(",start,",
",end,")) {\n", sep="")
}
first <- startLoop(1)
inner.start <- lapply(seq(2, N), startLoop, start=0)
calculation <- paste("test <- test + (", paste("i",
seq(1, N),
sep="", collapse="+"), "==17 )\n")
end <- replicate(N, "}\n")
code.to.run <- do.call(paste, c(list(first), inner.start, calculation,
end))
cat(code.to.run)
invisible(code.to.run)
}
test <- 0
eval(parse(text = makeNestedLoops(8)) )
## 229713
I hope I have missed a better way to do this in R. Otherwise, I
believe what I'm after is some kind of C or C++ macro expansion,
because the number of loops should not be hard coded.
Thanks for any tips you may have!
Best regards,
baptiste
On Dec 19, 2009, at 9:06 AM, baptiste auguie wrote:> Dear list, > > In a little numbers game, I've hit a performance snag and I'm not sure > how to code this in C. > > The game is the following: how many 8-digit numbers have the sum of > their digits equal to 17? > The brute-force answer could be: > > maxi <- 9 # digits from 0 to 9 > N <- 5 # 8 is too large > test <- 17 # for example's sake > > sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi), > N-1)))) == test) > ## 3675 > > Now, if I make N = 8, R stalls for some time and finally gives up > with: > Error: cannot allocate vector of size 343.3 Mb > > I thought I could get around this using Reduce() to recursively apply > rowSum to intermediate results, but it doesn't seem to help, > > > a=list(1:maxi) > b=rep(list(0:maxi), N-1) > > foo <- function(a, b, fun="sum", ...){ > switch(fun, > 'sum' = rowSums(do.call(expand.grid, c(a, b))), > 'mean' = rowMeans(do.call(expand.grid, c(a, b))), > apply(do.call(expand.grid, c(a, b)), 1, fun, ...)) # generic > case > } > > sum(Reduce(foo, list(b), init=a) == test) > ## 3675 # OKAnd are you considering the number "00000089" to be in the acceptable set? Or is the range of possible numbers in 10000079:98000000 ? -- David> > Same problem with N=8. > > Now, if N was fixed I could write a little C code to do this > calculation term-by-term, something along those lines, > > test = 0; > > for (i1=1, i1=9, i1++) { > for (i2=0, i2=9, i2++) { > > [... other nested loops ] > > test = test + (i1 + i2 + [...] == 17); > > } [...] > } > > but here the number of for loops, N, should remain a variable. > > In despair I coded this in R as a wicked eval(parse()) construct, and > it does produce the expected result after an awfully long time. > > makeNestedLoops <- function(N=3){ > > startLoop <- function(ii, start=1, end=9){ > paste("for (i", ii, " in seq(",start,", ",end,")) {\n", sep="") > } > > first <- startLoop(1) > inner.start <- lapply(seq(2, N), startLoop, start=0) > calculation <- paste("test <- test + (", paste("i", seq(1, N), > sep="", collapse="+"), "==17 )\n") > end <- replicate(N, "}\n") > code.to.run <- do.call(paste, c(list(first), inner.start, > calculation, end)) > cat(code.to.run) > invisible(code.to.run) > } > > test <- 0 > eval(parse(text = makeNestedLoops(8)) ) > ## 229713 > > I hope I have missed a better way to do this in R. Otherwise, I > believe what I'm after is some kind of C or C++ macro expansion, > because the number of loops should not be hard coded. > > Thanks for any tips you may have! > > Best regards, > > baptiste > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD Heritage Laboratories West Hartford, CT
> I hope I have missed a better way to do this in R. Otherwise, I > believe what I'm after is some kind of C or C++ macro expansion, > because the number of loops should not be hard coded.Why not generate the list of integers that sum to 17, and then mix with 0s as appropriate? Hadley -- http://had.co.nz/
This problem does yield some interesting and unexpected distributions.
Here is another, the number of positive cases as a function of number
of digits (8 in the original question) and of test value (17).
maxi <- 9
N <- 5
test <- 17
foo <- function(N=2, test=1){
sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi),
N-1)))) == test)
}
library(plyr)
N <- seq(1, 6)
maxi <- 9*N
params <- data.frame(from=N, to=maxi+1)
params2 <- mlply(params, seq)
NN <- lapply(seq_along(params2), function(x) rep(x, length(params2[[x]])))
params3 <- data.frame(N=do.call(c, NN), test=do.call(c, params2))
results <- mdply(params3, foo)
library(ggplot2)
p <-
qplot(test, V1, data=results, geom="path")+
facet_wrap(~N, scales="free") +
scale_x_continuous(expand=c(0, 0))+
xlab("test") + ylab("number of match")
p
Best,
baptiste
[David: sorry for the duplicate, i initially sent an attachment that
was too large for the list]
> 2009/12/19 David Winsemius <dwinsemius at comcast.net>:
>>
>> On Dec 19, 2009, at 2:10 PM, David Winsemius wrote:
>>
>>>
>>> On Dec 19, 2009, at 1:36 PM, baptiste auguie wrote:
>>>
>>>> 2009/12/19 David Winsemius <dwinsemius at comcast.net>:
>>>>>
>>>>> On Dec 19, 2009, at 9:06 AM, baptiste auguie wrote:
>>>>>
>>>>>> Dear list,
>>>>>>
>>>>>> In a little numbers game, I've hit a performance
snag and I'm not sure
>>>>>> how to code this in C.
>>>>>>
>>>>>> The game is the following: how many 8-digit numbers
have the sum of
>>>>>> their digits equal to 17?
>>>>>
>>>>> And are you considering the number "00000089" to
be in the acceptable
>>>>> set?
>>>>> Or is the range of possible numbers in 10000079:98000000 ?
>>>>>
>>>>
>>>> The latter, the first digit should not be 0. But if you have an
>>>> interesting solution for the other case, let me know anyway.
>>>>
>>>> I should also stress that this is only for entertainment and
curiosity's
>>>> sake.
>>>
>>> The sequence of numbers whose digit sum is 13 is one of the ATT
integer
>>> sequences:
>>>
>>> http://www.research.att.com/~njas/sequences/A143164
>>>
>>> No R implementations there, only Maple and Mathematica. Rather than
doing
>>> strsplit()'s, I thought that a mathematical approach would be
faster for a
>>> sumdigits function:
>>>
>>> sumdigits <- function(x) { s=0; for (i in 1:(1+floor(log(x,
base=10))) ){
>>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?s<-s+x%%10; x<-x%/%10}
>>> ? ? ? ? ? ? ? ? ? ? ? ? ? return(s) }
>>>
>>> # what follows would be expected to take roughly 1/100th ?and
1/50th of
>>> the time of a complete enumeration but is useful for estimating the
size of
>>> the result and the time of an exhaustive search:
>>>
>>> > system.time( for (i in 10000079:11000079) if
(sumdigits(i)==17)
>>> > {idx<-c(idx,i)})
>>> ?user ?system elapsed
>>> 30.997 ? 3.516 ?34.403
>>>
>>> > system.time( for (i in 10000079:12000079) if
(sumdigits(i)==17)
>>> > {idx<-c(idx,i)})
>>> ?user ?system elapsed
>>> 55.975 ? 2.433 ?58.218
>>>
>>> > head(idx)
>>> [1] 10000079 10000088 10000097 10000169 10000178 10000187
>>>
>>> > length(idx)
>>> [1] 31572
>>>
>>> So it looks as though an exhaustive strategy would take a bit under
an
>>> hour on my machine (a 2 year-old device) and be a vector around
1578600
>>> elements in length. (Takes very little memory, and would
undoubtedly be
>>> faster if I could use more than one core.)
>>
>> I was assuming that it would be a relatively constant density of
numbers in
>> that range which can be seen to be false by examining a density plot of
>> roughly the first 5% of that range.
>>
>>> system.time( ?for (i in 10000079:14000079) if (sumdigits(i)==17)
>>> {idx<-c(idx,i)})
>> ? user ?system elapsed
>> 113.074 ? 5.764 118.391
>>> plot(density(idx))
>>
>> Plot attached:
>>
>>
>>>
>>>>
>>>> baptiste
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>>
>>
>
Hi library(partitions) jj <- blockparts(rep(9,8),17) dim(jj) gives 318648 HTH rksh baptiste auguie wrote:> Dear list, > > In a little numbers game, I've hit a performance snag and I'm not sure > how to code this in C. > > The game is the following: how many 8-digit numbers have the sum of > their digits equal to 17? > The brute-force answer could be: > > maxi <- 9 # digits from 0 to 9 > N <- 5 # 8 is too large > test <- 17 # for example's sake > > sum(rowSums(do.call(expand.grid, c(list(1:maxi), rep(list(0:maxi), > N-1)))) == test) > ## 3675 > > Now, if I make N = 8, R stalls for some time and finally gives up with: > Error: cannot allocate vector of size 343.3 Mb > > I thought I could get around this using Reduce() to recursively apply > rowSum to intermediate results, but it doesn't seem to help, > > > a=list(1:maxi) > b=rep(list(0:maxi), N-1) > > foo <- function(a, b, fun="sum", ...){ > switch(fun, > 'sum' = rowSums(do.call(expand.grid, c(a, b))), > 'mean' = rowMeans(do.call(expand.grid, c(a, b))), > apply(do.call(expand.grid, c(a, b)), 1, fun, ...)) # generic case > } > > sum(Reduce(foo, list(b), init=a) == test) > ## 3675 # OK > > Same problem with N=8. > > Now, if N was fixed I could write a little C code to do this > calculation term-by-term, something along those lines, > > test = 0; > > for (i1=1, i1=9, i1++) { > for (i2=0, i2=9, i2++) { > > [... other nested loops ] > > test = test + (i1 + i2 + [...] == 17); > > } [...] > } > > but here the number of for loops, N, should remain a variable. > > In despair I coded this in R as a wicked eval(parse()) construct, and > it does produce the expected result after an awfully long time. > > makeNestedLoops <- function(N=3){ > > startLoop <- function(ii, start=1, end=9){ > paste("for (i", ii, " in seq(",start,", ",end,")) {\n", sep="") > } > > first <- startLoop(1) > inner.start <- lapply(seq(2, N), startLoop, start=0) > calculation <- paste("test <- test + (", paste("i", seq(1, N), > sep="", collapse="+"), "==17 )\n") > end <- replicate(N, "}\n") > code.to.run <- do.call(paste, c(list(first), inner.start, calculation, end)) > cat(code.to.run) > invisible(code.to.run) > } > > test <- 0 > eval(parse(text = makeNestedLoops(8)) ) > ## 229713 > > I hope I have missed a better way to do this in R. Otherwise, I > believe what I'm after is some kind of C or C++ macro expansion, > because the number of loops should not be hard coded. > > Thanks for any tips you may have! > > Best regards, > > baptiste > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Robin K. S. Hankin Uncertainty Analyst University of Cambridge 19 Silver Street Cambridge CB3 9EP 01223-764877