I guess this has been discussed before, but I don't know the name of this problem, so I have to ask again.

Consider this scenario:

> fun <- function(x) { print(x) }
> for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] 2
[1] 3
[1] "OK"
[1] "OK"
[1] "OK"

The optimal behaviour would be:

> fun <- function(x) { print(x) }
> for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] "OK"
[1] 2
[1] "OK"
[1] 3
[1] "OK"

That is, each iteration of the vectorized function should yield a result to the 'for' statement, rather than having all results collected beforehand.

The intention of such a pattern is to separate the data generation logic from the data processing logic.

The latter mechanism, I think, is more efficient because it doesn't cache all the data before processing, and the interpreter can be sure that caching is not needed, since the vectorized function is used not in an assignment but as a range.

The difference may look trivial, but this pseudocode demonstrates otherwise:

readSample <- function(x) {
  ....
  sampling_time <- readBin(con, integer(), 1, size=4)
  sample_count <- readBin(con, integer(), 1, size=2)
  samples <- readBin(con, numeric(), sample_count, size=4)  # 4-byte floats
  ....
  matrix # return a big matrix representing a sample
}

for (sample in Vectorize(readSample, "x")(1:10000)) {
  # process sample
}

The data file is a few gigabytes, and caching all of it is costly. Not having to cache it would make a difference.

This email asks: 1. whether this is a valid need for the language; 2. whether there is an alternative design pattern to work around it; 3. where the proper place to discuss this is.

Thanks and best...
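The ordering in the first transcript follows from how R evaluates a 'for' statement: the sequence expression is evaluated in full, once, before the first iteration of the body. The sketch below is only an illustration of that point; it records events in a character vector (the name `events` is made up for this example) instead of printing them, so the order can be inspected afterwards.

```r
# Log of events, in the order they happen (hypothetical helper variable).
events <- character(0)

fun <- function(x) {
  # Record that fun was called, then return its argument unchanged.
  events <<- c(events, paste("fun", x))
  x
}

vfun <- Vectorize(fun, "x")

# vfun(1:3) is evaluated completely before the loop body runs at all.
for (i in vfun(1:3)) {
  events <- c(events, paste("body", i))
}

print(events)
```

All three "fun" entries appear in the log before any "body" entry, which matches the printed output in the question.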
Jeff Newmiller
2013-Aug-01 18:04 UTC
[R] use Vectorized function as range of for statement
I think this is on topic here, but a reproducible example is highly desirable, if not required, for clarity.

The Vectorize function is essentially a wrapped-up for loop, so you are really executing two successive for loops. Note that the Vectorize function is not itself vectorised, so there is no particular advantage to using it in this way. You might as well call fun as a statement in the for loop.

However, interleaving output and computation is quite inefficient, so it is strongly recommended to handle output in its own loop or function in most cases. This allows true vectorization to be applied to the computation phase.
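As a rough sketch of the suggestion to call the function as a statement inside the loop, the example below reads one record at a time from a binary connection and processes it before reading the next, so only one sample is held in memory at once. The record layout (4-byte integer time, 2-byte integer count, then that many 4-byte floats) is an assumption modelled on the pseudocode in the question, and the temporary file, the end-of-file check, and the names `rec` and `totals` are invented for this illustration.

```r
# Write a small test file in the assumed record layout.
path <- tempfile(fileext = ".bin")
out <- file(path, "wb")
for (k in 1:3) {
  writeBin(as.integer(100L + k), out, size = 4)    # sampling time
  writeBin(as.integer(k), out, size = 2)           # number of samples
  writeBin(as.numeric(seq_len(k)), out, size = 4)  # samples as 4-byte floats
}
close(out)

# Read a single record from an open connection; NULL signals end of file.
readSample <- function(con) {
  sampling_time <- readBin(con, integer(), 1, size = 4)
  if (length(sampling_time) == 0) return(NULL)
  sample_count <- readBin(con, integer(), 1, size = 2)
  samples <- readBin(con, numeric(), sample_count, size = 4)
  list(time = sampling_time, samples = samples)
}

# Generation and processing are interleaved: read one record, process it,
# then move on, instead of collecting every record first.
con <- file(path, "rb")
totals <- numeric(0)
while (!is.null(rec <- readSample(con))) {
  totals <- c(totals, sum(rec$samples))  # stand-in for real processing
}
close(con)
unlink(path)

print(totals)
```

The reading logic still lives in its own function, so generation and processing stay separate in the code even though they alternate at run time.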