thr3ads.net - R help - [R] use Vectorized function as range of for statement [Aug 2013]


Zhang Weiwu

2013-Aug-01 16:38 UTC

[R] use Vectorized function as range of for statement

I guess this has been discussed before, but I don't know the name of this 
problem, so I have to ask again.

Consider this scenario:
> fun <- function(x) { print(x)}
> for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] 2
[1] 3
[1] "OK"
[1] "OK"
[1] "OK"

The desired behaviour would be:
> fun <- function(x) { print(x)}
> for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] "OK"
[1] 2
[1] "OK"
[1] 3
[1] "OK"

That is, each call of the vectorized function should yield its result to 
the 'for' statement as soon as it is produced, rather than having all 
results collected beforehand.
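Base R has no generator or `yield` statement, so this behaviour is usually emulated with a closure that remembers its position and computes one result per call. A minimal sketch, assuming a hypothetical helper `make_iterator()` (not part of base R) wrapped around the `fun` from the example above:

```r
# Hypothetical helper, not part of base R: wraps a function and its
# inputs in a closure that evaluates f() on one element per call.
make_iterator <- function(f, xs) {
  i <- 0
  list(
    has_next = function() i < length(xs),
    next_value = function() {
      i <<- i + 1   # advance the position stored in the closure
      f(xs[[i]])    # compute only the current element
    }
  )
}

fun <- function(x) { print(x) }

it <- make_iterator(fun, 1:3)
while (it$has_next()) {
  value <- it$next_value()  # fun() runs here, once per iteration
  print("OK")
}
```

Unlike `Vectorize()`, nothing is computed until `next_value()` is called, so each result can be processed and discarded before the next one is produced.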

The intention of such a pattern is to separate the data-generation logic 
from the data-processing logic.

The latter mechanism, I think, is more efficient because it doesn't cache 
all data before processing; moreover, the interpreter can be sure that 
caching is not needed, since the vectorized function is used as the range 
of a loop rather than in an assignment.
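For reference, the R Language Definition states that the sequence of a `for` loop is evaluated once, at the start of the loop, which is why all of `Vectorize(fun, "x")(1:3)` runs before the first "OK" is printed. A small sketch to observe that order of evaluation; `make_seq()` is a made-up name used only for illustration:

```r
# Record every step so the order of evaluation can be inspected.
calls <- character(0)

# Hypothetical sequence producer that logs when it is called.
make_seq <- function() {
  calls <<- c(calls, "sequence evaluated")
  1:3
}

for (i in make_seq()) {
  calls <- c(calls, paste("iteration", i))
}

print(calls)
```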

The difference may seem trivial, but this pseudo-code demonstrates otherwise:

readSample <- function(x) {
  ....
  sampling_time <- readBin(con, integer(), 1, size=4)
  sample_count <- readBin(con, integer(), 1, size=2)
  # R has no float(); numeric() with size=4 reads 4-byte floats
  samples <- readBin(con, numeric(), sample_count, size=4)
  ....
  matrix # return a big matrix representing a sample
}

for (sample in Vectorize(readSample, "x", SIMPLIFY = FALSE)(1:10000)) {
  # process sample (SIMPLIFY = FALSE keeps one matrix per list element)
}

The data file is a few gigabytes, and caching all of it is not cheap. Not 
having to cache it would make a real difference.
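One common workaround is to drop `Vectorize()` and read one record per iteration from an open connection, processing it before reading the next, so the whole file is never held in memory at once. A minimal sketch, assuming the record layout implied by the pseudo-code (a 4-byte time, a 2-byte count, then that many 4-byte floats); `read_sample()` is a hypothetical name, and the file is generated here only so the example runs:

```r
# Build a small test file with the assumed record layout.
# A real data file would already exist.
path <- tempfile(fileext = ".bin")
out_con <- file(path, "wb")
for (t in 1:3) {
  writeBin(as.integer(t), out_con, size = 4)              # sampling time
  writeBin(as.integer(t + 1), out_con, size = 2)          # sample count
  writeBin(as.numeric(seq_len(t + 1)), out_con, size = 4) # 4-byte floats
}
close(out_con)

# Hypothetical reader: returns one record, or NULL at end of file.
read_sample <- function(con) {
  sampling_time <- readBin(con, integer(), 1, size = 4)
  if (length(sampling_time) == 0) return(NULL)  # nothing left to read
  sample_count <- readBin(con, integer(), 1, size = 2)
  samples <- readBin(con, numeric(), sample_count, size = 4)
  list(time = sampling_time, samples = samples)
}

# Process each record as soon as it is read; only one record is held
# in memory at a time, plus the small per-record summaries.
in_con <- file(path, "rb")
totals <- numeric(0)
repeat {
  s <- read_sample(in_con)
  if (is.null(s)) break
  totals <- c(totals, sum(s$samples))
}
close(in_con)
print(totals)
```

The data-generation logic still lives in its own function, as the original post intended; only the driving loop changes from "compute everything, then iterate" to "read, process, repeat".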

This email asks: 1. whether this is a valid need of the language; 2. for 
an alternative design pattern to work around it; 3. where the proper place 
to discuss this is.

Thanks and best...

Jeff Newmiller

2013-Aug-01 18:04 UTC


[R] use Vectorized function as range of for statement

I think this is on topic here, but a reproducible example is highly desirable,
if not required, for clarity.

The Vectorize function is essentially a wrapped-up for loop, so you are really
executing two successive for loops. Note that the Vectorize function is not
itself vectorised, so there is no particular advantage to using it in this way.
You might as well call fun as a statement in the for loop.
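Following that suggestion, the question's example can be rewritten with `fun` called in the loop body. This sketch reuses the `fun` from the original post and produces the interleaved output the question asks for:

```r
fun <- function(x) { print(x) }

# fun() is called inside the loop body, so its output is interleaved
# with the rest of the body, one element at a time.
for (i in 1:3) {
  fun(i)
  print("OK")
}
```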

However, interleaving output and computation is quite inefficient, so it is
strongly recommended to handle output in its own loop or function in most cases.
This allows true vectorization to be applied to the computation phase.
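A minimal sketch of that split, with an arbitrary computation (`sqrt(x) * 2`) standing in for real work: the computation is a single vectorized expression, and printing happens afterwards in its own loop.

```r
x <- 1:5

# Computation phase: one vectorized expression over the whole input,
# with no printing inside it.
y <- sqrt(x) * 2

# Output phase: a separate loop that only formats and prints.
for (i in seq_along(y)) {
  cat(sprintf("x = %d, y = %.3f\n", x[i], y[i]))
}
```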
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
