Liaw, Andy wrote:
> A colleague and I were trying to understand all the possible things one
> can do with for loops in R, and found some surprises. I think we've
> done sufficient detective work to have a good guess as to what's going
> on "underneath", but it would be nice to get some confirmation, and
> better yet, perhaps documentation in the R-lang manual. Basically, the
> question is, how/what does R do with the loop index variable? Below are
> some examples:
>
I think it is documented in the ?Control topic that a copy of the seq
argument (the 1:2 in your first example) is made at the beginning, and that
altering var (your i) doesn't affect the loop. One other thing you
didn't investigate is what is the value of an expression like
    loopval <- for (i in 1:2) { i }
This sets loopval to 2, but in R-devel (2.10.0 to be) this has changed:
loops now have NULL as their value.
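
A small sketch of both points, in case it helps. The names visited and j are made up for illustration, and the last line assumes R 2.10.0 or later, where the loop value is NULL:

```r
# Record what the loop variable holds at the start of each iteration.
visited <- integer(0)
for (i in 1:3) {
  visited <- c(visited, i)  # value assigned by the loop for this iteration
  i <- 17L                  # overwriting i does not change the next iteration
}
print(visited)              # the loop still walked through 1, 2, 3
print(i)                    # i keeps the last value assigned inside the body

# Assumption: R >= 2.10.0, where a for loop evaluates to NULL.
loopval <- for (j in 1:2) j
print(is.null(loopval))
```
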
> R> for (i in 1:2) { i <- 17; print(i) }
> [1] 17
> [1] 17
> R> print(i)
> [1] 17
> R> x <- 1:2
> R> for (i in x) { print(i); rm(i) }
> [1] 1
> [1] 2
> R> i
> Error: object 'i' not found
> R> for (i in x) { print(i); rm(x) }
> [1] 1
> [1] 2
> Warning message:
> In rm(x) : object 'x' not found
> R> i
> [1] 2
> R> x <- 1:2
> R> for (i in x) { print(i); i <- 17; print(i) }
> [1] 1
> [1] 17
> [1] 2
> [1] 17
>
> The guess is that at the beginning of the loop, R makes a copy of the
> object that's being looped over ("x" in the examples above) somewhere
> "under cover", and at the beginning of each iteration assigns the
> "current" element to the index variable ("i" in the examples above).
> This is the only logical explanation I can come up with given the
> behavior observed above. Can anyone confirm/deny this? If this is true,
> one thing to consider is not to use a large object to loop over (e.g.,
> columns of a very large data frame).
>
It is uncommon to modify seq (your x) in the loop. In the usual case
where you don't modify it, the fact that the loop has made a copy should
not matter: R won't actually copy the complete object until one version
of it is changed.
So this sequence

    seq <- data.frame(a=1:1000000, b=1:1000000)
    for (var in seq) { print(var[1]) }

hardly uses any more memory during the loop than it used in creating
seq, but this sequence

    for (var in seq) { seq$b[1] <- -1; print(var[1]) }

uses a lot more: seq is modified so a copy is made, and seq$b is
modified after var is set to it, so a copy is made of that too. Both of
the loops print two 1's, by the way.
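
If you want to check the semantics without allocating a million rows, here is a scaled-down sketch. The names df, col and firsts are made up for illustration. It only shows which values the loop sees, and does not measure memory:

```r
# Small stand-in for the large data frame above.
df <- data.frame(a = 1:3, b = 1:3)

firsts <- integer(0)
for (col in df) {
  df$b[1] <- -1L                 # modify the object being looped over
  firsts <- c(firsts, col[1])    # first element of the column the loop hands us
}

print(firsts)    # the loop iterated over the columns as they were at the start
print(df$b[1])   # the modification did land in df itself
```

The loop keeps working from the sequence as it was when the loop started, which is why the change to df$b never shows up in col.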
Duncan Murdoch
> Andy