thr3ads.net - R help - [R] different way for a for loop for several columns? [Feb 2012]


Nerak

2012-Feb-13 23:20 UTC

[R] different way for a for loop for several columns?

Hi,
I have a question about repeating something for several columns. I
calculated a lot of values for different objects. The values are normally
calculated step-wise for every day for each object. With the resulting data
frame, I make further calculations. Here I have to calculate some things and
I have to take the years into account because I want a number for each year.
I do this first for one object, using a for loop to take the years into
account. Then I want to do the same for the other objects, so I put another
loop around it. The script works well, but it takes a lot of time.
Because of that, I want to see if there is another way to do this.
I made a reproducible script that has some characteristics of my script, but
is simplified a lot. Instead of daily values, the reproducible script
has only 4 data points a year, and instead of more than 30 years, only 4
years are used. Likewise, instead of more than 100 objects, the example
has only 3 (C, B, F). Normally I don't obtain the data frame test in this way;
it's just to create a reproducible script. Normally, the code inside the
for (y in 1980:1983) { } loop is also much more extensive:
'[which(Year$Date == y)]' is used several times and some small loops are
included.

Reproducible script:
Date<-c(1980,1980,1980,1980,1981,1981,1981,1981,1982,1982,1982,1982,1983,1983,1983,1983)
C<-c(0,0,0,0,5,2,0,0,0,15,12,10,6,0,0,0)
B<-c(0,0,0,0,9,6,2,0,0,24,20,16,2,0,0,0)
F<-c(0,0,0,0,6,5,1,0,0,18,16,12,10,5,1,0)


test<-data.frame(C,B,F)
Year<-data.frame(Date)
test.1<-data.frame(c(1980:1983))
test.4<-data.frame(c(1:4))

for (l in 1:3)
{
	for (y in 1980:1983)
	{
	test.2<-ifelse(test[,l]>1,1,0)
	test.3<-test.2[which(Year$Date== y)]
	test.4$length[y-Year[1,]+1]<-length(which(test.3>0))
	}
test.1<-cbind(test.1, test.4$length)
}
names(test.1)<-c("year","C","B","F")

test.1


You can see that it will take a lot of time for more objects and years. A
problem is that the for (y in 1980:1983) { } loop takes a lot of time because
'[which(Year$Date == y)]' is used several times and it takes a lot of time to
search through all the rows. And then, all of this has to be repeated
for each of the different objects.
But actually, exactly the same calculations have to be made for
all the objects; only the input data differ (the calculations are made
with the values of the corresponding column of data frame test). I thought it
could be faster to calculate each step of the inner loop (for (y in
1980:1983)) at the same time for every object. For example, at the moment
test.2<-ifelse(test[,l]>1,1,0) is first calculated for year 1980 for object
1, then for year 1981 for object 1, and so on for all the years, and all of
this is repeated for the different objects. I'm looking for a way to calculate
test.2<-ifelse(test[,l]>1,1,0) first for all the objects for year 1980, then
for all the objects for year 1981, and so on.

Does somebody know a way to do this? I was thinking about some
form of apply, but it's not that I have created a function that has to be
applied to a whole column; the calculations are done for the different rows...
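A minimal sketch of the order of work described above, assuming the toy data from the reproducible script (the names `above`, `years`, `counts` and `result` are illustrative, not from the post): the threshold test is done for all objects (columns) in one step, and a single pass over the years counts the hits of every column with `colSums()`.

```r
# Toy data from the reproducible script (Date rebuilt with rep() for brevity)
Date <- rep(1980:1983, each = 4)
test <- data.frame(C = c(0, 0, 0, 0, 5, 2, 0, 0, 0, 15, 12, 10, 6, 0, 0, 0),
                   B = c(0, 0, 0, 0, 9, 6, 2, 0, 0, 24, 20, 16, 2, 0, 0, 0),
                   F = c(0, 0, 0, 0, 6, 5, 1, 0, 0, 18, 16, 12, 10, 5, 1, 0))

# Step 1, for all objects at once: a logical matrix (rows = days, cols = objects)
above <- as.matrix(test) > 1

# Step 2, one year at a time: colSums() counts the TRUE values of every column
years <- unique(Date)
counts <- t(sapply(years, function(y) colSums(above[Date == y, , drop = FALSE])))

# Same year/C/B/F layout as test.1 in the script above
result <- data.frame(year = years, counts)
result
```

Only the years are still iterated over (through `sapply()`); the work across objects is vectorised.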

Many thanks for your help!
Kind regards,
Nerak


--
View this message in context:
http://r.789695.n4.nabble.com/different-way-for-a-for-loop-for-several-columns-tp4385705p4385705.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

2012-Feb-14 02:07 UTC


[R] different way for a for loop for several columns?

Hello,
> 
> I use a for loop to take into account the years.
> Then, I want to do this as well for the other objects and I use again
> another loop around it.
> The script works well, but it takes a lot of time. 
> 
Look at the first line in the inner loop:
> for (l in 1:3)
> {
>         for (y in 1980:1983)
>         {
>         test.2<-ifelse(test[,l]>1,1,0) 
>  [... etc ...]
It doesn't depend on 'y', so it could be moved outside that loop. This alone
would save time.
What also saves time is noticing that you're only choosing between the
values 1 and 0 depending on the 'ifelse' condition.
Simply doing

test.2 <- test[, l] > 1    # Use TRUE = 1 and FALSE = 0

makes it 20 times faster (!) (Tested with a much larger 'test'
data.frame.)
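A small check of the point above, using column C of the toy data as an assumed example: a comparison such as `x > 1` already returns TRUE/FALSE, which arithmetic and `sum()` treat as 1/0, so the `ifelse()` call only adds work. No timing is claimed here.

```r
# Column C from the reproducible script (example data)
x <- c(0, 0, 0, 0, 5, 2, 0, 0, 0, 15, 12, 10, 6, 0, 0, 0)

via_ifelse  <- ifelse(x > 1, 1, 0)  # numeric vector of 0s and 1s
via_logical <- x > 1                # logical vector of FALSE/TRUE

# Coercing the logical vector to numeric gives exactly the ifelse() result
same <- identical(via_ifelse, as.numeric(via_logical))

# sum() counts TRUE values directly, so no conversion is needed for counting
n_above <- sum(via_logical)
same
n_above
```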
>
> Does somebody knows a way to do this? I was thinking about some kind of
> form of apply, ...
With no loops, only *apply:

# After creating the data frames, save a copy of the original 'test.1'
# (this line should be placed before the loops)
test.1b <- test.1

# Now, after the loops and after assigning names to the result, try the
# following.

# Could this become 'unique(Year$Date)'?
y <- 1980:1983

# This is a matrix, not vector by vector like above. It's the matrix 'xx' in
# the nested 'apply'
test.2b <- test > 1

# This is the index 'jj' into 'xx'
# (sapply() returns a matrix here only because every year has the same number
# of rows; with unequal group sizes, e.g. leap years, it returns a list)
test.3b <- sapply(y, function(yy) which(Year$Date == yy))

Length <- t(apply(test.3b, 2, function(jj)
    apply(test.2b, 2, function(xx) sum(xx[jj]))))

# Maybe we don't need to save 'test.1b', if it's possible to cbind(y, Length)
test.1b <- cbind(test.1b, Length)

names(test.1b) <- c("year", "C", "B", "F")

all.equal(test.1, test.1b)

I bet this is faster.
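A sketch of a different base-R route, not the one used above: `rowsum()` sums the rows of a numeric matrix within groups, so the per-year counts for every column come out of one call, with no `which()` and no index matrix (which also sidesteps the equal-group-size requirement). It assumes the toy data from the original post; `above`, `counts` and `result` are illustrative names.

```r
# Toy data from the original post
Date <- c(1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981,
          1982, 1982, 1982, 1982, 1983, 1983, 1983, 1983)
test <- data.frame(C = c(0, 0, 0, 0, 5, 2, 0, 0, 0, 15, 12, 10, 6, 0, 0, 0),
                   B = c(0, 0, 0, 0, 9, 6, 2, 0, 0, 24, 20, 16, 2, 0, 0, 0),
                   F = c(0, 0, 0, 0, 6, 5, 1, 0, 0, 18, 16, 12, 10, 5, 1, 0))

# rowsum() requires numeric input, so turn the logical matrix into 0/1
above <- 1 * (test > 1)

# One call: sum the 0/1 indicators of every column within each year
counts <- rowsum(above, group = Date)

# The years become the row names; put them back as a column
result <- data.frame(year = as.numeric(rownames(counts)), counts)
result
```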

A final note: it's not a very good idea to use the names of R's own objects
as variable names, for instance 'length'. Prefer 'Length', which is not an
existing object/function.
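A short illustration of why the naming advice matters, with made-up values: R still finds the function `length()` when a variable of the same name exists, because a name used in a call is looked up among function objects only, but the code becomes confusing to read. The same applies to `F` in the original script, which masks the `F` shorthand for `FALSE`.

```r
# Made-up values stored under the name of a base R function
length <- c(4, 7, 2)

# This still works: when R looks up a name used as a function, it skips
# objects that are not functions, so base::length() is found
n <- length(length)

# 'F' normally means FALSE; after this assignment it no longer does
F <- c(0, 6, 5)
masked <- !identical(F, FALSE)

# Clean up so the base meanings are visible again
rm(length, F)
restored <- identical(F, FALSE)
c(n = n, masked = masked, restored = restored)
```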

Hope this helps,

Rui Barradas



--
View this message in context:
http://r.789695.n4.nabble.com/different-way-for-a-for-loop-for-several-columns-tp4385705p4385992.html
Sent from the R help mailing list archive at Nabble.com.

ilai

2012-Feb-14 17:50 UTC


[R] different way for a for loop for several columns?

Inline

On Tue, Feb 14, 2012 at 3:16 AM, Nerak T <nerak.t at hotmail.com> wrote:
> Dear Ilai,
>
>
>
> Thanks for your answer. I'm indeed kind of a beginner in R, starting
> to discover the endless possibilities in R. My goal for the moment is indeed
> to get rid of the use of loops and to see through the apply family (which
> seems to be an endless maze for the moment). (For some reason, the apply
> function doesn't seem logical to my brain, which prefers to think in
> a loop way.)
>
?apply ?lapply ?tapply, etc. are just wrappers for building more
efficient loops. If you "think in loops" (which you shouldn't), you are
also thinking in "apply". The reason it may seem like an endless maze
is that you use different wrappers for looping over different
object classes and indices, but in the end, the call to apply() is
similar to calling for().
e.g. consider a matrix with dimensions n x p. To sum rows you could use
for(i in 1:n) sum(matrix[i,])
But better: apply(matrix, 1, sum)  # where the 1 denotes the 1st dimension (rows)
Same thing for summing columns:
for(j in 1:p) sum(matrix[,j])
But better: apply(matrix, 2, sum)  # where the 2 denotes the 2nd dimension (columns)
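A runnable version of the row/column example above, on a small made-up 2 x 3 matrix `m`: the loop needs explicit storage, `apply()` with MARGIN = 1 or 2 does not, and base R's `rowSums()`/`colSums()` give the same sums as dedicated vectorised helpers.

```r
# A small 2 x 3 example matrix (values chosen only for illustration)
m <- matrix(1:6, nrow = 2)

# Loop version: results must be stored explicitly, or they are lost
row_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) row_loop[i] <- sum(m[i, ])

# apply() version: MARGIN = 1 walks over rows, MARGIN = 2 over columns
row_apply <- apply(m, 1, sum)
col_apply <- apply(m, 2, sum)

# Base R also has dedicated, vectorised helpers for these two cases
row_fast <- rowSums(m)
col_fast <- colSums(m)
```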

The same "stuff" that goes in the loop can go to apply with small
syntax changes. e.g.

ll<- list(1:5,1:10,letters)
out<- list()
for(L in 1:3){
# ...
# ... a bunch of complicated functions/calculations
# ...
out[[ L ]] <- length( ll[[ L ]] )
}
out

Can be replaced with

lapply( ll, function(L) {
# ...
# ... a bunch of complicated function/calculations
# ...
length(L)
} )

This time use lapply, since you are looping over the elements L of the list
ll.
>
>
> Your answer is really helpful.
> Something I found really interesting is your general comment. You say that I
> don't need to declare variables? The reason I started to do this is because
> if I don't, I get a message that the object is not found.
???

xx<- c(0,10,100)   # declare xx
print(xx)
rnorm(3,xx)          # use it
rm(xx)                 # remove xx
rnorm(3, xx<- c(0,10,100))   # define and use it "at the same time"
print(xx)
> If I create a data
> frame before the calculation with the right number of rows, it seems to
> work. But if there is a way not to have to create them beforehand, that
> would be great...
Only in loops. What happens is you need to create a "storage" for the
result of your loop, since the objects created in the loop are
overwritten at each step:
for(i in 1:10) cat( i, '\n' )
i     # i =  10 everything before was overwritten
> Because most of the time, I don't need that column that I created (but
> I couldn't create an empty data frame with the right dimensions to solve the
> problem)...
See my lapply example for creating "empty" storage.
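A sketch contrasting the two ways of getting storage, on column C of the toy data (the names `n_loop` and `n_sapply` are illustrative): a `for` loop needs a pre-allocated vector to fill, whereas `sapply()` builds and returns the result itself.

```r
Date <- rep(1980:1983, each = 4)   # same years as the toy Date vector
C <- c(0, 0, 0, 0, 5, 2, 0, 0, 0, 15, 12, 10, 6, 0, 0, 0)
years <- unique(Date)

# for loop: create the storage first, one slot per year, then fill it
n_loop <- integer(length(years))
for (k in seq_along(years)) {
  n_loop[k] <- sum(C[Date == years[k]] > 1)
}

# sapply(): nothing to declare; the per-year results are collected for you
n_sapply <- sapply(years, function(y) sum(C[Date == y] > 1))

identical(n_loop, n_sapply)
```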
>
>> Last, in R you want to avoid loops as much as possible especially for
>> large data sets. Operations are performed on objects so 1:4 + 1 is
>> equivalent to for(i in 1:4) i+1
> This part I don't understand... where do you put that '1:4 + 1'?
>
You don't put it anywhere; it was in answer to your comment: "but
it's not that I have created a function that has to be applied on a
whole column, calculations are done for the different rows..."

So, no! Calculations are done on the object (which has some dimension
or is a list); only in rare cases do you need to loop over each
element (or dimension) of the object itself.
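A minimal illustration of the `1:4 + 1` remark: the loop handles one element at a time and must store each result (the bare `for(i in 1:4) i+1` computes the values and discards them), while the vectorised expression works on the whole object at once. `loop_result` and `vec_result` are illustrative names.

```r
v <- 1:4

# Loop version: each element is handled separately and must be stored
loop_result <- numeric(length(v))
for (i in seq_along(v)) loop_result[i] <- v[i] + 1

# Vectorised version: the operation is applied to the whole object at once
vec_result <- v + 1

identical(loop_result, vec_result)
```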
>
> Many thanks, I'm trying to learn as much as possible to be able to use R
> more efficiently, so I really appreciate your help.
Pleasure. Good luck !
>
> Kind regards,
> Nerak
>
>> Date: Mon, 13 Feb 2012 23:43:29 -0700
>> Subject: Re: [R] different way for a for loop for several columns?
>> From: keren at math.montana.edu
>> To: nerak.t at hotmail.com
>
>>
>> Nerak,
>> Your example could have been done without a loop at all (at least this
>> calculation), or as you already know by calling one of the apply
>> family functions which are more efficient (but are still "loops"):
>>
>> test<- data.frame(
>> Date=c(1980,1980,1980,1980,1981,1981,1981,1981,1982,1982,1982,1982,1983,1983,1983,1983),
>> C = c(0,0,0,0,5,2,0,0,0,15,12,10,6,0,0,0),
>> B = c(0,0,0,0,9,6,2,0,0,24,20,16,2,0,0,0),
>> F = c(0,0,0,0,6,5,1,0,0,18,16,12,10,5,1,0)
>> )
>> test.2 <- test[,-1] > 1
>> aggregate(test.2, list(test$Date), sum)
>>
>> # See ?aggregate for more details. It also has a time series method
>> # which may be useful for you.
>>
>> A general comment: if you are or will be using R a bit more, it may
>> benefit you to study the manuals or find a good basic tutorial. You
>> seem to be applying the conventions of some other programming language
>> and that's slowing you down. e.g. you don't need to declare variables,
>> so all this stuff before your loop is unnecessary:
>> > Year<-data.frame(Date)
>> > test.1<-data.frame(c(1980:1983))
>> > test.4<-data.frame(c(1:4))
>> Also 1:4 is equivalent to data.frame(c(1:4)) without the extra attributes.
>>
>> Last, in R you want to avoid loops as much as possible especially for
>> large data sets. Operations are performed on objects so 1:4 + 1 is
>> equivalent to for(i in 1:4) i+1
>> Bottom line, the inner loops, calls to which, all that stuff...
>>
>> Hope that helps.
>>
>
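To make the `aggregate()` suggestion quoted above concrete, here is a sketch on the same toy data. Naming the grouping list (`year = ...`) is an optional tweak, not part of the quoted code; it labels the first column of the result `year` instead of `Group.1`.

```r
test <- data.frame(
  Date = rep(c(1980, 1981, 1982, 1983), each = 4),
  C = c(0, 0, 0, 0, 5, 2, 0, 0, 0, 15, 12, 10, 6, 0, 0, 0),
  B = c(0, 0, 0, 0, 9, 6, 2, 0, 0, 24, 20, 16, 2, 0, 0, 0),
  F = c(0, 0, 0, 0, 6, 5, 1, 0, 0, 18, 16, 12, 10, 5, 1, 0)
)

# Threshold all object columns at once (dropping the Date column)
above <- test[, -1] > 1

# Sum the TRUE values per year for every column in one call
res <- aggregate(above, by = list(year = test$Date), FUN = sum)
res
```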
