Jenny Barnes
2006-Dec-14 12:56 UTC
[R] loop is going to take 26 hours - needs to be quicker!
Dear R-help,

I have a loop which is set to take about 26 hours to run at the rate it's going - this is ridiculous and I really need your help to find a more efficient way of loading up my array gpcc.array:

# My data is stored in a table format with all the data in one long column,
# running through every longitude, for every latitude, for every year. The
# original data is stored as gpcc.data2 where dim(gpcc.data2) = [476928,5]
# and the 5th column is the data.

# Make the array in the format I need [longitude, latitude, years]:
gpcc.array <- array(NA, c(144, 72, 46))

n <- 0
for (k in 1:46) {
  for (j in 1:72) {
    for (i in 1:144) {
      n <- n + 1
      gpcc.array[i, j, k] <- gpcc.data2[n, 5]
      print(j)
    }
  }
}

So it runs through all the longitudes for every latitude for every year - which is the order the data runs down the column in gpcc.data2 - so n increases by 1 each time and each data point is pulled off.

It needs to be a lot quicker; I'd appreciate any ideas!

Many thanks for taking the time to read this,

Jenny Barnes

~~~~~~~~~~~~~~~~~~
Jennifer Barnes
PhD student - long range drought prediction
Climate Extremes
Department of Space and Climate Physics
University College London
Holmbury St Mary, Dorking
Surrey RH5 6NT
01483 204149 / 07916 139187
Web: http://climate.mssl.ucl.ac.uk
Duncan Murdoch
2006-Dec-14 13:17 UTC
[R] loop is going to take 26 hours - needs to be quicker!
On 12/14/2006 7:56 AM, Jenny Barnes wrote:
> [original question quoted in full - snipped]
> It needs to be a lot quicker, I'd appreciate any ideas!

I think the loop above is equivalent to

gpcc.array <- array(gpcc.data2[,5], c(144, 72, 46))

which would certainly be a lot quicker. You should check that the values are loaded in the right order (probably on a smaller example!). If not, you should change the order of indices when you create the array, and use the aperm() function to get them the way you want afterwards.

Duncan Murdoch
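[Editor's note: Duncan's "check on a smaller example" advice can be sketched as follows. The toy dimensions (2 x 3 x 2 instead of 144 x 72 x 46) and the expand.grid() data frame are hypothetical stand-ins for gpcc.data2, chosen so each cell's correct value is easy to verify by eye.]

```r
# Toy stand-in for gpcc.data2: longitude varies fastest, then latitude,
# then year -- the same order the original loop walks through.
nlon <- 2; nlat <- 3; nyr <- 2
toy <- expand.grid(lon = 1:nlon, lat = 1:nlat, yr = 1:nyr)
toy$value <- seq_len(nrow(toy))

# One-shot reshape, as in the suggested array() call:
toy.array <- array(toy$value, c(nlon, nlat, nyr))

# Spot-check a cell against the source rows: row 3 of 'toy' holds
# lon = 1, lat = 2, yr = 1, so toy.array[1, 2, 1] should be 3.
toy.array[1, 2, 1]

# If the column had instead varied latitude fastest, you would fill with
# dims c(nlat, nlon, nyr) and then swap the first two axes via aperm():
swapped <- aperm(array(toy$value, c(nlat, nlon, nyr)), c(2, 1, 3))
```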
Rainer M Krug
2006-Dec-14 13:24 UTC
[R] loop is going to take 26 hours - needs to be quicker!
Jenny Barnes wrote:
> [original question quoted in full - snipped]

I don't know if it is faster - but adding three columns to gpcc.data2, one for longitude, one for latitude and one for year (using rep(), as they are in sequence) and then using reshape() might be faster?

--
Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation Biology (UCT)
Department of Conservation Ecology and Entomology
University of Stellenbosch
Matieland 7602
South Africa
Tel: +27 - (0)72 808 2975 (w)
Fax: +27 - (0)86 516 2782
Fax: +27 - (0)21 808 3304 (w)
Cell: +27 - (0)83 9479 042
email: RKrug at sun.ac.za / Rainer at krugs.de
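[Editor's note: Rainer's rep() idea, sketched on hypothetical toy dimensions (2 longitudes x 3 latitudes x 2 years) purely as an illustration - the coordinate columns implied by the single data column's ordering can be rebuilt like this.]

```r
nlon <- 2; nlat <- 3; nyr <- 2

# Longitude cycles fastest, year slowest, matching the loop order:
lon  <- rep(1:nlon, times = nlat * nyr)
lat  <- rep(rep(1:nlat, each = nlon), times = nyr)
year <- rep(1:nyr,  each = nlon * nlat)

# These columns could then be cbind()-ed onto the data frame for reshape():
coords <- cbind(lon, lat, year)
```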
Jenny Barnes
2006-Dec-14 13:28 UTC
[R] loop is going to take 26 hours - needs to be quicker!
Dear R-help,

I forgot to mention that I need the array in that format because I am going to do the same thing for another dataset of precipitation (ncep.data2), so they will both be arrays of dimensions [144,72,46]. That way I can correlate them globally and plot a visual image of the global correlations between the 2 datasets. One of the datasets has a land mask applied to it already, so it should be clear to see the land and pick out the locations (i.e. over Europe) where there is strongest and weakest correlation - that is the ultimate goal.

Following Rainer's response I should also point out that the columns in gpcc.data2 (with dimensions dim(gpcc.data2) = [476928,5]) are:

[,1] = "Year", [,2] = "month" (which is just January, so always 1), [,3] = "latitude", [,4] = "longitude" and [,5] = "data". All I want in gpcc.array is the data, not the longitude and latitude values - hope that helps clear it up a bit!

I look forward to hearing any more ideas. Thanks again for your time in reading this,

Jenny Barnes

> [Rainer's reply, the original question, and signatures quoted - snipped]
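[Editor's note: the end goal Jenny describes - a gridpoint-by-gridpoint correlation map of two [144,72,46] arrays - might look something like this sketch. The random arrays and the small grid are hypothetical stand-ins for gpcc.array and ncep.array.]

```r
set.seed(1)
nlon <- 4; nlat <- 3; nyr <- 46
gpcc <- array(rnorm(nlon * nlat * nyr), c(nlon, nlat, nyr))  # stand-in
ncep <- array(rnorm(nlon * nlat * nyr), c(nlon, nlat, nyr))  # stand-in

# Correlate the two 46-year time series at every grid point:
cor.map <- matrix(NA_real_, nlon, nlat)
for (i in seq_len(nlon))
  for (j in seq_len(nlat))
    cor.map[i, j] <- cor(gpcc[i, j, ], ncep[i, j, ])

# image(cor.map) would then give the visual map of global correlations.
```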
Marc Schwartz
2006-Dec-14 13:33 UTC
[R] loop is going to take 26 hours - needs to be quicker!
On Thu, 2006-12-14 at 12:56 +0000, Jenny Barnes wrote:
> [original question quoted in full - snipped]
> It needs to be a lot quicker, I'd appreciate any ideas!

Take a "whole object" approach to this problem. You are also wasting a lot of time by printing the values of 'j' in the loop.

> gpcc.data2 <- matrix(rnorm(476928 * 5), ncol = 5)
> dim(gpcc.data2)
[1] 476928      5
> str(gpcc.data2)
 num [1:476928, 1:5] 2.7385 -0.0438 -0.1084 0.8768 -1.0024 ...
> system.time(gpcc.array <- array(gpcc.data2[, 5], dim = c(144, 72, 46)))
[1] 0.024 0.026 0.078 0.000 0.000

You should verify the order of the values and adjust the indices accordingly, if the above results in an out-of-order array.

HTH,

Marc Schwartz
Petr Pikal
2006-Dec-14 13:39 UTC
[R] loop is going to take 26 hours - needs to be quicker!
Hi,

If I understand correctly, you have one column you need to reformat into an array. An array is basically a vector with a dim attribute. Therefore, if your data were properly sorted, you could use just

gpcc.array <- array(gpcc.data2[,5], c(144,72,46))

to reformat column 5 of your data frame. But you shall be 100% sure you really want an array and not any other data form.

HTH
Petr

On 14 Dec 2006 at 12:56, Jenny Barnes wrote:
> [original question quoted in full - snipped]

Petr Pikal
petr.pikal at precheza.cz
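[Editor's note: Petr's point that "an array is basically a vector with a dim attribute" can be seen directly - a minimal illustration with made-up numbers.]

```r
v <- 1:24
dim(v) <- c(4, 3, 2)   # v is now a 4 x 3 x 2 array; the data are not copied
# Arrays fill in column-major (first-index-fastest) order, so element 2
# of the original vector sits at position [2, 1, 1]:
v[2, 1, 1]
```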
Jenny Barnes
2006-Dec-14 13:41 UTC
[R] loop is going to take 26 hours - needs to be quicker!
Dear R-help,

Thank you for the responses from everyone - you'll be pleased to hear, Duncan, that using:

> gpcc.array <- array(gpcc.data2[,5], c(144, 72, 46))

was spot-on; it worked like a dream. The data is in the correct places, as I checked against the text file. It took literally 2 seconds - quite an improvement on the predicted 26 hours :-)

I really appreciate your help; you're all very kind people.

Merry Christmas,

Jenny Barnes

> [Duncan Murdoch's reply of Thu, 14 Dec 2006 08:17:24 -0500, quoting the original question - snipped]
Jenny Barnes
2006-Dec-14 15:48 UTC
[R] loop is going to take 26 hours - needs to be quicker!
Dear Patrick,

Thank you for the link - I'd advise anyone who's started using R to have a look at these as well; any help is always appreciated. I've downloaded S Poetry and will hit the books tomorrow and get reading it!

Jenny

> S Poetry may be of use to you -- especially the chapter
> on arrays, which discusses 3-dimensional arrays in particular.
>
> Patrick Burns
> patrick at burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>
> [earlier messages quoted in full - snipped]