thr3ads.net - R help - [R] Doing a Task Without Using a For Loop [Oct 2008]

Tom La Bone

2008-Oct-14 13:58 UTC

[R] Doing a Task Without Using a For Loop

Assume that I have the dataframe "data1", which is listed at the end of this
message. I want to count the number of lines that each person has for each
year. For example, the person with ID=213 has 15 entries (NinYear) for 1953.
The following bit of code calculates NinYear:

for (i in 1:length(data1$ID)) {
  data1$NinYear[i] <- length(data1[data1$Year==data1$Year[i] &
    data1$ID==data1$ID[i],1]) }

This seems to work but is horribly slow (some files I am working with have
over 500,000 lines). Can anyone suggest a faster way of doing this, perhaps
a way that does not use a for loop? Thanks.

Tom

ID	Year	NinYear
209	1971	0
209	1971	0
213	1951	0
213	1951	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1953	0
213	1954	0
213	1954	0
213	1954	0
213	1954	0
213	1954	0
213	1954	0
213	1954	0
213	1954	0
213	1954	0
213	1954	0
213	1954	0
213	1955	0
213	1955	0
234	1953	0
234	1953	0
234	1953	0
234	1953	0
234	1953	0
234	1958	0
234	1958	0
234	1965	0
234	1965	0
234	1965	0
249	1952	0
249	1952	0




Henrique Dallazuanna

2008-Oct-14 14:06 UTC

Try this:

with(data1, table(ID, Year))
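For readers following along, here is a small sketch of what that cross-tabulation gives. It assumes data1 holds the sample data from the original post (rebuilt below from the ID/Year groups in the listing); the helper names ids, years, reps and tab are only illustrative.

```r
# Rebuild the sample data from the original post: one row per listed line
ids   <- c(209, 213, 213, 213, 213, 234, 234, 234, 249)
years <- c(1971, 1951, 1953, 1954, 1955, 1953, 1958, 1965, 1952)
reps  <- c(2, 2, 15, 11, 2, 5, 2, 3, 2)
data1 <- data.frame(ID = rep(ids, reps), Year = rep(years, reps), NinYear = 0)

# Cross-tabulate: rows are IDs, columns are Years, cells are line counts
tab <- with(data1, table(ID, Year))

# Look up one cell by its dimnames (the keys are character strings)
tab["213", "1953"]   # 15, the figure quoted in the question
```

Note that the table holds one cell per ID/Year combination, including combinations that never occur (those cells are 0).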

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Dimitris Rizopoulos

2008-Oct-14 14:06 UTC

try the following:

out <- tapply(data1$ID, list(data1$ID, data1$Year), length)
out[is.na(out)] <- 0
out
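The tapply result is a matrix of counts, not yet a per-row NinYear column. One possible way to map it back onto the rows without a loop is to index the matrix with a two-column character matrix of (ID, Year) keys. This is a sketch on the sample data from the original post; idx is a name introduced here for illustration.

```r
# Rebuild the sample data from the original post: one row per listed line
ids   <- c(209, 213, 213, 213, 213, 234, 234, 234, 249)
years <- c(1971, 1951, 1953, 1954, 1955, 1953, 1958, 1965, 1952)
reps  <- c(2, 2, 15, 11, 2, 5, 2, 3, 2)
data1 <- data.frame(ID = rep(ids, reps), Year = rep(years, reps), NinYear = 0)

# Count matrix as suggested above: rows are IDs, columns are Years
out <- tapply(data1$ID, list(data1$ID, data1$Year), length)
out[is.na(out)] <- 0

# Each row of idx names one cell of `out` through its dimnames
idx <- cbind(as.character(data1$ID), as.character(data1$Year))

# Matrix indexing returns one count per row of data1, in row order
data1$NinYear <- out[idx]
```

This relies on the array having dimnames, which tapply provides here.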


I hope it helps.

Best,
Dimitris


-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

Erik Iverson

2008-Oct-14 14:07 UTC

table(data1$ID, data1$Year)

See ?table and other functions referenced in ?table.
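If a long listing of the counts is easier to work with than the cross-table, one option is as.data.frame() on the table. The sketch below uses the sample data from the original post; the column name N and the variable counts are choices made here, not part of the suggestion above.

```r
# Rebuild the sample data from the original post: one row per listed line
ids   <- c(209, 213, 213, 213, 213, 234, 234, 234, 249)
years <- c(1971, 1951, 1953, 1954, 1955, 1953, 1958, 1965, 1952)
reps  <- c(2, 2, 15, 11, 2, 5, 2, 3, 2)
data1 <- data.frame(ID = rep(ids, reps), Year = rep(years, reps), NinYear = 0)

# One row per ID/Year cell of the table, with the count in column N
counts <- as.data.frame(table(ID = data1$ID, Year = data1$Year),
                        responseName = "N", stringsAsFactors = FALSE)

# Drop the combinations that never occur in the data
counts <- counts[counts$N > 0, ]
counts
```

With stringsAsFactors = FALSE the ID and Year columns come back as character strings, so convert them with as.numeric() if numeric keys are needed.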




Claudia Beleites

2008-Oct-14 15:11 UTC

> This seems to work but is horribly slow (some files I am working with have
> over 500,000 lines). Can anyone suggest a faster way of doing this, perhaps
> a way that does not use a for loop? Thanks.

If the table solutions don't work or take forever with your real data, have a
look at the wiki:
http://wiki.r-project.org/rwiki/doku.php?id=tips:data-frames:count_and_extract_unique_rows
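As an illustration of one alternative (not taken from the wiki page, whose contents are not reproduced here), the two columns can be pasted into a single key so that only combinations that actually occur are counted. The names key and n_by_key are made up for this sketch, and the sample data is rebuilt from the original post.

```r
# Rebuild the sample data from the original post: one row per listed line
ids   <- c(209, 213, 213, 213, 213, 234, 234, 234, 249)
years <- c(1971, 1951, 1953, 1954, 1955, 1953, 1958, 1965, 1952)
reps  <- c(2, 2, 15, 11, 2, 5, 2, 3, 2)
data1 <- data.frame(ID = rep(ids, reps), Year = rep(years, reps), NinYear = 0)

# One string key per row, e.g. "213_1953"
key <- paste(data1$ID, data1$Year, sep = "_")

# One-dimensional table: one entry per key that occurs in the data
n_by_key <- table(key)

# Look each row's key up by name to get a per-row count
data1$NinYear <- as.vector(n_by_key[key])
```

Because the table is one-dimensional, its size grows with the number of distinct ID/Year pairs rather than with the product of distinct IDs and distinct Years.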

Claudia

-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 (0 40) 5 58-34 47
email: cbeleites at units.it

Tom La Bone

2008-Oct-15 11:33 UTC

I want to thank everyone for the help. I ended up having to use a loop to
assign values from the table to NinYear. However, as I have played with the
full datasets I have noticed that R is MUCH faster if I use vectors in the
loop rather than columns of a dataframe. In the specific case of 43,000
lines of data, assigning values from the table to the 43,000 elements of a
vector took 6 seconds whereas assigning values from the table to 43,000
elements of a dataframe took 21 minutes. Why is there such a huge
difference?

Tom
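For the per-row NinYear column itself, a loop may not be needed at all. A minimal sketch using base R's ave(), which is a different technique from the one described in the message above, applied to the sample data from the original post:

```r
# Rebuild the sample data from the original post: one row per listed line
ids   <- c(209, 213, 213, 213, 213, 234, 234, 234, 249)
years <- c(1971, 1951, 1953, 1954, 1955, 1953, 1958, 1965, 1952)
reps  <- c(2, 2, 15, 11, 2, 5, 2, 3, 2)
data1 <- data.frame(ID = rep(ids, reps), Year = rep(years, reps), NinYear = 0)

# ave() applies FUN within each ID/Year group and returns one value per row;
# with FUN = length, every row receives the size of its own group
data1$NinYear <- ave(seq_along(data1$ID), data1$ID, data1$Year, FUN = length)
```

The first argument only needs to be a vector with one element per row; its values are ignored by length().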





jim holtman

2008-Oct-15 12:14 UTC

Run Rprof on your script that is updating the dataframe.  A dataframe
is a list, and every time you access something in the list it can be
expensive.  Rprof will probably show that a lot of time is spent in
the function "[[", which is accessing portions of the dataframe.
Vectors are much faster because they are typically stored sequentially
in memory and can be accessed easily.  Rprof is always helpful in
answering the question "why is something taking so long?"  It helps
you find where the potential bottlenecks are.
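A rough sketch of what such a profiling run could look like. The data here is a made-up stand-in (n, counts and df are hypothetical, not the poster's files), and it assumes an R build with profiling support; timings and the functions that show up will vary by machine and R version.

```r
n <- 20000
counts <- rep_len(1:15, n)             # stand-in values, not real data
df <- data.frame(NinYear = integer(n))

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)      # start sampling the call stack
for (i in seq_len(n)) {
  df$NinYear[i] <- counts[i]           # element-wise update of a data frame column
}
Rprof(NULL)                            # stop profiling

# summaryRprof() raises an error if the run was too short to collect samples
prof <- tryCatch(summaryRprof(prof_file)$by.self, error = function(e) NULL)
print(head(prof))                      # functions ranked by time spent in them

# The same assignments into a plain vector, for comparison
v <- integer(n)
for (i in seq_len(n)) v[i] <- counts[i]
```

If the vector version is the one used in practice, the finished vector can be attached to the data frame once at the end with a single assignment such as df$NinYear <- v.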

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
