thr3ads.net - R help - [R] Subset data in long format [Jun 2006]


Doran, Harold

2006-Jun-06 21:07 UTC

[R] Subset data in long format

I have data in a "long" format where each row is one observation on a
student, so each student occupies multiple rows. I need to subset these
data based on a condition which I am having difficulty defining.

The dataset I am working with is large, but here is a simple data
structure to illustrate the issue

tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol = 10))
long <- reshape(tmp, idvar = 'id', varying = list(names(tmp)[2:11]),
                v.names = 'item', timevar = 'position', direction = 'long')
long <- long[order(long$id), ]
long <- long[c(-2, -13), ]

What I need to do is subset these data so I have the first 6 rows for
each unique ID. The problem is that the data are unbalanced in that each
ID has a different number of observations (which is why I removed obs 2
and 13).

If the data were balanced, the subset would be trivial and I could just
do

long <- subset(long, position < 7)

However, the data are not balanced. Consequently, if I were to do this
for the unbalanced data I would not have the first 6 obs for the first
ID. I would only have the first 5. Theoretically, what I want for
id 1 (and for each unique id) is this

ID1 <- subset(long, id==1)
ID1[1:6,]

However, the goal is to subset the entire dataframe at once such that
the subset returns a new dataframe with the first 6 rows for each unique
id. Is there a feasible method for doing this subset that anyone can
suggest? My actual dataset has more than 24,000 unique ids, so I am
hoping to avoid looping through this if possible.

Thanks,
Harold



Doran, Harold

2006-Jun-06 21:15 UTC


Apologies, it seems there were some word wrap issues in the prior email.
To avoid confusion, here is the code for the sample data again:


tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol=10) )

long <- reshape(tmp, idvar='id', varying=list(names(tmp)[2:11]),
v.names=('item'),timevar='position' , direction='long')

long <- long[order(long$id) , ]

long <- long[c(-2,-13),]

Gabor Grothendieck

2006-Jun-06 21:37 UTC


Try this:

subset(long, seq(id) - match(id,id) < 6)
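
A note on why this one-liner works, as a sketch rather than a definitive treatment: seq(id) numbers the rows 1..n, and match(id, id) gives the row index at which each id first appears, so the difference is a 0-based offset within each id. This relies on the rows of each id being contiguous, which holds here because the data were sorted by id. The version below spells this out with seq_along() (which behaves the same as seq() here but is safer for a length-one vector) and compares it with an ave()-based running counter that does not depend on the sort order. The set.seed() call and the names offset, counter, first6 and first6_ave are additions for illustration and are not part of the original thread.

```r
# Rebuild the sample data from the original post.
# set.seed() is added here only so the example is reproducible.
set.seed(1)
tmp <- data.frame(id = 1:3, matrix(rnorm(30), ncol = 10))
long <- reshape(tmp, idvar = "id", varying = list(names(tmp)[2:11]),
                v.names = "item", timevar = "position", direction = "long")
long <- long[order(long$id), ]
long <- long[c(-2, -13), ]   # drop two rows to make the data unbalanced

# 0-based offset within each id: overall row index minus the index of the
# first row of that id. Assumes rows of the same id are contiguous.
offset <- seq_along(long$id) - match(long$id, long$id)
first6 <- long[offset < 6, ]

# Alternative: a per-id running counter (1, 2, 3, ...) built with ave().
# This does not require the data to be sorted by id.
counter <- ave(seq_along(long$id), long$id, FUN = seq_along)
first6_ave <- long[counter <= 6, ]

# Number of rows kept per id
table(first6$id)
```

Both approaches are vectorized, so neither loops over the ids explicitly. For id 1, the rows kept are positions 1, 3, 4, 5, 6 and 7, i.e. the first six rows that actually exist rather than the rows with position < 7.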

