thr3ads.net - R help - [R] Subset by Factor by date [Jun 2008]

T.D.Rudolph

2008-Jun-14 04:10 UTC

[R] Subset by Factor by date

I have a dataframe, x, with over 60,000 rows that contains one Factor,
"id", with 27 levels. The dataframe contains numerous continuous values
(along column "diff") per day (column "date") for every level of id.
I would like to select only one row per animal per day, i.e. that
containing the minimum value of "diff", along the full length of
1:nrow(x). I am not yet able to conduct anything beyond the simplest of
functions and I was hoping someone could suggest an effective way of
producing this output.

e.g. given this input:

id  day       diff
1   01-01-09  0.5
1   01-01-09  0.7
2   01-01-09  0.2
2   01-01-09  0.4
1   01-02-09  0.1
1   01-02-09  0.3
2   01-02-09  0.3
2   01-02-09  0.4

I would like to produce this output:
id  day       diff
1   01-01-09  0.5
2   01-01-09  0.2
1   01-02-09  0.1
2   01-02-09  0.3

It doesn't seem extremely difficult but I'm sure there are easier ways
than how I am currently approaching it!
-- 
View this message in context:
http://www.nabble.com/Subset-by-Factor-by-date-tp17835631p17835631.html
Sent from the R help mailing list archive at Nabble.com.

Marc Schwartz

2008-Jun-14 04:24 UTC

On 06/13/2008 11:10 PM, T.D.Rudolph wrote:
> I have a dataframe, x, with over 60,000 rows that contains one Factor,
> "id", with 27 levels. The dataframe contains numerous continuous values
> (along column "diff") per day (column "date") for every level of id.
> I would like to select only one row per animal per day, i.e. that
> containing the minimum value of "diff", along the full length of
> 1:nrow(x). I am not yet able to conduct anything beyond the simplest of
> functions and I was hoping someone could suggest an effective way of
> producing this output.
>
> e.g. given this input:
>
> id  day       diff
> 1   01-01-09  0.5
> 1   01-01-09  0.7
> 2   01-01-09  0.2
> 2   01-01-09  0.4
> 1   01-02-09  0.1
> 1   01-02-09  0.3
> 2   01-02-09  0.3
> 2   01-02-09  0.4
>
> I would like to produce this output:
> id  day       diff
> 1   01-01-09  0.5
> 2   01-01-09  0.2
> 1   01-02-09  0.1
> 2   01-02-09  0.3
>
> It doesn't seem extremely difficult but I'm sure there are easier ways
> than how I am currently approaching it!

See ?aggregate

 > DF
   id      day diff
1  1 01-01-09  0.5
2  1 01-01-09  0.7
3  2 01-01-09  0.2
4  2 01-01-09  0.4
5  1 01-02-09  0.1
6  1 01-02-09  0.3
7  2 01-02-09  0.3
8  2 01-02-09  0.4


 > aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
   id      day   x
1  1 01-01-09 0.5
2  2 01-01-09 0.2
3  1 01-02-09 0.1
4  2 01-02-09 0.3
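
The aggregate() call above names the result column 'x' and returns only the grouping columns plus the minimum. If the real data frame has further columns that should come along with the selected row, something like the following might work. It is only a sketch: it rebuilds the example DF from this thread, and the object names mins, ord and rows are made up for illustration.

```r
# Rebuild the example data from the thread (DF is the name used above).
DF <- data.frame(
  id   = factor(c(1, 1, 2, 2, 1, 1, 2, 2)),
  day  = c("01-01-09", "01-01-09", "01-01-09", "01-01-09",
           "01-02-09", "01-02-09", "01-02-09", "01-02-09"),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4)
)

# Formula interface to aggregate(): the result column keeps the name 'diff'.
mins <- aggregate(diff ~ id + day, data = DF, FUN = min)

# To keep whole rows: sort so that the smallest 'diff' comes first within
# each id/day group, then keep only the first row of each group.
ord  <- DF[order(DF$id, DF$day, DF$diff), ]
rows <- ord[!duplicated(ord[c("id", "day")]), ]
```

With ties on the minimum, the order()/duplicated() approach keeps exactly one row per id and day (whichever sorts first), which seems to match the "one row per animal per day" requirement. Note that 'rows' comes back sorted by id and then day, not in the original row order.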


Note that I have not converted the 'day' column to the 'Date' class.
You would need to do that to perform any other date-related operations
(including chronological sorting) on that column. See ?as.Date for more
information. For example:

   DF$day <- as.Date(DF$day, format = "%m-%d-%y")
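
To illustrate why the conversion matters for sorting, here is a small sketch. The data are hypothetical (not from the original post) and assume 'day' is stored as text in month-day-year order, as in the example above; text_order and DF_sorted are made-up names.

```r
# Hypothetical data spanning a year boundary, with 'day' stored as text.
DF <- data.frame(
  id   = c(1, 2, 1, 2),
  day  = c("01-02-09", "01-02-09", "12-31-08", "12-31-08"),
  diff = c(0.1, 0.3, 0.5, 0.2),
  stringsAsFactors = FALSE
)

# Sorting the text puts "01-02-09" before "12-31-08",
# which is alphabetical rather than chronological.
text_order <- order(DF$day)

# Convert to the Date class using the month-day-year format.
DF$day <- as.Date(DF$day, format = "%m-%d-%y")

# After conversion, order() sorts by the actual calendar date.
DF_sorted <- DF[order(DF$day, DF$id), ]
```

Once 'day' is a Date, the rows for 31 December 2008 sort ahead of those for 2 January 2009, which the text version gets backwards.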


HTH,

Marc Schwartz
