On 06/13/2008 11:10 PM T.D.Rudolph wrote:
> I have a dataframe, x, with over 60,000 rows that contains one Factor, "id",
> with 27 levels.
> The dataframe contains numerous continuous values (along column "diff") per
> day (column "date") for every level of id. I would like to select only one
> row per animal per day, i.e. that containing the minimum value of "diff",
> along the full length of 1:nrow(x). I am not yet able to conduct anything
> beyond the simplest of functions and I was hoping someone could suggest an
> effective way of producing this output.
>
> e.g. given this input:
>
> id day diff
> 1 01-01-09 0.5
> 1 01-01-09 0.7
> 2 01-01-09 0.2
> 2 01-01-09 0.4
> 1 01-02-09 0.1
> 1 01-02-09 0.3
> 2 01-02-09 0.3
> 2 01-02-09 0.4
>
> I would like to produce this output:
> id day diff
> 1 01-01-09 0.5
> 2 01-01-09 0.2
> 1 01-02-09 0.1
> 2 01-02-09 0.3
>
> It doesn't seem extremely difficult but I'm sure there are easier ways than
> how I am currently approaching it!
See ?aggregate
> DF
id day diff
1 1 01-01-09 0.5
2 1 01-01-09 0.7
3 2 01-01-09 0.2
4 2 01-01-09 0.4
5 1 01-02-09 0.1
6 1 01-02-09 0.3
7 2 01-02-09 0.3
8 2 01-02-09 0.4
> aggregate(DF$diff, list(id = DF$id, day = DF$day), min, na.rm = TRUE)
id day x
1 1 01-01-09 0.5
2 2 01-01-09 0.2
3 1 01-02-09 0.1
4 2 01-02-09 0.3
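One caveat: aggregate() returns only the grouping columns plus the minimum, so any other columns in x are dropped. If the whole row is wanted, the following is a minimal sketch (not part of the original reply) that rebuilds the example data, shows the formula interface to aggregate() available in later R releases, and uses ave() with which.min() to flag the first row in each id/day group that holds the minimum. The object names agg, is_min and min_rows are illustrative only, and it assumes 'diff' has no missing values.

```r
# Rebuild the example data from the question
DF <- data.frame(
  id   = factor(c(1, 1, 2, 2, 1, 1, 2, 2)),
  day  = c("01-01-09", "01-01-09", "01-01-09", "01-01-09",
           "01-02-09", "01-02-09", "01-02-09", "01-02-09"),
  diff = c(0.5, 0.7, 0.2, 0.4, 0.1, 0.3, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Formula interface to aggregate() (later R releases);
# the result column keeps the name 'diff' instead of 'x'
agg <- aggregate(diff ~ id + day, data = DF, FUN = min)

# Keep whole rows: within each id/day group, mark the first row holding
# the group minimum. ave() writes the logical result back into a numeric
# vector (TRUE -> 1, FALSE -> 0), hence the final comparison with 1.
is_min <- ave(DF$diff, DF$id, DF$day,
              FUN = function(d) seq_along(d) == which.min(d)) == 1
min_rows <- DF[is_min, ]
print(min_rows)
```

Ties resolve to the first matching row in each group, so exactly one row per animal per day is kept, and all other columns of the data frame come along with it.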
Note that I have not converted the 'day' column to the 'Date' class. You
would need to do that to perform any other date-related operations
(including chronological sorting) on that column. See ?as.Date for more
information. For example:
DF$day <- as.Date(DF$day, format = "%m-%d-%y")
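A minimal sketch of that conversion followed by a chronological sort with order(), assuming the day strings are month-day-two-digit-year as in the format above (strptime maps two-digit years 00 to 68 to 20xx, so "09" becomes 2009). The small data frame is a made-up, shuffled subset of the example, used only for illustration.

```r
# Shuffled subset of the example data (illustrative only)
DF <- data.frame(
  id   = c(1, 1, 2, 2),
  day  = c("01-02-09", "01-01-09", "01-02-09", "01-01-09"),
  diff = c(0.1, 0.5, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# "%m-%d-%y" = month-day-two-digit-year, assumed to match the raw strings
DF$day <- as.Date(DF$day, format = "%m-%d-%y")

# Sort chronologically, then by id within each day
DF <- DF[order(DF$day, DF$id), ]
print(DF)
```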
HTH,
Marc Schwartz