On Sep 13, 2011, at 1:17 PM, bradford wrote:
> With the help of Andrie on StackOverflow.com, I was able to learn about
> ddply. I have another question that is more trivial and cannot seem to
> find help on IRC and do not want to bother Andrie again.
It's doubtful that he would have considered it a bother. Just post a
question and anyone up for rep points could do it. I certainly haven't
noticed that Andrie is slacking off despite his 14+K points.
> I can't seem to figure out what to google for, so I thought I'd ask here.
>
> I have:
> library(plyr)
> df_diff <- ddply(df, .(SOURCE), summarize,
> TIME_DIFF=-unclass(diff(REQUEST_DATE)))
> df_diff
>   SOURCE TIME_DIFF
> 1      A      7.55
> 2      A      5.55
> 3      A      3.40
> 4      D     35.00
> 5      D    563.00
> 6      D     37.00
> 7      D     35.00
> 8      D    996.00
>
> ... with a lot more records.
>
> I want to essentially sort SOURCE asc, TIME_DIFF asc and output the top 15
> lowest TIME_DIFFS for each SOURCE. How do I do this?
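You might (I say "might" in the absence of a reproducible example for
testing) do this with ave, ranking TIME_DIFF within each SOURCE and
keeping the rows whose rank is below 16:
df_diff[ with( df_diff, ave(TIME_DIFF, SOURCE,
    FUN = function(x) rank(x, ties.method = "first")) < 16), ]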
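If you also want the result sorted by SOURCE and then TIME_DIFF, here is a minimal base-R sketch. It is untested against your real data: the data frame below is made up from the few rows you posted, and n is a hypothetical stand-in for your 15.

```r
# Made-up stand-in for df_diff, built from the rows shown in the question
df_diff <- data.frame(
  SOURCE    = c("A", "A", "A", "D", "D", "D", "D", "D"),
  TIME_DIFF = c(7.55, 5.55, 3.40, 35, 563, 37, 35, 996)
)

n <- 2  # hypothetical cutoff for this toy data; use 15 for the real problem

# Sort by SOURCE, then by TIME_DIFF, both ascending
sorted <- df_diff[order(df_diff$SOURCE, df_diff$TIME_DIFF), ]

# Position of each row within its SOURCE group (1 = smallest TIME_DIFF)
pos <- ave(sorted$TIME_DIFF, sorted$SOURCE, FUN = seq_along)

# Keep the n lowest TIME_DIFF rows per SOURCE
top_n <- sorted[pos <= n, ]
top_n
```

Sorting first means the within-group position is just a running count, so ties need no special handling.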
>
> Also, what is the data type of df_diff called so that I can look into it
> some more?
The letters in a **ply function name tell you. The first letter is the
input class and the second is the output class, so the second "d" in
ddply means it returns a data frame (class "data.frame").
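A quick way to check the naming convention for yourself is sketched below. It assumes plyr is installed (the code is skipped otherwise), and the toy data frame and column names are made up for illustration.

```r
# Hypothetical toy data; not the poster's real data
toy <- data.frame(SOURCE = c("A", "A", "D"), X = c(1, 2, 3))

if (requireNamespace("plyr", quietly = TRUE)) {
  library(plyr)

  # ddply: data frame in, data frame out
  out_d <- ddply(toy, .(SOURCE), summarize, TOTAL = sum(X))

  # dlply: data frame in, list out
  out_l <- dlply(toy, .(SOURCE), function(d) sum(d$X))

  print(is.data.frame(out_d))
  print(is.list(out_l))
  str(out_d)  # compact look at the structure of the result
}
```

Once you know it is a data frame, ?data.frame and str() are the places to look further.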
David Winsemius, MD
West Hartford, CT