On Mar 10, 2010, at 10:30 AM, arnaud chozo wrote:
> Hi,
>
> I've a beginner question. I'm trying to extract data in my dataframe
> according to some nested rules.
>
> I have something like the dataframe test.df:
>
> test.df = data.frame(V1=c(rep("A",10), rep("B",10), rep("C",5)),
>                      V2=c(rep(1,5), rep(2,5), rep(1,5), rep(2,5), rep(1,5)))
>
> V1 V2
> 1 A 1
> 2 A 1
> 3 A 1
> 4 A 1
> 5 A 1
> 6 A 2
> 7 A 2
> 8 A 2
> 9 A 2
> 10 A 2
> 11 B 1
> 12 B 1
> 13 B 1
> 14 B 1
> 15 B 1
> 16 B 2
> 17 B 2
> 18 B 2
> 19 B 2
> 20 B 2
> 21 C 1
> 22 C 1
> 23 C 1
> 24 C 1
> 25 C 1
>
> For each value of the variable V1 (group A, B or C), I want to extract rows
> for which V2 is the max for the group in V1, in order to get:
>
> V1 V2
> 1 A 2
> 2 A 2
> 3 A 2
> 4 A 2
> 5 A 2
> 6 B 2
> 7 B 2
> 8 B 2
> 9 B 2
> 10 B 2
> 11 C 1
> 12 C 1
> 13 C 1
> 14 C 1
> 15 C 1
>
> test.df[test.df$V2 == ave(test.df$V2, test.df$V1, FUN=max), ]
V1 V2
6 A 2
7 A 2
8 A 2
9 A 2
10 A 2
16 B 2
17 B 2
18 B 2
19 B 2
20 B 2
21 C 1
22 C 1
23 C 1
24 C 1
25 C 1
The row names on the left are those of the original data frame, so you
get a bit of extra information: they show which rows were extracted. If
you do not need that information, the row names are easy to reset.
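
A minimal sketch of how the one-liner above works, with the row names reset at the end. The names grp.max and res are only for illustration; they are not in the original post.

```r
# Rebuild the example data from the original post.
test.df <- data.frame(V1 = c(rep("A", 10), rep("B", 10), rep("C", 5)),
                      V2 = c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5), rep(1, 5)))

# ave() returns a vector as long as V2 that holds, for every row,
# the maximum of V2 within that row's V1 group.
grp.max <- ave(test.df$V2, test.df$V1, FUN = max)

# Keep only the rows whose V2 equals their group's maximum.
res <- test.df[test.df$V2 == grp.max, ]

# Drop the original row names so the result is numbered from 1.
rownames(res) <- NULL
res
```

Because ave() returns one value per row rather than one per group, the comparison lines up row by row and no loop or merge is needed.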
--
David.

> I wrote this function:
>
> mytest = function(df) {
> myS = unique(df$V1)
> df.tmp = subset(df, df$V1==myS[[1]])
> df.sub = subset(df.tmp, df.tmp$V2==max(df.tmp$V2))
> for (i in 2:length(myS)) {
> df.tmp = subset(df, df$V1==myS[[i]])
> df.sub = merge(df.sub, subset(df.tmp, df.tmp$V2==max(df.tmp$V2)),
> all=TRUE)
> }
> df.sub
> }
>
> but I need something more efficient and more general. Any ideas?
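
On the request for something more general: one possible sketch wraps the ave() approach in a function that takes the column names as arguments. The function name rows.at.group.max and its arguments are hypothetical, and the sketch assumes the value column has no missing values (with NAs, max() would need na.rm handling).

```r
# Hypothetical helper (not from the original thread): keep the rows of `df`
# where column `value` equals the maximum within each level of column `group`.
# Assumes `value` is numeric with no NA entries.
rows.at.group.max <- function(df, group, value) {
  grp.max <- ave(df[[value]], df[[group]], FUN = max)
  df[df[[value]] == grp.max, , drop = FALSE]
}

# Usage with the example data from the original post.
test.df <- data.frame(V1 = c(rep("A", 10), rep("B", 10), rep("C", 5)),
                      V2 = c(rep(1, 5), rep(2, 5), rep(1, 5), rep(2, 5), rep(1, 5)))
out <- rows.at.group.max(test.df, "V1", "V2")
out
```

Unlike the loop over unique(df$V1), this also works when there is only one group, since nothing iterates over 2:length(myS).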
>
> Thanks in advance,
> Arnaud
>
David Winsemius, MD
West Hartford, CT