thr3ads.net - R help - [R] select duplicate identifier with higher mean across sample columns [Nov 2012]

Adrian Johnson

2012-Nov-04 19:25 UTC

[R] select duplicate identifier with higher mean across sample columns

Hi Group:
I searched the R groups before posting this question, but I could not find
an appropriate answer and I do not have a clear understanding of how to do
this in R.

I have a data frame with duplicated row identifiers but with different
values across columns. For each duplicated identifier, I want to keep the
row with the higher inter-quartile range or mean.


 id <- c("A", "A", "C", "D", "E", "F")
 year <- c(2000, 2001, 2001, 2002, 2003, 2004)
 samp1 <- c(100, 120, 101, 110, 132,123)
 samp2 <- c(110, 130, 131, 150, 122,143)
 mdf <- data.frame(id,samp1,samp2,samp2a)

> mdf
  id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143


There are two A ids in this df. I want to select the one with the higher mean.

How can I do this?
Thanks
Adrian
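For a self-contained starting point, the construction code in the question can be completed as below. The samp2a vector is not defined in the original post; the values used here are an assumption read off the printed mdf, and year is left out because it does not appear in that printout.

```r
# Rebuild the example data frame shown in the question
id     <- c("A", "A", "C", "D", "E", "F")
samp1  <- c(100, 120, 101, 110, 132, 123)
samp2  <- c(110, 130, 131, 150, 122, 143)
# Assumption: samp2a is not defined in the post; values copied from the printed mdf
samp2a <- c(110, 150, 151, 130, 122, 143)
mdf <- data.frame(id, samp1, samp2, samp2a)

# Row means across the sample columns (every column except id)
row_means <- rowMeans(mdf[, -1])
```

With this in place, every reply below can be run against the same mdf.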

jim holtman

2012-Nov-04 19:39 UTC

Is this what you want:
> mdf <- read.table(text = "  id samp1 samp2 samp2a
+ 1  A   100   110    110
+ 2  A   120   130    150
+ 3  C   101   131    151
+ 4  D   110   150    130
+ 5  E   132   122    122
+ 6  F   123   143    143", header = TRUE)
> result <- do.call(rbind, lapply(split(mdf, mdf$id), function(.id){
+     maxIndx <- which.max(rowMeans(.id[, -1L]))
+     .id[maxIndx, ]
+ }))
>
> result
  id samp1 samp2 samp2a
A  A   120   130    150
C  C   101   131    151
D  D   110   150    130
E  E   132   122    122
F  F   123   143    143


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
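Jim's split()/which.max() idea extends naturally to the inter-quartile range that the question also mentions. The sketch below wraps it in a hypothetical helper, pick_best_row() (the name and arguments are illustrative, not from the thread), whose score_fun argument is any function that reduces a row to one number, here mean or stats::IQR. It assumes every column other than id is numeric. Like Jim's version, which.max() returns the first maximum, so on a tie only the earlier row is kept.

```r
# Hypothetical helper (not from the thread): for each id, keep the row
# whose score across the numeric sample columns is largest.
pick_best_row <- function(df, id_col = "id", score_fun = mean) {
  value_cols <- setdiff(names(df), id_col)
  # One score per row; apply() works row-wise on the numeric columns
  scores <- apply(df[, value_cols, drop = FALSE], 1, score_fun)
  # Row positions grouped by id, then the position of the best-scoring row
  rows_by_id <- split(seq_len(nrow(df)), df[[id_col]])
  best <- vapply(rows_by_id, function(i) i[which.max(scores[i])], integer(1))
  df[best, ]
}

mdf <- read.table(text = "
id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143
", header = TRUE)

by_mean <- pick_best_row(mdf, score_fun = mean)
by_iqr  <- pick_best_row(mdf, score_fun = IQR)
```

For this data both criteria keep the second A row (mean 133.3 vs 106.7, IQR 15 vs 5); other summaries such as median can be passed the same way.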

Rui Barradas

2012-Nov-04 19:40 UTC

Hello,

Thanks for the data example (you forgot samp2a).
Try the following:


mdf <- read.table(text="
id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143
", header=TRUE)

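# For each id, flag the rows whose mean across the sample columns equals the group maximum
idx <- ave(rowMeans(mdf[,-1]), mdf$id, FUN = function(x) x == max(x))
# ave() returns the flags as 0/1 numbers, so convert them to logical before subsetting
mdf[as.logical(idx), ]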


Hope this helps,

Rui Barradas
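One behavioural difference between the replies is worth knowing: Rui's x == max(x) (and arun's x %in% max(x)) keeps every row that ties for the group maximum, whereas Jim's which.max() keeps only the first. The small tied data frame below is made up purely for illustration, and the keep_first variant is a hedged adaptation of Rui's ave() call, not something posted in the thread.

```r
# Made-up data: both "A" rows have the same mean (1.5)
tied <- data.frame(id = c("A", "A", "B"),
                   samp1 = c(1, 2, 5),
                   samp2 = c(2, 1, 5))
m <- rowMeans(tied[, -1])

# Rui's approach: every row equal to its group's maximum is flagged
keep_all <- as.logical(ave(m, tied$id, FUN = function(x) x == max(x)))

# Variant: flag only the first maximal row in each group, like which.max()
keep_first <- as.logical(ave(m, tied$id,
                             FUN = function(x) seq_along(x) == which.max(x)))

tied[keep_all, ]    # both "A" rows plus the "B" row
tied[keep_first, ]  # first "A" row plus the "B" row
```

Which behaviour is right depends on whether downstream code expects exactly one row per id.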

arun

2012-Nov-04 21:05 UTC

Hi,
Try this:
# note: unlist(tapply(...)) returns the flags in sorted-id order, so this
# assumes mdf is already sorted by id (as it is here)
mdf[unlist(tapply(rowMeans(mdf[,-1]),mdf$id,FUN=function(x) x%in%max(x))),]
#  id samp1 samp2 samp2a
#2  A   120   130    150
#3  C   101   131    151
#4  D   110   150    130
#5  E   132   122    122
#6  F   123   143    143
A.K.
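arun's one-liner relies on the rows already being sorted by id: tapply() returns its results in sorted-id order, so unlist() lines the flags up with the groups rather than with the original rows. The sketch below, on a small made-up unsorted subset, shows the mismatch and one way to restore row order with base R's unsplit(), the inverse of split().

```r
# Made-up unsorted data: the "C" row comes first
mdf_unsorted <- data.frame(id = c("C", "A", "A"),
                           samp1 = c(101, 100, 120),
                           samp2 = c(131, 110, 130))
m <- rowMeans(mdf_unsorted[, -1])

# One-liner as posted: flags come back ordered A, A, C rather than by row
naive <- unlist(tapply(m, mdf_unsorted$id, FUN = function(x) x %in% max(x)))
mdf_unsorted[naive, ]   # wrongly returns the two "A" rows and drops "C"

# unsplit() puts each group's flags back at that group's original row positions
flags <- unsplit(lapply(split(m, mdf_unsorted$id), function(x) x %in% max(x)),
                 mdf_unsorted$id)
mdf_unsorted[flags, ]   # the "C" row and the higher-mean "A" row
```

Rui's ave() version does not have this problem, because ave() already returns its result in the original row order.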





Adrian Johnson

2012-Nov-05 15:47 UTC

Thanks a lot for the help.
-Adrian
