thr3ads.net - R help - [R] subsetting by groups, with conditions [Dec 2009]


Seth W Bigelow

2009-Dec-29 01:03 UTC

[R] subsetting by groups, with conditions

I have a data set similar to this:

P1id    Veg1    Veg2    AreaPoly2       P2ID
1       p       p       1               1
1       p       p       1.5             2
2       p       p       2               3
2       p       h       3.5             4

For each group of "P1id" records, I wish to output (subset) the record
which has the largest "AreaPoly2" value, but only if Veg1=Veg2. For this
example, the desired dataset would be

P1id    Veg1    Veg2    AreaPoly2       P2ID
1       p       p       1.5             2
2       p       p       2               3
 
Can anyone point me in the right direction on this?

Dr. Seth  W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California

jim holtman

2009-Dec-29 01:25 UTC


try this:
> x <- read.table(textConnection("P1id    Veg1    Veg2    AreaPoly2       P2ID
+ 1       p       p       1               1
+ 1       p       p       1.5             2
+ 2       p       p       2               3
+ 2       p       h       3.5             4"), header=TRUE, as.is=TRUE)
> # split the dataframe by P1id
> x.s <- split(x, x$P1id)
> # now go through the sets to see which is the largest
> result <- lapply(x.s, function(.sub){
+     .match <- subset(.sub, Veg1 == Veg2)
+     # nrow(), not length(): length() of a data frame counts columns
+     if (nrow(.match) > 0){
+         return(.match[which.max(.match$AreaPoly2),])
+     }
+     else {
+         return(NULL)
+     }
+ })
> do.call(rbind, result)
  P1id Veg1 Veg2 AreaPoly2 P2ID
1    1    p    p       1.5    2
2    2    p    p       2.0    3
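Testing `length(.match) > 0` cannot detect a group with no matching rows, because `length()` of a data frame is its number of columns, not rows. The sketch below uses a hand-built copy of the thread's data (the names `x` and `empty` are illustrative only) to show the difference between `length()` and `nrow()` on an empty subset:

```r
# Hand-built copy of the thread's data (character columns, as with as.is=TRUE)
x <- data.frame(P1id = c(1, 1, 2, 2),
                Veg1 = c("p", "p", "p", "p"),
                Veg2 = c("p", "p", "p", "h"),
                AreaPoly2 = c(1, 1.5, 2, 3.5),
                P2ID = c(1, 2, 3, 4),
                stringsAsFactors = FALSE)

# A subset with no rows: no record has Veg1 equal to "h"
empty <- subset(x, Veg1 == "h")

length(empty)   # 5: counts columns, so length(empty) > 0 is always TRUE
nrow(empty)     # 0: counts rows, which is what the empty-group test needs

# which.max() on a zero-length vector returns integer(0), so indexing
# yields a zero-row data frame rather than an error
nrow(empty[which.max(empty$AreaPoly2), ])  # 0
```

So a `length()` test does not fail outright on an empty group; it just returns a zero-row data frame instead of the intended `NULL`. Using `nrow()` makes the intended check explicit.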


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Gabor Grothendieck

2009-Dec-29 01:27 UTC


Assuming your data frame is called DF, we can use sqldf like this. The
inner select calculates the maximum AreaPoly2 for each group such that
Veg1 = Veg2, and the outer select returns the corresponding row.


library(sqldf)
sqldf("select * from DF a where AreaPoly2 =
  (select max(AreaPoly2) from DF where Veg1 = Veg2 and P1id = a.P1id)")

Running it looks like this:
> library(sqldf)
> sqldf("select * from DF a where AreaPoly2 =
+       (select max(AreaPoly2) from DF where Veg1 = Veg2 and P1id = a.P1id)")
  P1id Veg1 Veg2 AreaPoly2 P2ID
1    1    p    p       1.5    2
2    2    p    p       2.0    3
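For readers without sqldf, the same inner/outer logic can be sketched in base R. This is an illustrative translation, not Gabor's code, and `ok`, `grp_max` and `res` are hypothetical names. One deliberate difference: the SQL outer select above does not repeat the `Veg1 = Veg2` condition, so a non-matching row whose AreaPoly2 happened to equal its group's maximum would also be returned; this sketch keeps only matching rows.

```r
DF <- data.frame(P1id = c(1, 1, 2, 2),
                 Veg1 = c("p", "p", "p", "p"),
                 Veg2 = c("p", "p", "p", "h"),
                 AreaPoly2 = c(1, 1.5, 2, 3.5),
                 P2ID = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# Rows that satisfy the Veg1 = Veg2 condition
ok <- DF$Veg1 == DF$Veg2

# "Inner select": max(AreaPoly2) per P1id over the matching rows.
# tapply() returns a vector named by the P1id values ("1", "2").
grp_max <- tapply(DF$AreaPoly2[ok], DF$P1id[ok], max)

# "Outer select": matching rows whose AreaPoly2 equals their group's max.
# Groups with no matching rows look up NA, but ok is FALSE for them,
# and FALSE & NA is FALSE, so those rows are simply dropped.
res <- DF[ok & DF$AreaPoly2 == grp_max[as.character(DF$P1id)], ]
res
```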



David Winsemius

2009-Dec-29 01:31 UTC


Can you be more expansive (or perhaps more accurate?) about the
conditions you want satisfied? Looking at your dataset, I only see
one row that has the largest value for AreaPoly2 among the three
records where Veg1==Veg2.

Otherwise I would think the answer might be along these lines:
> dft <- read.table(textConnection("P1id    Veg1    Veg2    AreaPoly2       P2ID
+ 1       p       p       1               1
+ 1       p       p       1.5             2
+ 2       p       p       2               3
+ 2       p       h       3.5             4"), header=TRUE, stringsAsFactors=TRUE)
> # give Veg1 the same factor levels as Veg2 so the two can be compared
> dft$Veg1 <- factor(dft$Veg1, levels=levels(dft$Veg2))

> s.dft <- subset(dft, Veg1==Veg2)

> s.dft[which.max(s.dft$AreaPoly2),]
  P1id Veg1 Veg2 AreaPoly2 P2ID
3    2    p    p         2    3
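Re-leveling Veg1 works around a factor restriction: when both columns are factors (the `read.table()` default before R 4.0.0), `==` between two factors with different level sets stops with an error. Converting both sides with `as.character()` is another way around it. A small sketch with hypothetical vectors `veg1` and `veg2` mirroring the thread's columns:

```r
# Factors built from the thread's Veg1 and Veg2 columns
veg1 <- factor(c("p", "p", "p", "p"))  # levels: "p"
veg2 <- factor(c("p", "p", "p", "h"))  # levels: "h", "p"

# Comparing factors whose level sets differ raises an error
cmp_failed <- tryCatch({ veg1 == veg2; FALSE },
                       error = function(e) TRUE)

# Workaround 1: give veg1 the same levels as veg2, then compare
fix_a <- factor(veg1, levels = levels(veg2)) == veg2

# Workaround 2: compare the underlying strings instead
fix_b <- as.character(veg1) == as.character(veg2)

fix_a  # TRUE TRUE TRUE FALSE
fix_b  # TRUE TRUE TRUE FALSE
```

Since R 4.0.0, `read.table()` defaults to `stringsAsFactors = FALSE`, so string columns arrive as character vectors and compare directly; the `as.character()` form works in either case.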

-- 
David

baptiste auguie

2009-Dec-29 09:04 UTC


Hi,

I think you can also use plyr for this,

dft <- read.table(textConnection("P1id    Veg1    Veg2    AreaPoly2     P2ID
1       p       p       1               1
1       p       p       1.5             2
2       p       p       2               3
2       p       h       3.5             4"), header=TRUE)

library(plyr)

ddply(dft, .(P1id), function(.df) {
  .ddf <- subset(.df, as.character(Veg1)==as.character(Veg2))
  .ddf[which.max(.ddf$AreaPoly2), ]
})
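A base-R alternative that avoids calling a function per group (not proposed in the thread; shown only as a sketch, with hypothetical names `m` and `res`): filter the matching rows once, sort by group and by descending AreaPoly2, then keep the first row of each group with `duplicated()`.

```r
dft <- data.frame(P1id = c(1, 1, 2, 2),
                  Veg1 = c("p", "p", "p", "p"),
                  Veg2 = c("p", "p", "p", "h"),
                  AreaPoly2 = c(1, 1.5, 2, 3.5),
                  P2ID = c(1, 2, 3, 4))

# Keep only rows where the two vegetation codes agree
# (as.character() makes this safe for factor or character columns)
m <- dft[as.character(dft$Veg1) == as.character(dft$Veg2), ]

# Sort by group, then by AreaPoly2 descending, so each group's
# largest matching row comes first within that group
m <- m[order(m$P1id, -m$AreaPoly2), ]

# The first row of each P1id group is the one to keep
res <- m[!duplicated(m$P1id), ]
res
```

Because `order()` leaves remaining ties in their original order, the first of several equal maxima is kept, which matches what `which.max()` does.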

HTH,

baptiste

