I have a data set similar to this:

P1id    Veg1    Veg2    AreaPoly2    P2ID
1       p       p       1            1
1       p       p       1.5          2
2       p       p       2            3
2       p       h       3.5          4

For each group of "Poly1id" records, I wish to output (subset) the record which has the largest "AreaPoly2" value, but only if Veg1=Veg2. For this example, the desired dataset would be

P1id    Veg1    Veg2    AreaPoly2    P2ID
1       p       p       1.5          2
2       p       p       2            3

Can anyone point me in the right direction on this?

Dr. Seth W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
try this:

> x <- read.table(textConnection("P1id Veg1 Veg2 AreaPoly2 P2ID
+ 1 p p 1 1
+ 1 p p 1.5 2
+ 2 p p 2 3
+ 2 p h 3.5 4"), header=TRUE, as.is=TRUE)
> # split the dataframe by P1id
> x.s <- split(x, x$P1id)
> # now go through the sets to see which is the largest
> result <- lapply(x.s, function(.sub){
+     .match <- subset(.sub, Veg1 == Veg2)
+     # nrow(), not length(): length() of a data frame counts its columns
+     if (nrow(.match) > 0){
+         return(.match[which.max(.match$AreaPoly2),])
+     }
+     else {
+         return(NULL)
+     }
+ })
> do.call(rbind, result)
  P1id Veg1 Veg2 AreaPoly2 P2ID
1    1    p    p       1.5    2
2    2    p    p       2.0    3

On Mon, Dec 28, 2009 at 8:03 PM, Seth W Bigelow <sbigelow@fs.fed.us> wrote:
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Assuming your data frame is called DF we can use sqldf like this. The
inner select calculates the maximum AreaPoly2 for each group such that
Veg1 = Veg2 and the outer select returns the corresponding row.
library(sqldf)
sqldf("select * from DF a where AreaPoly2 = (select max(AreaPoly2) from
DF where Veg1 = Veg2 and P1id = a.P1id)")
Running it looks like this:
> library(sqldf)
> sqldf("select * from DF a where AreaPoly2 =
+ (select max(AreaPoly2) from DF where Veg1 = Veg2 and P1id = a.P1id)")
P1id Veg1 Veg2 AreaPoly2 P2ID
1 1 p p 1.5 2
2 2 p p 2.0 3
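
For readers without sqldf installed, the same filter-then-maximum idea can be sketched in base R. This is only an illustration, not a line-by-line translation of the query: it assumes the data frame is named DF as in the reply, builds it from the four rows in the question, and (unlike the outer select) also restricts the returned rows to those with Veg1 equal to Veg2. The names same, grp_max and res are made up for the example.

```r
# DF is the name assumed in the reply; the rows are copied from the question.
DF <- read.table(text = "P1id Veg1 Veg2 AreaPoly2 P2ID
1 p p 1   1
1 p p 1.5 2
2 p p 2   3
2 p h 3.5 4", header = TRUE, stringsAsFactors = FALSE)

# Keep only rows where the two vegetation codes agree
# (the condition in the inner WHERE clause).
same <- DF[DF$Veg1 == DF$Veg2, ]

# Maximum AreaPoly2 within each P1id group, repeated for every row of the group.
grp_max <- ave(same$AreaPoly2, same$P1id, FUN = max)

# Keep the rows that attain their group's maximum.
res <- same[same$AreaPoly2 == grp_max, ]
rownames(res) <- NULL
print(res)
```

If two matching rows in a group tie for the maximum, this sketch returns both of them.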
On Dec 28, 2009, at 7:03 PM, Seth W Bigelow wrote:

> For each group of "Poly1id" records, I wish to output (subset) the record
> which has largest "AreaPoly2" value, but only if Veg1=Veg2. For this
> example, the desired dataset would be
>
> P1id Veg1 Veg2 AreaPoly2 P2ID
> 1 p p 1.5 2
> 2 p p 2 3

Can you be more expansive (or perhaps more accurate?) about the conditions you want satisfied? Looking at that dataset, I only see one row that has the largest value for AreaPoly2 within the three records where Veg1==Veg2. Otherwise I would think the answer might be along these lines:

> # stringsAsFactors=TRUE keeps Veg1/Veg2 as factors (the default before R 4.0.0)
> dft <- read.table(textConnection("P1id Veg1 Veg2 AreaPoly2 P2ID
+ 1 p p 1 1
+ 1 p p 1.5 2
+ 2 p p 2 3
+ 2 p h 3.5 4"), header=TRUE, stringsAsFactors=TRUE)
> # give Veg1 the same levels as Veg2 so the two factors can be compared
> dft$Veg1 <- factor(dft$Veg1, levels=levels(dft$Veg2))
> s.dft <- subset(dft, Veg1==Veg2)
> s.dft[which.max(s.dft$AreaPoly2),]
  P1id Veg1 Veg2 AreaPoly2 P2ID
3    2    p    p         2    3

--
David
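
If the maximum is wanted within each P1id group rather than over the whole data set, one possible extension of the subset approach is to sort and then keep the first row per group. This is a sketch under assumptions: the columns are read as character (stringsAsFactors = FALSE), so no factor releveling is needed, and per.group is a name invented for the example.

```r
dft <- read.table(text = "P1id Veg1 Veg2 AreaPoly2 P2ID
1 p p 1   1
1 p p 1.5 2
2 p p 2   3
2 p h 3.5 4", header = TRUE, stringsAsFactors = FALSE)

# Keep only the records where the two vegetation codes match.
s.dft <- subset(dft, Veg1 == Veg2)

# Sort by group, then by decreasing area, so the largest record
# of each group comes first within that group.
s.dft <- s.dft[order(s.dft$P1id, -s.dft$AreaPoly2), ]

# duplicated() flags every row after the first in each group;
# negating it keeps exactly one row (the largest) per P1id.
per.group <- s.dft[!duplicated(s.dft$P1id), ]
print(per.group)
```

With ties, this keeps only one of the tied rows per group.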
Hi,
I think you can also use plyr for this,
dft <- read.table(textConnection("P1id Veg1 Veg2 AreaPoly2 P2ID
1 p p 1 1
1 p p 1.5 2
2 p p 2 3
2 p h 3.5 4"), header=TRUE)
library(plyr)
ddply(dft, .(P1id), function(.df) {
.ddf <- subset(.df, as.character(Veg1)==as.character(Veg2))
.ddf[which.max(.ddf$AreaPoly2), ]
})
HTH,
baptiste
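
One case the example data does not cover is a group in which no record has Veg1 equal to Veg2. The sketch below applies the same kind of per-group function with base R's split() and lapply() in place of ddply(), so it runs without plyr. The fifth data row (P1id 3) is made up for illustration, and pick and res are hypothetical names.

```r
dft <- read.table(text = "P1id Veg1 Veg2 AreaPoly2 P2ID
1 p p 1   1
1 p p 1.5 2
2 p p 2   3
2 p h 3.5 4
3 p h 4   5", header = TRUE, stringsAsFactors = FALSE)  # last row is invented

# Per-group function: keep matching records, then take the largest one.
pick <- function(.df) {
  .ddf <- subset(.df, Veg1 == Veg2)
  # For an empty group which.max() returns integer(0),
  # so the result is a zero-row data frame rather than an error.
  .ddf[which.max(.ddf$AreaPoly2), ]
}

# Apply to each P1id group and bind the pieces back together;
# the zero-row piece for group 3 contributes nothing.
res <- do.call(rbind, lapply(split(dft, dft$P1id), pick))
print(res)
```

Group 3 simply drops out of the result, which matches the "only if Veg1=Veg2" condition in the question.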