sarah bauduin
2011-Jan-31 14:22 UTC
[R] Select rows with distinct values in a column and other conditions
My data frame looks like:

   SightingID PA1 PA2 PlotID InOverlap Area
1        2001   1 -99    392         Y  0.2
2        2002   1 -99    388         Y 0.25
3        2008   1  NA    104         N 0.34
4        2010   1  NA     71         N 0.18
5        2012   1  NA     61         N 0.16
6        2013   1  NA     61         N 0.22
7        2014   1  NA     62         N 0.25
8        2015   1  NA     63         N 0.19
9        2016   1  NA     63         N  0.3
10       2017   1  NA     63         N 0.25
11       2018   1  NA     63         N 0.26
12       2019   1  NA     63         N 0.26
13       2020   1  NA     64         N 0.33
14       2021   1  NA     64         N 0.42
15       2022   1  NA     85         N 0.08
16       2023   0   1     95         Y 0.11
17       2024   1  NA     93         N 0.23
18       2025   1  NA    106         N  0.4
19       2026   1  NA    134         N 0.28

The only unique values in the data frame are the SightingID. I would like to obtain a new data frame with unique PlotID values based on several conditions:
- return the row if there is only one SightingID for the PlotID
- if there are several SightingID for the same PlotID value:
    - select first the SightingID for which PA1=0; if there are several SightingID with PA1=0 for the same PlotID, select the one with the highest value in Area; if there are several SightingID with PA1=0 for the same PlotID with the highest value for Area, select one SightingID at random
    - otherwise (no SightingID with PA1=0), select the SightingID for which PA1 is not equal to 0 based on the highest value in Area (and at random if there are several with the highest value in Area)

I have no idea how to do that, can someone help me please?

Sarah
Joshua Wiley
2011-Jan-31 16:15 UTC
[R] Select rows with distinct values in a column and other conditions
Dear Sarah,

What you will need is a series of logical conditions. ?Logic or ?"|" should pull up the documentation on the logical operators available to use.

Because this list does not accept HTML emails (see the posting guide), your data frame did not come through in any coherent form. Can you resend the data using the following procedure? Suppose your data frame is named "dfrm" (but substitute its actual name). At the R console, type:

    dput(dfrm)

This will print to the console how R "sees" the data. Then copy and paste that whole jumble of text from R into your email and send it to us. This is one of the easiest ways for us to read in small amounts of data, and it should be easy for you to provide too.

Cheers,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
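As a small illustration of the dput() procedure described above, here is a minimal sketch. The data frame `dfrm` and its three columns are made up for the example and are not the poster's data; the round trip through capture.output() only stands in for copying the text into an email and pasting it back into R.

```r
# Hypothetical two-row data frame standing in for the poster's data
dfrm <- data.frame(SightingID = c(2001, 2002),
                   PlotID     = c(392, 388),
                   Area       = c(0.2, 0.25))

# Print a plain-text representation that can be pasted into an email
dput(dfrm)

# A reader rebuilds the object by evaluating the pasted text;
# capture.output() plays the role of copy and paste here
txt <- capture.output(dput(dfrm))
rebuilt <- eval(parse(text = txt))

# Check that the rebuilt object matches the original
all.equal(dfrm, rebuilt)
```

Because the pasted text is plain R code, it should survive a text-only mailing list where a formatted table does not.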
sarah bauduin
2011-Jan-31 16:54 UTC
[R] Select rows with distinct values in a column and other conditions
My data frame looks like this one:

    SightingID <- c(2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013)
    PA1 <- c(0,1,0,0,1,1,1,1,0,0,-99,1,1)
    PA2 <- c(1,NA,1,1,NA,-99,-99,NA,1,1,1,NA,NA)
    PlotID <- c(1,1,2,2,2,3,3,3,4,4,4,4,5)
    Area <- c(0.2,0.3,0.25,0.2,0.3,0.4,0.3,0.35,0.4,0.4,0.5,0.3,0.2)
    DF <- cbind(SightingID,PA1,PA2,PlotID,Area)

There are several SightingID for the same PlotID value and I need to select only one SightingID for each PlotID value. The SightingID selected for a PlotID value needs to be:
- the one with PA1=0
- if there are several SightingID with PA1=0, select the one with the highest Area value
- if there are several SightingID with PA1=0 and the same highest Area value, select one at random
- if for one PlotID value there is no SightingID with PA1=0, select the one with the highest Area value (and at random if there are several with the same highest Area value)

I would like to have this kind of result:

    SightingID2 <- c(2001,2003,2006,2009,2013)
    PA12 <- c(0,0,1,0,1)
    PA22 <- c(1,1,-99,1,NA)
    PlotID2 <- c(1,2,3,4,5)
    Area2 <- c(0.2,0.25,0.4,0.4,0.2)
    DF2 <- cbind(SightingID2,PA12,PA22,PlotID2,Area2)

Can someone help me?
Thanks,
Sarah
Ista Zahn
2011-Jan-31 22:08 UTC
[R] Select rows with distinct values in a column and other conditions
Hi Sarah,

Here is how I would do it. Not elegant, but fairly transparent, and it seems to give the desired result.

    DF <- as.data.frame(DF)

    # Pick one row from the rows belonging to a single PlotID
    pick.value <- function(x) {
      # Prefer rows with PA1 == 0 when the plot has any
      if (0 %in% x$PA1) {
        x <- x[which(x$PA1 == 0), ]
      }
      # Keep the row(s) with the largest Area
      x <- x[which(x$Area == max(x$Area, na.rm = TRUE)), ]
      # Break any remaining tie at random
      S <- x[sample(nrow(x), 1), ]
      return(as.matrix(S)[1, , drop = TRUE])
    }

    # One result row per distinct PlotID
    plots <- unique(DF$PlotID)
    DF2 <- matrix(NA, nrow = length(plots), ncol = ncol(DF),
                  dimnames = list(NULL, names(DF)))
    for (i in seq_along(plots)) {
      DF2[i, ] <- pick.value(DF[DF$PlotID == plots[i], ])
    }
    DF2

Best,
Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
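The same selection rules can also be sketched without a loop, by sorting so that the preferred row of each PlotID comes first and then dropping the later rows of each group. This is an alternative under stated assumptions, not the method posted in the thread: the random tie-break is done with a helper column of uniform random numbers (`tie`, a name introduced here), the seed is fixed only to make the run repeatable, and SightingID is written as `2001:2013` for brevity.

```r
# Example data from the thread, built directly as a data frame
DF <- data.frame(
  SightingID = 2001:2013,
  PA1    = c(0, 1, 0, 0, 1, 1, 1, 1, 0, 0, -99, 1, 1),
  PA2    = c(1, NA, 1, 1, NA, -99, -99, NA, 1, 1, 1, NA, NA),
  PlotID = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5),
  Area   = c(0.2, 0.3, 0.25, 0.2, 0.3, 0.4, 0.3, 0.35, 0.4, 0.4, 0.5, 0.3, 0.2)
)

set.seed(1)  # assumption: fixed seed only so the random tie-break is repeatable

# Random key used only to break ties between otherwise equal rows
tie <- runif(nrow(DF))

# Sort by PlotID, then rows with PA1 == 0 first, then largest Area,
# then the random key; %in% treats a missing PA1 as "not 0"
ord <- order(DF$PlotID, !(DF$PA1 %in% 0), -DF$Area, tie)
sorted <- DF[ord, ]

# Keep the first (preferred) row of each PlotID
DF2 <- sorted[!duplicated(sorted$PlotID), ]
DF2
```

For PlotID 4 the rows 2009 and 2010 tie on both PA1 and Area, so either one may be returned depending on the random key; the other plots have a single best row.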
Dear R-help community,

I often get this error "missing value where TRUE/FALSE needed" when I'm running loops, such as this one:

    for (i in 1:nrow(survey)) {
      if (survey[i,7] == survey[i,3]) {
        survey[i,14] <- survey[i,6]
      } else if (survey[i,11] == 0 && survey[i,12] == 0) {
        survey[i,14] <- survey[i,4]
      }
    }

Can someone explain what I am doing wrong? I don't see the difference from other loops that work.

Thanks a lot,
Sarah
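This error is raised when the condition given to if() evaluates to NA, which happens when either side of an == comparison is missing. A minimal sketch of how it can arise and one way to guard against it follows. The `survey` data frame and its column names `a` and `b` are invented for the example, since the real data are not shown in the post, and whether NA is actually the cause in the poster's data is an assumption.

```r
# Hypothetical stand-in for the poster's data; column names are invented
survey <- data.frame(a = c(1, NA, 3), b = c(1, 2, 4))

# Comparing with NA yields NA, and if() cannot branch on NA
survey$a[2] == survey$b[2]

# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(
  if (survey$a[2] == survey$b[2]) "equal" else "different",
  error = function(e) conditionMessage(e)
)
msg

# One way to guard the test: isTRUE() treats NA as FALSE
result <- character(nrow(survey))
for (i in seq_len(nrow(survey))) {
  result[i] <- if (isTRUE(survey$a[i] == survey$b[i])) "equal" else "different"
}
result
```

Whether a missing value should count as "different" depends on the data; checking the relevant columns with is.na() before the loop is another option.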