Hi R-helpers,

I would like to subset my dataframe, keeping only those rows which satisfy the following conditions:

1) the string "dnv" is found in at least one column;
2) the value in the column previous to the one "dnv" is found in is not "0"

Here's what my data look like:

     POND_ID 2009-05-07 2009-05-15 2009-05-21 2009-05-28 2009-06-04
4        101       0.15          0        dnv        dnv        dnv
7        102          0        dnv        dnv        dnv        dnv
87       103       0.15        dnv          1          1          1
99       104        dnv       0.25          1          1       0.75

So, for the above example, the new dataframe would not contain POND_ID 101 or 102 (because there is a 0 before the dnv) but it WOULD contain POND_ID 103 (because there is a 0.15 before the dnv) and 104 (because dnv occurs in the first column, so cannot be preceded by a 0).

One extra twist: I would like to retain rows in the new dataframe which satisfy the above conditions even if they also have a "0" then "dnv" sequence preceding or following the "problem", e.g., the following rows would be retained in the new dataframe:

     POND_ID 2009-05-07 2009-05-15 2009-05-21 2009-05-28 2009-06-04
100      105       0.15        dnv          1          0        dnv
101      106          0        dnv          1       0.15        dnv

Thanks in advance for any help you might provide.

(I hope I've provided enough of an example; I could also provide a .csv file if that would help.)

Mark Na
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Mark Na
> Sent: Tuesday, June 16, 2009 11:27 AM
> To: r-help at r-project.org
> Subject: [R] How to subset my dataframe? (a bit tricky)
>
> I would like to subset my dataframe, keeping only those rows which
> satisfy the following conditions:
>
> 1) the string "dnv" is found in at least one column;
> 2) the value in the column previous to the one "dnv" is found
> in is not "0"

Suppose your data.frame is called 'd'. Then try looping over its columns:

    # start with no rows selected
    keep <- rep(FALSE, nrow(d))
    # column 1 is POND_ID, so the first column that has a predecessor is column 3
    if (ncol(d) > 2)
        for (i in 3:ncol(d))
            keep <- keep | (d[,i] == "dnv" & d[,i-1] != "0")

so d[keep,] is the subset you want.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
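As a quick check, here is a sketch that runs the loop above (with the search string written as "dnv") on the six rows from the original post. The way the data frame is built is an assumption: every date column is taken to be character, and the row names from the post are left out.

```r
# Sample rows from the original post; date columns assumed to be character
d <- data.frame(
  POND_ID      = c(101, 102, 103, 104, 105, 106),
  "2009-05-07" = c("0.15", "0", "0.15", "dnv", "0.15", "0"),
  "2009-05-15" = c("0", "dnv", "dnv", "0.25", "dnv", "dnv"),
  "2009-05-21" = c("dnv", "dnv", "1", "1", "1", "1"),
  "2009-05-28" = c("dnv", "dnv", "1", "1", "0", "0.15"),
  "2009-06-04" = c("dnv", "dnv", "1", "0.75", "dnv", "dnv"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Flag a row as soon as any column is "dnv" and its left neighbour is not "0"
keep <- rep(FALSE, nrow(d))
if (ncol(d) > 2) for (i in 3:ncol(d)) {
  keep <- keep | (d[, i] == "dnv" & d[, i - 1] != "0")
}

kept_ids <- d$POND_ID[keep]
print(kept_ids)  # 101 102 103 105 106
```

On this sample the loop also keeps ponds 101 and 102, because a "dnv" that follows another "dnv" passes the != "0" test, and it drops pond 104, because column 2 is never tested. Both cases need extra handling to match the request.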
markleeds at verizon.net
2009-Jun-16 20:24 UTC
[R] How to subset my dataframe? (a bit tricky)
Hi Bill: I was trying to do the below myself but was having problems, so I took your solution and made another one. Yours was working a little weirdly because I don't think the person wants to keep rows where there are two dnv's in a row, and he/she also wanted to keep the row if the second column has a "dnv". So, below is essentially plagiarism with a minor fix. Thanks.

    d[unique(unlist(sapply(3:ncol(d), function(.col) {
        # rows where this column is "dnv" and the previous column is
        # neither "0" nor "dnv", plus rows where column 2 is "dnv"
        which((d[,.col] == "dnv" & d[,.col-1] != "0" & d[,.col-1] != "dnv") |
              (d[,2] == "dnv"))
    }))),]

On Jun 16, 2009, William Dunlap <wdunlap at tibco.com> wrote:

> Suppose your data.frame is called 'd'. Then try looping over its columns:
>
>     keep <- rep(FALSE, nrow(d))
>     if (ncol(d) > 2)
>         for (i in 3:ncol(d))
>             keep <- keep | (d[,i] == "dnv" & d[,i-1] != "0")
>
> so d[keep,] is the subset you want.
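The rule described above (a "dnv" counts only when the value to its left is neither "0" nor "dnv", or when it sits in the first date column) can also be written without an explicit loop. The following is a minimal sketch on the rows from the original post, not anyone's posted solution. The names vals, prev and ok are made up for illustration, and the date columns are assumed to be character.

```r
# Sample rows from the original post; date columns assumed to be character
d <- data.frame(
  POND_ID      = c(101, 102, 103, 104, 105, 106),
  "2009-05-07" = c("0.15", "0", "0.15", "dnv", "0.15", "0"),
  "2009-05-15" = c("0", "dnv", "dnv", "0.25", "dnv", "dnv"),
  "2009-05-21" = c("dnv", "dnv", "1", "1", "1", "1"),
  "2009-05-28" = c("dnv", "dnv", "1", "1", "0", "0.15"),
  "2009-06-04" = c("dnv", "dnv", "1", "0.75", "dnv", "dnv"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Character matrix of the date columns (everything except POND_ID)
vals <- sapply(d[-1], as.character)

# Same matrix shifted one column to the right; the first date column has no
# predecessor, so it gets "" and a "dnv" there always counts
prev <- cbind("", vals[, -ncol(vals), drop = FALSE])

# A cell qualifies if it is "dnv" and its left neighbour is neither "0" nor "dnv"
ok <- vals == "dnv" & prev != "0" & prev != "dnv"

keep <- rowSums(ok) > 0
result <- d[keep, ]
print(result$POND_ID)  # 103 104 105 106
```

Because the rows are selected with a logical vector, they come back in their original order. Indexing with unique(unlist(...)) instead returns them in the order the matches were found.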
I would probably try a different approach than the other suggestions.

Paste all the columns other than pond_id together. You now have a character vector. Then keep the rows in which an element of the vector contains "dnv" but does not contain "0 dnv" [use grep()].

This assumes there are no extraneous space characters in any of the values. For this approach you'd also want to watch for columns that have no "dnv" in them, as they might be stored as numeric instead of character. And make sure all your zeros are formatted exactly as "0".

The caveat is that I haven't tested this.

-Don

At 12:26 PM -0600 6/16/09, Mark Na wrote:
>I would like to subset my dataframe, keeping only those rows which
>satisfy the following conditions:
>
>1) the string "dnv" is found in at least one column;
>2) the value in the column previous to the one "dnv" is found in is not "0"

--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov
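The paste-and-grep idea can be tried on the rows from the original post. This sketch first applies the rule exactly as described (contains "dnv" but not "0 dnv") and then a variant, which is my own assumption rather than part of the suggestion, that uses a Perl-style regular expression to look for at least one "dnv" whose left neighbour is neither "0" nor "dnv". All date columns are assumed to be character with single-space-free values.

```r
# Sample rows from the original post; date columns assumed to be character
d <- data.frame(
  POND_ID      = c(101, 102, 103, 104, 105, 106),
  "2009-05-07" = c("0.15", "0", "0.15", "dnv", "0.15", "0"),
  "2009-05-15" = c("0", "dnv", "dnv", "0.25", "dnv", "dnv"),
  "2009-05-21" = c("dnv", "dnv", "1", "1", "1", "1"),
  "2009-05-28" = c("dnv", "dnv", "1", "1", "0", "0.15"),
  "2009-06-04" = c("dnv", "dnv", "1", "0.75", "dnv", "dnv"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# One space-separated string per row, built from every column except POND_ID
row_text <- do.call(paste, c(unname(as.list(d[-1])), sep = " "))

# Rule as described: contains "dnv" but never "0 dnv"
literal_keep <- grepl("dnv", row_text, fixed = TRUE) &
  !grepl("0 dnv", row_text, fixed = TRUE)
literal_ids <- d$POND_ID[literal_keep]
print(literal_ids)  # 103 104

# Variant (an assumption): "dnv" as the first value, or any whole value
# other than "0" or "dnv" that is immediately followed by "dnv"
pattern <- "^dnv( |$)|(^| )(?!0 )(?!dnv )[^ ]+ dnv( |$)"
refined_keep <- grepl(pattern, row_text, perl = TRUE)
refined_ids <- d$POND_ID[refined_keep]
print(refined_ids)  # 103 104 105 106
```

On this sample the literal rule drops ponds 105 and 106, since each also contains a "0" followed by "dnv" somewhere in the row, so it does not cover the "extra twist" from the original post. A plain substring test for "0 dnv" would also match a value such as "10" followed by "dnv", which the variant avoids by matching whole values.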