Claudia Penaloza
2012-Jul-02 22:48 UTC
[R] Removing rows if certain elements are found in character string
I would like to remove rows from the following data frame (df) if only two specific elements are found in the df$ch character string (I want to remove rows with only "0" & "D" or "0" & "d"). Alternatively, I would like to remove rows if the first non-zero element is "D" or "d".

                                                 ch     count
1  0000000000D0000000000000000000000000000000000000 0.007368;
2  0000000000d0000000000000000000000000000000000000 0.002456;
3  000000000T00000000000000000000000000000000000000 0.007368;
4  000000000TD0000000000000000000000000000000000000 0.007368;
5  000000000T00000000000000000000000000000000000000 0.002456;
6  000000000Td0000000000000000000000000000000000000 0.002456;
7  00000000T000000000000000000000000000000000000000 0.007368;
8  00000000T0D0000000000000000000000000000000000000 0.007368;
9  00000000T000000000000000000000000000000000000000 0.002456;
10 00000000T0d0000000000000000000000000000000000000 0.002456;

I tried the following, but it doesn't work if there is more than one character per string:

> df <- df[!df$ch %in% c("0","D"),]
> df <- df[!df$ch %in% c("0","d"),]

Any help greatly appreciated,
Claudia
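A possible reason the %in% attempt above has no effect on longer strings: %in% compares each whole element of df$ch against "0" and "D", so a 48-character string never matches. The sketch below illustrates this on a few made-up values (ch, chars and only_0_D are hypothetical names, not part of the original post) and shows what a per-character test would look like.

```r
# Hypothetical values, not the poster's data
ch <- c("0", "D", "00D0", "0T00")

# %in% asks whether each WHOLE element equals "0" or "D"
whole_match <- ch %in% c("0", "D")
print(whole_match)   # TRUE only for the single-character elements

# A per-character test: split each string and check every character
chars <- strsplit(ch, "")
only_0_D <- vapply(chars, function(x) all(x %in% c("0", "D", "d")), logical(1))
print(only_0_D)
```

The per-character version is mainly illustrative; a regular expression expresses the same test more compactly.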
Rui Barradas
2012-Jul-02 23:24 UTC
[R] Removing rows if certain elements are found in character string
Hello,

Try regular expressions instead. In this data.frame, I've changed row 4 so that there is a row with 'D' as the first non-zero character.

dd <- read.table(text="
ch count
1 0000000000D0000000000000000000000000000000000000 0.007368
2 0000000000d0000000000000000000000000000000000000 0.002456
3 000000000T00000000000000000000000000000000000000 0.007368
4 000000000DT0000000000000000000000000000000000000 0.007368
5 000000000T00000000000000000000000000000000000000 0.002456
6 000000000Td0000000000000000000000000000000000000 0.002456
7 00000000T000000000000000000000000000000000000000 0.007368
8 00000000T0D0000000000000000000000000000000000000 0.007368
9 00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
", header=TRUE)
dd

i1 <- grepl("^([0D]|[0d])*$", dd$ch)
i2 <- grepl("^0*[Dd]", dd$ch)
dd[!i1, ]
dd[!i2, ]
dd[!(i1 | i2), ]

Hope this helps,

Rui Barradas
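To see what the two patterns in this reply select, it may help to run them on short strings. This is only a sketch: the vector ch and the names only_0_d and first_is_d are made up for illustration, and "^[0Dd]*$" is used as an equivalent, shorter spelling of "^([0D]|[0d])*$".

```r
# Short hypothetical strings, not the poster's data
ch <- c("00D0", "0d00", "0T00", "0DT0", "0TD0", "0000")

# TRUE when the string consists only of "0", "D" and "d"
only_0_d <- grepl("^[0Dd]*$", ch)

# TRUE when the first non-zero character is "D" or "d"
first_is_d <- grepl("^0*[Dd]", ch)

print(data.frame(ch, only_0_d, first_is_d))
```

The two conditions are not interchangeable: "0DT0" satisfies only the second, and an all-zero string such as "0000" satisfies only the first.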
arun
2012-Jul-02 23:29 UTC
[R] Removing rows if certain elements are found in character string
Hi,

Try this:

dat1<-read.table(text="
1  0000000000D0000000000000000000000000000000000000 0.007368;
2  0000000000d0000000000000000000000000000000000000 0.002456;
3  000000000T00000000000000000000000000000000000000 0.007368;
4  000000000TD0000000000000000000000000000000000000 0.007368;
5  000000000T00000000000000000000000000000000000000 0.002456;
6  000000000Td0000000000000000000000000000000000000 0.002456;
7  00000000T000000000000000000000000000000000000000 0.007368;
8  00000000T0D0000000000000000000000000000000000000 0.007368;
9  00000000T000000000000000000000000000000000000000 0.002456;
10 00000000T0d0000000000000000000000000000000000000 0.002456;
",sep="",header=FALSE)
colnames(dat1)<-c("num","Ch", "count")

#I guess this is what you wanted.
dat1[grepl("TD|Td|T",dat1$Ch),]
   num                                               Ch     count
3    3 000000000T00000000000000000000000000000000000000 0.007368;
4    4 000000000TD0000000000000000000000000000000000000 0.007368;
5    5 000000000T00000000000000000000000000000000000000 0.002456;
6    6 000000000Td0000000000000000000000000000000000000 0.002456;
7    7 00000000T000000000000000000000000000000000000000 0.007368;
8    8 00000000T0D0000000000000000000000000000000000000 0.007368;
9    9 00000000T000000000000000000000000000000000000000 0.002456;
10  10 00000000T0d0000000000000000000000000000000000000 0.002456;

#If you want to remove D or d rows
dat1[!grepl("D|d",dat1$Ch),]
  num                                               Ch     count
3   3 000000000T00000000000000000000000000000000000000 0.007368;
5   5 000000000T00000000000000000000000000000000000000 0.002456;
7   7 00000000T000000000000000000000000000000000000000 0.007368;
9   9 00000000T000000000000000000000000000000000000000 0.002456;

A.K.
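The two filters in this reply answer slightly different questions from the one originally asked, which a small comparison may make clearer. The sketch below uses three shortened, made-up strings; keep_T, keep_no_d and keep_first are hypothetical names introduced here.

```r
# Shortened hypothetical strings, not the poster's data
ch <- c("000D00", "00T000", "00TD00")

# "TD|Td|T" keeps exactly the strings that contain a "T"
keep_T <- grepl("TD|Td|T", ch)

# Dropping every string that contains "D" or "d" also drops "00TD00",
# although its first non-zero element is "T"
keep_no_d <- !grepl("[Dd]", ch)

# Dropping only strings whose first non-zero element is "D" or "d"
keep_first <- !grepl("^0*[Dd]", ch)

print(data.frame(ch, keep_T, keep_no_d, keep_first))
```

On the ten rows posted in the question, keep_T and keep_first would give the same result; they differ only when a "D" or "d" precedes the first "T".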
David Winsemius
2012-Jul-03 02:58 UTC
[R] Removing rows if certain elements are found in character string
On Jul 2, 2012, at 6:48 PM, Claudia Penaloza wrote:

> I tried the following but it doesn't work if there is more than one
> character per string:
>
>> df <- df[!df$ch %in% c("0","D"),]
>> df <- df[!df$ch %in% c("0","d"),]

You seem to be missing test cases for the second set of conditions, but this works for the first set (and might for the second):

dat[ grepl("[^0dD]", dat$ch) & ! grepl("^0+d|^0+D", dat$ch) , ]
                                                 ch    count
3  000000000T00000000000000000000000000000000000000 0.007368
4  000000000TD0000000000000000000000000000000000000 0.007368
5  000000000T00000000000000000000000000000000000000 0.002456
6  000000000Td0000000000000000000000000000000000000 0.002456
7  00000000T000000000000000000000000000000000000000 0.007368
8  00000000T0D0000000000000000000000000000000000000 0.007368
9  00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456

--
David Winsemius, MD
West Hartford, CT
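Since the posted data has no rows that exercise the second condition, a few extra test cases may be useful. The sketch below is an assumption-laden illustration: the data frame dat, its values, and the names has_other, starts_with_d and kept are all made up, and the second pattern is written as "^0*[dD]" so that a string beginning directly with "D" or "d" is also caught.

```r
# Hypothetical test cases, including ones the posted data lacks
dat <- data.frame(
  ch = c("00D000", "00d000", "00T000", "00TD00", "00DT00", "D0T000"),
  count = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  stringsAsFactors = FALSE
)

# Contains at least one character other than "0", "d", "D"
has_other <- grepl("[^0dD]", dat$ch)

# First non-zero element is "d" or "D" (zero or more leading zeros)
starts_with_d <- grepl("^0*[dD]", dat$ch)

kept <- dat[has_other & !starts_with_d, ]
print(kept)
```

With "^0+" in place of "^0*", the last test case ("D0T000") would not be matched by the second pattern and so would be kept.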
MacQueen, Don
2012-Jul-05 18:13 UTC
[R] Removing rows if certain elements are found in character string
Perhaps I've missed something, but if it's really true that the goal is to remove rows if the first non-zero element is "D" or "d", then how about this:

tmp <- gsub('0','',df$ch)
first <- substr(tmp,1,1)
subset(df, tolower(first) != 'd')

Of course it could be rolled up into a single expression, but I wrote it in several steps to make it easy to follow. No need to wrap one's brain around regular expressions (which is hard for me!)

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
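The single-expression form mentioned in this reply might look like the sketch below. The data frame is made up for illustration and is named dat rather than df (an assumption made here to avoid masking stats::df); fixed = TRUE is added so that "0" is treated as a literal character rather than a pattern.

```r
# Hypothetical data frame, not the poster's data
dat <- data.frame(
  ch = c("00D000", "00d000", "00T000", "00TD00", "00DT00"),
  count = c(0.1, 0.2, 0.3, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# Strip all zeros, take the first remaining character,
# and compare it case-insensitively with "d"
kept <- dat[tolower(substr(gsub("0", "", dat$ch, fixed = TRUE), 1, 1)) != "d", ]
print(kept)
```

A string made up entirely of zeros leaves an empty first character, which is not equal to "d", so such a row is kept by this approach.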