Claudia Penaloza
2012-Jul-02 22:48 UTC
[R] Removing rows if certain elements are found in character string
I would like to remove rows from the following data frame (df) if there are
only two specific elements found in the df$ch character string (I want to
remove rows with only "0" & "D" or "0" & "d"). Alternatively, I would like
to remove rows if the first non-zero element is "D" or "d".
ch count
1 0000000000D0000000000000000000000000000000000000 0.007368;
2 0000000000d0000000000000000000000000000000000000 0.002456;
3 000000000T00000000000000000000000000000000000000 0.007368;
4 000000000TD0000000000000000000000000000000000000 0.007368;
5 000000000T00000000000000000000000000000000000000 0.002456;
6 000000000Td0000000000000000000000000000000000000 0.002456;
7 00000000T000000000000000000000000000000000000000 0.007368;
8 00000000T0D0000000000000000000000000000000000000 0.007368;
9 00000000T000000000000000000000000000000000000000 0.002456;
10 00000000T0d0000000000000000000000000000000000000 0.002456;
I tried the following but it doesn't work if there is more than one
character per string:
>df <- df[!df$ch %in% c("0","D"),]
>df <- df[!df$ch %in% c("0","d"),]
Any help greatly appreciated,
Claudia
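A short illustration of why the `%in%` attempt leaves the data frame unchanged: `%in%` tests whether each whole string equals one of the listed values, not whether the string is made up of those characters. The two-element vector `ch` below is a made-up stand-in for `df$ch`, and the pattern is only one possible way to express "contains nothing but 0, D and d".

```r
# Hypothetical two-element stand-in for df$ch
ch <- c("00D0", "0T00")

# %in% compares entire strings, so neither element equals "0" or "D"
whole_match <- ch %in% c("0", "D")

# A regular expression can instead ask: is the string built only from 0, D, d?
only_0_and_d <- grepl("^[0Dd]*$", ch)

print(whole_match)
print(only_0_and_d)
```

Because `whole_match` is FALSE for every element, `df[!df$ch %in% c("0","D"), ]` keeps all rows.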
Rui Barradas
2012-Jul-02 23:24 UTC
[R] Removing rows if certain elements are found in character string
Hello,
Try regular expressions instead.
In this data.frame, I've changed row 4 so that it has 'D' as its
first non-zero character.
dd <- read.table(text="
ch count
1 0000000000D0000000000000000000000000000000000000 0.007368
2 0000000000d0000000000000000000000000000000000000 0.002456
3 000000000T00000000000000000000000000000000000000 0.007368
4 000000000DT0000000000000000000000000000000000000 0.007368
5 000000000T00000000000000000000000000000000000000 0.002456
6 000000000Td0000000000000000000000000000000000000 0.002456
7 00000000T000000000000000000000000000000000000000 0.007368
8 00000000T0D0000000000000000000000000000000000000 0.007368
9 00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
", header=TRUE)
dd
i1 <- grepl("^([0D]|[0d])*$", dd$ch)
i2 <- grepl("^0*[Dd]", dd$ch)
dd[!i1, ]
dd[!i2, ]
dd[!(i1 | i2), ]
Hope this helps,
Rui Barradas
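To see what the two patterns select, it may help to run them on a few short strings. The vector `ch` below is made up for illustration and is not part of the thread's data. The third index, `i1_strict`, is an assumption about the intent of the first condition (only "0" and "D", or only "0" and "d", with at least one D/d present); the pattern for `i1` also matches all-zero strings and strings that mix "D" and "d".

```r
# Made-up test strings (not from the thread's data)
ch <- c("000D00", "00d000", "00DT00", "00TD00", "000000", "0Dd000")

# Same patterns as in the reply above
i1 <- grepl("^([0D]|[0d])*$", ch)  # only 0, D and d anywhere in the string
i2 <- grepl("^0*[Dd]", ch)         # first non-zero character is D or d

# Stricter reading of the first condition (an assumption about the intent):
# zeros plus at least one D, or zeros plus at least one d, never both cases
i1_strict <- grepl("^0*D[0D]*$|^0*d[0d]*$", ch)

print(data.frame(ch, i1, i2, i1_strict))
```

Whether the looser or the stricter pattern is appropriate depends on whether all-zero or mixed-case strings can occur in the real data.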
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
arun
2012-Jul-02 23:29 UTC
[R] Removing rows if certain elements are found in character string
Hi,
Try this:
dat1<-read.table(text="
1  0000000000D0000000000000000000000000000000000000 0.007368;
2  0000000000d0000000000000000000000000000000000000 0.002456;
3  000000000T00000000000000000000000000000000000000 0.007368;
4  000000000TD0000000000000000000000000000000000000 0.007368;
5  000000000T00000000000000000000000000000000000000 0.002456;
6  000000000Td0000000000000000000000000000000000000 0.002456;
7  00000000T000000000000000000000000000000000000000 0.007368;
8  00000000T0D0000000000000000000000000000000000000 0.007368;
9  00000000T000000000000000000000000000000000000000 0.002456;
10 00000000T0d0000000000000000000000000000000000000 0.002456;
",sep="",header=FALSE)
colnames(dat1)<-c("num","Ch", "count")
#I guess this is what you wanted.
dat1[grepl("TD|Td|T",dat1$Ch),]
   num                                               Ch     count
3    3 000000000T00000000000000000000000000000000000000 0.007368;
4    4 000000000TD0000000000000000000000000000000000000 0.007368;
5    5 000000000T00000000000000000000000000000000000000 0.002456;
6    6 000000000Td0000000000000000000000000000000000000 0.002456;
7    7 00000000T000000000000000000000000000000000000000 0.007368;
8    8 00000000T0D0000000000000000000000000000000000000 0.007368;
9    9 00000000T000000000000000000000000000000000000000 0.002456;
10  10 00000000T0d0000000000000000000000000000000000000 0.002456;
#If you want to remove D or d rows
dat1[!grepl("D|d",dat1$Ch),]
  num                                               Ch     count
3   3 000000000T00000000000000000000000000000000000000 0.007368;
5   5 000000000T00000000000000000000000000000000000000 0.002456;
7   7 00000000T000000000000000000000000000000000000000 0.007368;
9   9 00000000T000000000000000000000000000000000000000 0.002456;
A.K.
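A small check of how these two filters behave, using short made-up strings rather than the thread's data. It suggests that the alternation "TD|Td|T" selects the same rows as a plain "T", and that `!grepl("D|d", ...)` drops every row containing a D or d, including rows where a T comes first. The last pattern, anchored at the start, is one possible way to express "first non-zero element is D or d" and is an assumption about what was wanted.

```r
# Made-up strings (not from the thread's data)
ch <- c("00D0", "00d0", "0T00", "0TD0", "0Td0", "T0D0")

# Any string matching "TD" or "Td" also matches "T", so the results coincide
same_as_T <- identical(grepl("TD|Td|T", ch), grepl("T", ch))

# Keeps only rows with no D or d anywhere
keep_no_d <- !grepl("D|d", ch)

# Keeps rows unless the first non-zero character is D or d
keep_unless_leading_d <- !grepl("^0*[Dd]", ch)

print(same_as_T)
print(data.frame(ch, keep_no_d, keep_unless_leading_d))
```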
David Winsemius
2012-Jul-03 02:58 UTC
[R] Removing rows if certain elements are found in character string
You seem to be missing test cases for the second set of conditions, but
this works for the first set (and might for the second):

> dat[ grepl("[^0dD]", dat$ch) & !grepl("^0*[dD]", dat$ch), ]
                                                 ch    count
3  000000000T00000000000000000000000000000000000000 0.007368
4  000000000TD0000000000000000000000000000000000000 0.007368
5  000000000T00000000000000000000000000000000000000 0.002456
6  000000000Td0000000000000000000000000000000000000 0.002456
7  00000000T000000000000000000000000000000000000000 0.007368
8  00000000T0D0000000000000000000000000000000000000 0.007368
9  00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456

--
David Winsemius, MD
West Hartford, CT
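Since the posted data has no row in which a D or d precedes the first T, the second condition (first non-zero element is "D" or "d") can be exercised with a few made-up rows. The data frame `dat` and its values below are hypothetical, and the start-anchored pattern `^0*[dD]` is one assumed way to write that condition.

```r
# Made-up rows exercising the second condition (not from the original data)
dat <- data.frame(
  ch = c("000D000T", "000d0T00", "000T0D00", "D0000000", "00000000"),
  count = c(0.1, 0.2, 0.3, 0.4, 0.5),
  stringsAsFactors = FALSE
)

has_other     <- grepl("[^0dD]", dat$ch)   # contains something besides 0, d, D
starts_with_d <- grepl("^0*[dD]", dat$ch)  # first non-zero character is d or D

kept <- dat[has_other & !starts_with_d, ]
print(kept)
```

Only the row whose first non-zero character is T survives. Note that the `has_other` test also drops an all-zero string, which may or may not be desired.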
MacQueen, Don
2012-Jul-05 18:13 UTC
[R] Removing rows if certain elements are found in character string
Perhaps I've missed something, but if it's really true that the goal is to
remove rows if the first non-zero element is "D" or "d", then how about
this:
tmp <- gsub('0','',df$ch)
first <- substr(tmp,1,1)
subset(df, tolower(first) != 'd')
and of course it could be rolled up into a single expression, but I wrote
it in several steps to make it easy to follow. No need to wrap one's brain
around regular expressions (which is hard for me!)
-Don
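For reference, the three steps could be rolled up into a single expression roughly as follows. The data frame `df` here is a small made-up stand-in, not the poster's data.

```r
# Made-up data frame standing in for df
df <- data.frame(
  ch = c("00D000", "00d000", "0T0000", "0TD000", "0DT000"),
  count = c(0.007368, 0.002456, 0.007368, 0.007368, 0.002456),
  stringsAsFactors = FALSE
)

# Strip the zeros, take the first remaining character,
# and keep the rows where that character is not d or D
kept <- df[tolower(substr(gsub("0", "", df$ch), 1, 1)) != "d", ]
print(kept)
```

A string of zeros only yields an empty first character, which is not "d", so such a row would be kept by this approach.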
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062