Hi all,
I'm writing a script to do some basic text analysis in R. Let's assume
I have a data frame named data with a column named 'Utt' that
contains strings. Is there a straightforward way to achieve
something like this:
data$ContainsThe <- ifelse(startsWith(data$Utt,"the"),"y","n")
or
data$ContainsThe <- ifelse(contains(data$Utt,"the"),"y","n")
?
I tried using grep
data$ContainsThe <- ifelse(grep("the",data$Utt),"y","n")
but this doesn't work because grep only returns the indices of the
elements for which the match succeeded.
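A small illustration of the problem, using a made-up vector `utt` in place of data$Utt (the example strings are hypothetical): grep() returns the positions of the matching elements only, while grepl() returns one TRUE/FALSE per element, which is the shape ifelse() needs.

```r
# Hypothetical example data standing in for data$Utt
utt <- c("the cat sat", "a dog ran", "over there", "The end")

# grep() returns only the positions of the matches: c(1L, 3L)
idx <- grep("the", utt)

# grepl() returns one logical per element: c(TRUE, FALSE, TRUE, FALSE)
hits <- grepl("the", utt)

# ifelse() on grep()'s output yields one value per match, not per element,
# so its length (2) does not match the length of utt (4)
wrong <- ifelse(grep("the", utt), "y", "n")
length(wrong)
```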
Thanks for any pointers
Claus
On Apr 29, 2010, at 1:17 PM, Claus O'Rourke wrote:

?grepl   # which is on the same help page as grep

David Winsemius, MD
West Hartford, CT
Try with grepl:
data$ContainsThe <- ifelse(grepl("the",data$Utt),"y","n")
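A slightly fuller sketch of the grepl() approach covering both cases from the original question. The data frame is invented for illustration, with the column named Utt to match the code in the thread. Anchoring the pattern with ^ gives a "starts with" test, and the \\b word boundaries with ignore.case = TRUE avoid matching "there" while still matching "The".

```r
# Hypothetical data frame; the column name Utt matches the thread's code
data <- data.frame(
  Utt = c("the cat sat", "a dog ran", "over there", "The end"),
  stringsAsFactors = FALSE
)

# "contains": fixed = TRUE treats "the" as a literal string, not a regex
data$ContainsThe <- ifelse(grepl("the", data$Utt, fixed = TRUE), "y", "n")

# "starts with": ^ anchors the match at the beginning of each string
data$StartsWithThe <- ifelse(grepl("^the", data$Utt), "y", "n")

# whole word, any case: \\b marks word edges, so "there" does not match
data$HasWordThe <- ifelse(grepl("\\bthe\\b", data$Utt, ignore.case = TRUE),
                          "y", "n")

print(data)
```

Since R 3.3.0, base R also provides startsWith(x, prefix), so startsWith(data$Utt, "the") works directly in newer versions; the grepl() forms also run on older releases. If the "y"/"n" labels are not needed, the logical vector from grepl() can be stored as the column directly.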
On Thu, Apr 29, 2010 at 2:17 PM, Claus O'Rourke
<claus.orourke@gmail.com> wrote:
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" W