Hi all,

I'm writing a script to do some basic text analysis in R. Let's assume I have a data frame named data which contains a column named 'Utt' which contains strings. Is there a straightforward way to achieve something like this:

    data$ContainsThe <- ifelse(startsWith(data$Utt,"the"),"y","n")

or

    data$ContainsThe <- ifelse(contains(data$Utt,"the"),"y","n")

?

I tried using grep

    data$ContainsThe <- ifelse(grep("the",data$Utt),"y","n")

but this doesn't work because grep only returns the indices of the rows for which grep succeeded.

Thanks for any pointers

Claus

On Apr 29, 2010, at 1:17 PM, Claus O'Rourke wrote:

> I tried using grep
> data$ContainsThe <- ifelse(grep("the",data$Utt),"y","n")
>
> but this doesn't work because grep only returns the rows for which
> grep succeeded.

?grepl # which is on the same help page as grep

David Winsemius, MD
West Hartford, CT

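To make the difference between grep and grepl concrete, here is a minimal sketch. The three sample strings are invented for illustration; only the column name Utt comes from the question.

```r
# Hypothetical sample data; only the column name Utt is from the question.
data <- data.frame(Utt = c("the cat sat", "a dog barked", "over the hill"),
                   stringsAsFactors = FALSE)

# grep() returns the positions of the elements that match, so the result
# is usually shorter than the column and cannot be fed to ifelse() row by row.
idx <- grep("the", data$Utt)
print(idx)   # 1 3

# grepl() returns one logical value per element, the same length as the column.
hit <- grepl("the", data$Utt)
print(hit)   # TRUE FALSE TRUE
```

Because grepl gives one TRUE/FALSE per row, its result can be used directly as the test argument of ifelse.
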
Try with grepl:

    data$ContainsThe <- ifelse(grepl("the",data$Utt),"y","n")

On Thu, Apr 29, 2010 at 2:17 PM, Claus O'Rourke <claus.orourke@gmail.com> wrote:

> I tried using grep
> data$ContainsThe <- ifelse(grep("the",data$Utt),"y","n")
>
> but this doesn't work because grep only returns the rows for which
> grep succeeded.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
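
The question asked for both a "contains" and a "starts with" test. One possible way to get each from grepl is sketched below. The sample strings and the extra column names StartsWithThe and HasWordThe are made up for illustration, and the whole-word variant is an extra that the thread does not ask for.

```r
# Hypothetical sample data; only Utt and ContainsThe are names from the thread.
data <- data.frame(Utt = c("the cat sat", "a dog barked", "over the hill",
                           "The end", "other things"),
                   stringsAsFactors = FALSE)

# "Contains": plain substring match. fixed = TRUE treats the pattern as a
# literal string. Note that this also matches the "the" inside "other".
data$ContainsThe <- ifelse(grepl("the", data$Utt, fixed = TRUE), "y", "n")

# "Starts with": anchor the regular expression at the start with ^.
data$StartsWithThe <- ifelse(grepl("^the", data$Utt), "y", "n")

# Whole word "the", ignoring case: \\b marks a word boundary.
data$HasWordThe <- ifelse(grepl("\\bthe\\b", data$Utt,
                                ignore.case = TRUE, perl = TRUE), "y", "n")

print(data)
```

The matches are case-sensitive unless ignore.case = TRUE is given, so "The end" is only caught by the last variant. R 3.3.0 and later also provide a base startsWith(x, prefix) function, which would make the first form in the question work as written.
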