Hi,

Is there a way to use regular expressions to capture two or more words in a sentence? For example, I wish to find all the lines that have the words "thomas", "perl", and "program", such as "thomas uses a program called perl" or "perl is a program that thomas uses", etc.

I'm sure this is a very easy task; I would greatly appreciate any help. Thanks!

Sangick
I'd suggest doing it with multiple regular expressions. You could construct a single regular expression for this, but I expect it would get quite complicated and possibly very slow. The expression for "y" in the example below tabulates how many of the words matched each line (i.e., line 2 matched 1 word, line 3 matched 3 words, and line 4 matched 2 words). The names of the which(y>=2) result (3 and 4) are the numbers of the lines that matched at least two of the words.

> x <- readLines("clipboard", -1)
> x
[1] "Is there a way to use regular expressions to capture two or more words in a "
[2] "sentence? For example, I wish to to find all the lines that have the words \"thomas\", "
[3] "\"perl\", and \"program\", such as \"thomas uses a program called perl\", or \"perl is a "
[4] "program that thomas uses\", etc."
> sapply(c("perl","program","thomas"), function(re) grep(re, x))
$perl
[1] 3

$program
[1] 3 4

$thomas
[1] 2 3 4

> unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=FALSE)
[1] 3 3 4 2 3 4
> y <- table(unlist(sapply(c("perl","program","thomas"), function(re) grep(re, x)), use.names=FALSE))
> y

2 3 4
1 3 2
> which(y>=2)
3 4
2 3
>

Hope this helps,
Tony Plate
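Tony's count-and-threshold idea can be packaged as a small reusable function. This is a minimal sketch, not part of the original reply: match_words() is a hypothetical name, each word is wrapped in \b word boundaries (so "program" no longer matches "programs"), and perl = TRUE is used so that \b is handled by the PCRE engine. Leaving min_hits at its default (the number of words) returns only the lines that contain all of the words, which is what the original question asked for.

```r
# match_words() is a hypothetical helper, not a base R function.
# It returns the elements of `lines` that contain at least `min_hits`
# of the given words (by default, all of them).
match_words <- function(words, lines, min_hits = length(words)) {
  # One logical vector per word: TRUE where the whole word occurs in a line
  hits <- lapply(words, function(w) grepl(paste0("\\b", w, "\\b"), lines, perl = TRUE))
  # Adding logical vectors counts how many of the words each line contains
  counts <- Reduce(`+`, hits)
  lines[counts >= min_hits]
}

x <- c("thomas uses a program called perl",
       "perl is a program that thomas uses",
       "perl programs are fun",
       "nothing to see here")

match_words(c("thomas", "perl", "program"), x)                # lines 1 and 2
match_words(c("thomas", "perl", "program"), x, min_hits = 1)  # lines 1, 2 and 3
```

The words are pasted into the pattern as regular-expression fragments, so any metacharacters in them would need escaping first.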
If you only have two patterns to search for, a single regular expression can be written by listing both orders:

data(state)
grep("i.*n|n.*i", state.name)  # indices of states with both i and n in the name

but it gets unwieldy with three patterns, since there are 3! = 6 orderings to list rather than 2. In that case you are probably better off iterating greps, like this:

lookfor <- function(pat, x) {
  # keep only the elements of x that match every pattern in pat
  for (p in pat) x <- grep(p, x, value = TRUE)
  x
}
lookfor(c("i", "n", "g"), state.name)  # states with i, n and g in the name
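Another way around the permutation problem mentioned in the previous reply, not suggested in the thread itself, is a single pattern built from lookahead assertions. This sketch assumes R's PCRE engine (perl = TRUE): each (?=.*\bword\b) is tested from the start of the line independently, so the order of the words does not matter and the pattern grows linearly with the number of words instead of listing every ordering. all_words_pattern() is a hypothetical helper name.

```r
# all_words_pattern() is a hypothetical helper, not a base R function.
# It builds a PCRE pattern such as ^(?=.*\bthomas\b)(?=.*\bperl\b)...,
# where every lookahead must succeed for the line to match.
all_words_pattern <- function(words) {
  paste0("^", paste0("(?=.*\\b", words, "\\b)", collapse = ""))
}

x <- c("thomas uses a program called perl",
       "perl is a program that thomas uses",
       "thomas likes python")

pat <- all_words_pattern(c("thomas", "perl", "program"))
grep(pat, x, perl = TRUE, value = TRUE)  # the first two lines
```

As with any pattern assembled by paste0(), the words are inserted as regex fragments, so metacharacters in them would need escaping.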
If you use the RSPerl interface (http://www.omegahat.org/RSPerl/), you can do this simply for any number of expressions by letting Perl do the matching. If @strings is the array of strings that you want to test, all you need is

for (@strings) {
    if (/thomas/ && /perl/ && /prog/) {
        print "$_\n";    # do something with the matching line
    }
}

But I don't know a simple way to do it in R.

--John

----- Original Message -----
From: "Sangick Jeon" <sijeon at ucdavis.edu>
To: <r-help at stat.math.ethz.ch>
Sent: Monday, July 12, 2004 4:59 PM
Subject: [R] Regular Expressions
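For comparison, the Perl test above has a fairly direct base R counterpart: grepl() returns one TRUE/FALSE per string, and & combines the results element by element, much as && combines the matches in Perl. A minimal sketch, where strings is an illustrative stand-in for the Perl array and printing stands in for "do something":

```r
# Illustrative data standing in for the Perl array @strings
strings <- c("thomas uses a program called perl",
             "perl is a program that thomas uses",
             "perl is a language")

# grepl() is vectorized, so each call tests every string at once;
# & combines the three logical vectors element by element
keep <- grepl("thomas", strings) & grepl("perl", strings) & grepl("prog", strings)

# Do something with each matching string; here we just print it
for (s in strings[keep]) print(s)
```

As in the Perl version, the patterns are unanchored substrings, so "prog" also matches "program".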
Hello,

Not really regular expressions, but you may also look at the first version of my package ttda at http://wwwpeople.unil.ch/jean-pierre.mueller/ and the functions:

ttda.get.text
ttda.segmentation
ttda.forms.frame
ttda.TLE

HTH.
--
Jean-Pierre Müller
SSP / BFSH2 / UNIL / CH - 1015 Lausanne
Voice: +41 21 692 3116 / Fax: +41 21 692 3115

Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html
Is there something wrong with this URL?

Mark Palmer
Environmetrics Monitoring for Management
CSIRO Mathematical and Information Sciences
Private Bag 5, Wembley, Western Australia, 6913
Phone: 61-8-9333-6293
Mobile: 0427-50-2353
Fax: 61-8-9333-6121
Email: Mark.Palmer at csiro.au
URL: www.cmis.csiro.au/envir

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jean-Pierre Müller
Sent: Tuesday, 13 July 2004 3:29 PM
To: Sangick Jeon
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Regular Expressions