Ptit_Bleu
2007-Dec-17 14:34 UTC
[R] Must be obvious but not to me : problem with regular expression
Hi,

I have a vector called nfichiers containing 138 file names whose extensions run from .P0 to .P8. The script has to treat the .P0 files differently from the .P1 to .P8 files.

Examples of file names:
[128] "Output0.P0"
[129] "Output0.P1"
[130] "Output0.P2"
[131] "Output01102007.P0"
[132] "Output01102007.P1"
[133] "Output01102007.P2"
[134] "Output01102007.P3"
[135] "Output01102007.P4"

To extract the names of the files with the .P0 extension I wrote:
    nfichiers[grep(".P0", nfichiers)]
For the other extensions:
    nfichiers[grep(".P[^0]", nfichiers)]

But the second call returns a vector of length 138, which is the length of the initial vector, even though 130 of the files have the .P0 extension.

So I tried "manually" with a small vector:
> s
[1] "aa.P0" "bb.P0" "cc.P1" "dd.P2"
> s[grep(".P[^0]", s)]
[1] "cc.P1" "dd.P2"

It works!

Does anyone have an idea how to solve this small problem?
Thanks in advance,
Ptit Bleu.
Duncan Murdoch
2007-Dec-17 14:46 UTC
[R] Must be obvious but not to me : problem with regular expression
On 12/17/2007 9:34 AM, Ptit_Bleu wrote:
> Hi,
>
> I have a vector called nfichiers of 138 names of file whose extension is .P0
> or P1 ... to P8.
> The script is not the same when the extension is P0 or P(1 to 8).
>
> Examples of file names :
> [128] "Output0.P0"
> [129] "Output0.P1"
> [130] "Output0.P2"
> [131] "Output01102007.P0"
> [132] "Output01102007.P1"
> [133] "Output01102007.P2"
> [134] "Output01102007.P3"
> [135] "Output01102007.P4"
>
> To extract the names of file with .P0 extension I wrote :
> nfichiers[grep(".P0", nfichiers)]
> For the other extensions :
> nfichiers[grep(".P[^0]", nfichiers)]
>
> But for the last, I get a length of 138 that is the length of the initial
> vector although I have 130 files with .P0 extension.

One problem above is that "." is special in regular expressions: it matches any single character, so it has to be escaped to match a literal dot. I'd also suggest adding $ at the end, to force the match to the end of the string. That is, code as

    grep("\\.P0$", nfichiers)

and

    grep("\\.P[^0]$", nfichiers)

I don't know what false matches you were seeing, but this should eliminate some.

Duncan Murdoch

> So I tried "manually" with a small vector :
>> s
> [1] "aa.P0" "bb.P0" "cc.P1" "dd.P2"
>> s[grep(".P[^0]", s)]
> [1] "cc.P1" "dd.P2"
>
> It works !!!
>
> Has someone an idea to solve this small problem ?
> Thanks in advance,
> Ptit Bleu.
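The escaped-dot and end-anchor suggestion can be tried on the eight file names quoted in the original post. This is only a minimal sketch: the short vector stands in for the real 138-element nfichiers, and the names p0_files and other_files are illustrative, not from the thread.

```r
# A few of the file names shown in the original post
# (the real vector has 138 entries).
nfichiers <- c("Output0.P0", "Output0.P1", "Output0.P2",
               "Output01102007.P0", "Output01102007.P1",
               "Output01102007.P2", "Output01102007.P3",
               "Output01102007.P4")

# "\\." matches a literal dot; "$" anchors the match at the end of the name.
p0_files    <- nfichiers[grep("\\.P0$", nfichiers)]
other_files <- nfichiers[grep("\\.P[^0]$", nfichiers)]

print(p0_files)     # "Output0.P0" "Output01102007.P0"
print(other_files)  # the six remaining names (.P1 to .P4)
```

On this sample the two results do not overlap, and their lengths add up to the length of the input vector.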
Uwe Ligges
2007-Dec-17 15:01 UTC
[R] Must be obvious but not to me : problem with regular expression
Ptit_Bleu wrote:
> Hi,
>
> I have a vector called nfichiers of 138 names of file whose extension is .P0
> or P1 ... to P8.
> The script is not the same when the extension is P0 or P(1 to 8).
>
> Examples of file names :
> [128] "Output0.P0"
> [129] "Output0.P1"
> [130] "Output0.P2"
> [131] "Output01102007.P0"
> [132] "Output01102007.P1"
> [133] "Output01102007.P2"
> [134] "Output01102007.P3"
> [135] "Output01102007.P4"
>
> To extract the names of file with .P0 extension I wrote :
> nfichiers[grep(".P0", nfichiers)]
> For the other extensions :
> nfichiers[grep(".P[^0]", nfichiers)]
>
> But for the last, I get a length of 138 that is the length of the initial
> vector although I have 130 files with .P0 extension.
>
> So I tried "manually" with a small vector :
>> s
> [1] "aa.P0" "bb.P0" "cc.P1" "dd.P2"
>> s[grep(".P[^0]", s)]
> [1] "cc.P1" "dd.P2"

I guess you want

    grep("\\.P0$", nfichiers)

Otherwise you get "XP0X" as a positive as well.

And for the others:

    grep("\\.P[^0]$", nfichiers)

With ".P[^0]" you'd get "XPXX" as a positive, for example, because you are looking for anything that contains a P preceded by any character and followed by any character other than 0.

Uwe Ligges

> It works !!!
>
> Has someone an idea to solve this small problem ?
> Thanks in advance,
> Ptit Bleu.
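The false positives described above are easy to reproduce. In the sketch below, "XP0X" and "XPXX" come from the reply, while "dd.PX" and the vector name x are made up for illustration. The last pattern, with the character class [1-8], is not from the thread; it is one possible tightening, assuming the only extensions wanted are .P1 to .P8.

```r
# "XP0X" and "XPXX" are the examples from the reply; "dd.PX" is hypothetical.
x <- c("aa.P0", "cc.P1", "XP0X", "XPXX", "dd.PX")

# Unescaped dot: any character before the P is accepted.
loose_p0    <- x[grep(".P0", x)]        # "aa.P0" "XP0X"
loose_other <- x[grep(".P[^0]", x)]     # "cc.P1" "XPXX" "dd.PX"

# Escaped dot plus end anchor: only a real ".P<char>" suffix matches,
# but [^0] still accepts any character other than 0.
strict_other <- x[grep("\\.P[^0]$", x)]   # "cc.P1" "dd.PX"

# Assumption: only digits 1 to 8 are valid after ".P".
digits_only <- x[grep("\\.P[1-8]$", x)]   # "cc.P1"

print(list(loose_p0, loose_other, strict_other, digits_only))
```

Whether [^0] or [1-8] is the better choice depends on what other names can appear in the directory; the thread does not say.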