Dear r-help members,

I have a number in the form of a string, say:

a <- "-01020.909200"

I'd like to extract "1020." as well as ".9092"

Front <- grep(pattern="[1-9]+[0-9]*\\.", value=TRUE, x=a, fixed=FALSE)
End <- grep(pattern="\\.[0-9]*[1-9]+", value=TRUE, x=a, fixed=FALSE)

However, both strings give "-01020.909200", exactly a.
Could you please point me to what is wrong?

Thanks and best regards
H. van Lishaut

grep() with value=TRUE returns the elements of x that contain a match, not the matched substrings. You want regexpr() and regmatches().

-- Bert

On Tue, Aug 21, 2012 at 12:24 PM, Dr. Holger van Lishaut <H.v.Lishaut at gmx.de> wrote:
> However, both strings give "-01020.909200", exactly a.
> Could you please point me to what is wrong?

--
Bert Gunter
Genentech Nonclinical Biostatistics

'grep' does not change strings. Use 'gsub' or 'regmatches':

# gsub
Front <- gsub("^.*?([1-9][0-9]*\\.).*?$", "\\1", a)
End <- gsub("^.*?(\\.[0-9]*[1-9]).*?$", "\\1", a)

# regexpr and regmatches (R >= 2.14.0)
Front <- regmatches(a, regexpr("[1-9][0-9]*\\.", a))
End <- regmatches(a, regexpr("\\.[0-9]*[1-9]", a))

Front
## [1] "1020."
End
## [1] ".9092"

--
Noia Raindrops
noia.raindrops at gmail.com
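
One practical difference between the two approaches shows up when some elements do not match. The sketch below is only an illustration: the vector x and its second element are made up, and perl = TRUE is an assumption added here so that the lazy quantifier .*? follows Perl semantics. sub() returns non-matching elements unchanged, while regmatches() drops them, so the result can be shorter than the input.

```r
# Hypothetical input: the second element contains no decimal point
x <- c("-01020.909200", "abc")

# sub() leaves elements without a match untouched
# (perl = TRUE is an assumption made for this sketch, see above)
front_sub <- sub("^.*?([1-9][0-9]*\\.).*$", "\\1", x, perl = TRUE)

# regmatches() returns only the substrings that matched
m <- regexpr("[1-9][0-9]*\\.", x)
front_rm <- regmatches(x, m)

# To keep the result aligned with x, fill an NA vector by position;
# regexpr() reports -1 for elements without a match
front_aligned <- rep(NA_character_, length(x))
front_aligned[m != -1] <- front_rm
```

Here front_sub has two elements, front_rm has one, and front_aligned has two with NA in the second position.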

You're misreading the docs. From ?grep:

  value: if 'FALSE', a vector containing the ('integer') indices of the matches determined by 'grep' is returned, and if 'TRUE', a vector containing the matching elements themselves is returned.

Since there's a match somewhere in a[1], all of a[1] is returned (it is a matching element), not just the matching bit: grep(pattern, x, value = TRUE) is something like x[grepl(pattern, x)] to my mind.

I think you want ?regexpr, or possibly just substitute out the non-match with gsub.

Cheers,
Michael

On Tue, Aug 21, 2012 at 2:24 PM, Dr. Holger van Lishaut <H.v.Lishaut at gmx.de> wrote:
> However, both strings give "-01020.909200", exactly a.
> Could you please point me to what is wrong?
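
To make the distinction concrete, here is a small sketch comparing what grep(), grepl() and regexpr()/regmatches() return for the same pattern. The vector x and its two extra elements are made up for illustration; only the first element comes from the question.

```r
# Hypothetical input: the original string plus two made-up elements
x <- c("-01020.909200", "no digits here", "7.5")
pat <- "[1-9][0-9]*\\."

idx   <- grep(pat, x)                    # integer indices of matching elements
whole <- grep(pat, x, value = TRUE)      # the matching elements, unchanged
same  <- x[grepl(pat, x)]                # equivalent subsetting with grepl()
part  <- regmatches(x, regexpr(pat, x))  # only the matched substrings
```

Here whole and same both contain the complete first and third elements, while part contains just "1020." and "7.".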

Hi,

Try this:

gsub("^-\\d(\\d{4}.).*","\\1",a)
#[1] "1020."
gsub("^.*(.\\d{5}).","\\1",a)
#[1] ".90920"

A.K.

----- Original Message -----
From: Dr. Holger van Lishaut <H.v.Lishaut at gmx.de>
To: "r-help at r-project.org" <r-help at r-project.org>
Sent: Tuesday, August 21, 2012 3:24 PM
Subject: [R] Regular Expressions in grep
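
Note that these two patterns depend on the exact digit counts of the example string, and the second one keeps a trailing zero (".90920" rather than the requested ".9092"). The sketch below illustrates the width dependence; the helper name fixed_front and the second input b are made up for this example.

```r
a <- "-01020.909200"   # the string from the question
b <- "-0012.50"        # hypothetical input with different digit counts

# Hypothetical wrapper around the fixed-width pattern shown above
fixed_front <- function(s) gsub("^-\\d(\\d{4}.).*", "\\1", s)

front_a <- fixed_front(a)   # matches: one digit, then four digits and a dot
front_b <- fixed_front(b)   # no match, so b comes back unchanged

# Patterns without fixed widths, as suggested elsewhere in this thread
front_b2 <- regmatches(b, regexpr("[1-9][0-9]*\\.", b))
end_b2   <- regmatches(b, regexpr("\\.[0-9]*[1-9]", b))
```

With these inputs, front_a is "1020.", front_b is the untouched "-0012.50", and the width-independent patterns give "12." and ".5".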
Dr. Holger van Lishaut
2012-Aug-22 19:46 UTC
[R] Regular Expressions in grep - Solution and function to determine significant figures of a number
Dear all,

regmatches works.

And, since this has been asked here before:

SignifStellen<-function(x){
  strx=as.character(x)
  nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.[0-9]*[1-9]",strx)))-1
}

returns the significant figures of a number. Perhaps this can help someone.

Thanks & best regards
H. van Lishaut
Bert Gunter
2012-Aug-22 19:53 UTC
[R] Regular Expressions in grep - Solution and function to determine significant figures of a number
On Wed, Aug 22, 2012 at 12:46 PM, Dr. Holger van Lishaut <H.v.Lishaut at gmx.de> wrote:
> returns the significant figures of a number. Perhaps this can help someone.

Except that ?signif already does this, no?

-- Bert
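
For context, base R's signif(x, digits) rounds a number to a requested number of significant digits; it does not report how many significant digits a number has, which is what SignifStellen tries to do. A minimal sketch, with the value x chosen to mirror the example in this thread:

```r
x <- 1020.9092

r3 <- signif(x, 3)   # x rounded to 3 significant digits
r6 <- signif(x, 6)   # x rounded to 6 significant digits
```

Here r3 is 1020 and r6 is 1020.91; in both calls the digit count is an input rather than a result.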
Dr. Holger van Lishaut
2012-Aug-23 18:43 UTC
[R] Regular Expressions in grep - Solution and function to determine significant figures of a number
On 22.08.2012 at 21:46, Dr. Holger van Lishaut <H.v.Lishaut at gmx.de> wrote:
> SignifStellen<-function(x){
>   strx=as.character(x)
>   nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.[0-9]*[1-9]",strx)))-1
> }
>
> returns the significant figures of a number. Perhaps this can help someone.

Sorry, to work, it must read:

SignifStellen<-function(x){
  strx=as.character(x)
  intFront <- nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.", strx)))
  intEnd <- nchar(regmatches(strx, regexpr("\\.[0-9]*[1-9]", strx)))
  intFront+intEnd-2
}

Best regards
H. van Lishaut
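
A short usage sketch of the corrected function follows. The inputs are passed as strings here (an assumption made for this example, so that the result does not depend on how as.character() formats a number). It also shows two cases the function does not cover: because both patterns must match, inputs with no nonzero digit before the decimal point, or with no decimal point at all, yield integer(0).

```r
SignifStellen <- function(x) {
  strx <- as.character(x)
  # characters from the first nonzero digit up to and including the dot
  intFront <- nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.", strx)))
  # characters from the dot up to the last nonzero digit
  intEnd <- nchar(regmatches(strx, regexpr("\\.[0-9]*[1-9]", strx)))
  # the dot is counted once in each part, hence minus 2
  intFront + intEnd - 2
}

n_ok      <- SignifStellen("-01020.909200")  # "1020." and ".9092"
n_leading <- SignifStellen("0.0012")         # no nonzero digit before the dot
n_integer <- SignifStellen("1020")           # no decimal point
```

Here n_ok is 8, while n_leading and n_integer are both integer(0).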