Dear experts in regexpr.

I have this

dput(test[500:510])
c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
"pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
"RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

and I want something like this

gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
 [1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
[11] "6,13"

but with 10,04 values instead of 0,04.

I tried
gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])

or other variations but without any success.

Please help.

Regards
Petr

On Jul 9, 2013, at 11:45, PIKAL Petr wrote:

> Dear experts in regexpr.
>
> I have this
>
> dput(test[500:510])
> c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
> "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
> "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")
>
> and I want something like this
>
> gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
>  [1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
> [11] "6,13"
>
> but with 10,04 values instead of 0,04.
>
> I tried
> gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])
>
> or other variations but without any success.
>

Presumably the ^.* is too greedy. Perhaps add a space? I.e.,

gsub("^.* ([[:di......

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
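
To make the greediness point concrete, here is a minimal sketch that rebuilds the posted sample and compares the original attempt with a pattern that requires a space before the digits. The space-anchored pattern is only one reading of the suggestion above, not necessarily the exact expression that was intended, and `test` here holds just the eleven posted strings rather than the full vector.

```r
# Sample data copied from the original post (only the eleven posted elements)
test <- c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
          "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
          "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

# Original attempt: the leading "^.*" grabs as many characters as it can,
# so "[[:digit:]]+" is left with only the last digit before the comma.
greedy <- gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test)

# Assumed variant: a literal space must sit directly before the digits,
# which stops "^.*" from swallowing the leading "1" of "10,04".
spaced <- gsub("^.* ([[:digit:]]+,[[:digit:]]*).*$", "\\1", test)

print(greedy[6])
print(spaced[6])
```

With the posted data, the sixth element should come out as "0,04" for the first pattern and "10,04" for the second.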

On Tue, Jul 09, 2013 at 09:45:55AM +0000, PIKAL Petr wrote:

> Dear experts in regexpr.
>
> I have this
>
> dput(test[500:510])
> c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
> "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
> "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")
>
> and I want something like this
>
> gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
>  [1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
> [11] "6,13"
>
> but with 10,04 values instead of 0,04.
>
> I tried
> gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])
>
> or other variations but without any success.
>
> Please help.

The "1" in "10,04" is matched by ".*". In your example, all floating
comma numbers you're trying to extract are preceded by "pH ", so
replacing ".*" with ".*pH " should do what you want.

I'd be wary about that variation of having "RGLP 144006" in some cases,
though; it might be better to clean up this rubbish earlier on (and it
would be ideal to never have it generated in the first place). Regular
expressions can be useful to separate some chaff from the wheat, but
relying on that too much comes with a risk of extracting something that
is valid in some syntactic / technical sense but not correct
semantically.

If you can't be 100% certain that the number you want is (1) always
preceded by "pH ", (2) always a floating comma number and (3) will
always contain an integer and a fractional part (i.e. you'll never get
",09" rather than "0,09", or "10" rather than "10,0"), you have to be
prepared for more difficulties, and you may want to consider a more
systematic approach to parsing your input.

Best regards, Jan
--
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jttkim at gmail.com                              |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=< hierarchical systems are for files, not for humans >=-----*
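
As a rough sketch of the ".*pH " idea combined with the three checks listed above, the following anchors the pattern on "pH ", requires digits on both sides of the comma, and returns NA for anything that does not fit instead of silently extracting something else. The vector `x` and its last two entries are hypothetical, made up only to show the failure cases; the variable names are likewise illustrative.

```r
# Two strings from the post plus two hypothetical malformed entries
x <- c("pH 10,04 1", "RGLP 144006 pH 6,13 1",
       "pH ,09 1",   # hypothetical: missing integer part
       "pH 10 2")    # hypothetical: missing fractional part

# Number must follow "pH " and have digits before and after the comma
pattern <- "^.*pH ([[:digit:]]+,[[:digit:]]+).*$"

# Check the assumptions first, then extract only where they hold
ok <- grepl(pattern, x)
ph_chr <- ifelse(ok, sub(pattern, "\\1", x), NA_character_)

# Convert the decimal comma to a decimal point before as.numeric()
ph_num <- as.numeric(sub(",", ".", ph_chr, fixed = TRUE))

print(ph_num)
print(x[!ok])  # inputs that need a closer look
```

Entries that fail the check stay visible as NA rather than being turned into plausible-looking but wrong numbers, which is the risk described above.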

Hi,

Maybe this helps:

gsub(".*\\w+\\s+(.*)\\s+.*","\\1",test)
#[1] "9,36"  "9,36"  "9,66"  "9,66"  "9,66"  "10,04" "10,04" "10,04" "6,13"
#[10] "6,13"  "6,13"

A.K.

----- Original Message -----
From: PIKAL Petr <petr.pikal at precheza.cz>
To: r-help <r-help at r-project.org>
Cc:
Sent: Tuesday, July 9, 2013 5:45 AM
Subject: [R] regular expression strikes again

Dear experts in regexpr.

I have this

dput(test[500:510])
c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
"pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
"RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

and I want something like this

gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
 [1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
[11] "6,13"

but with 10,04 values instead of 0,04.

I tried
gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])

or other variations but without any success.

Please help.

Regards
Petr
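
On the posted data, the gsub() call in A.K.'s reply effectively picks the second-to-last whitespace-separated field. A minimal sketch of the same idea without a regular expression capture, assuming every string has at least three fields and the value always sits in that position (an assumption, not something the thread guarantees), could look like this; `fields` and `value` are illustrative names.

```r
# Three representative strings from the post
test <- c("pH 9,36 2", "pH 10,04 1", "RGLP 144006 pH 6,13 3")

# Split each string on runs of whitespace
fields <- strsplit(test, "[[:space:]]+")

# Take the second-to-last field of each string
value <- vapply(fields, function(f) f[length(f) - 1], character(1))

# Result of the gsub() call from the reply, for comparison
from_gsub <- gsub(".*\\w+\\s+(.*)\\s+.*", "\\1", test)

print(value)
print(identical(value, from_gsub))
```

Both approaches depend on the field layout rather than on the shape of the number, so they would also return a non-numeric token if the layout ever changed.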