Hello all,

I have a vector of character strings containing letters, numbers, and symbols. What I wish to do is obtain a vector of the same length containing just the numbers.
A quick example:

extract of the original vector:
"lema, rb 2%" "rb 2%" "rb 3%" "rb 4%" "rb 3%" "rb 2%,mineuse" "rb" "rb" "rb 12" "rb" "rj 30%" "rb" "rb" "rb 25%" "rb" "rb" "rb" "rj, rb"

and the type of thing I wish to end up with:
"2" "2" "3" "4" "3" "2" "" "" "12" "" "30" "" "" "25" "" "" "" ""

Instead of "", NA would be acceptable (actually it would almost be better for me).

Anyway, I've been battling with gsub() and similar functions, but I'm drowning in regular expressions, despite spending a few hours on Perl tutorials...
So if anyone can help me out, it would be greatly appreciated!

Thanks very much in advance.

David Gouache
Arvalis - Institut du Végétal
Station de La Minière
78280 Guyancourt
Tel: 01.30.12.96.22 / Mobile: 06.86.08.94.32
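For anyone who wants to try the replies below, the example can be entered as R vectors. This is a minimal sketch; the names `x`, `target` and `target_na` are illustrative and not from the original post.

```r
# The example strings from the question, as a character vector
x <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")

# The desired result, with "" where an element contains no digits
target <- c("2", "2", "3", "4", "3", "2", "", "", "12", "", "30", "",
            "", "25", "", "", "", "")

# The preferred variant, with NA instead of ""
target_na <- replace(target, target == "", NA)

# Input and output must have the same length
stopifnot(length(x) == length(target))
```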
Hello David,

What about one of these:

R> gsub( "[^[:digit:]]", "", x )

or, using Perl regular expressions:

R> gsub( "\\D", "", x, perl = TRUE )

Cheers,

Romain

--
Mango Solutions
data analysis that delivers

Tel: +44(0) 1249 467 467
Fax: +44(0) 1249 467 468
Mob: +44(0) 7813 526 123
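As a quick check, here is a minimal sketch (assuming the question's vector is named `x`, as in Romain's reply) confirming that the POSIX class `[^[:digit:]]` and Perl's `\D` ("not a digit") give the same result on this data. The variable names `posix_version` and `perl_version` are illustrative.

```r
x <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")

# Delete every character that is not a digit, using the POSIX class
posix_version <- gsub("[^[:digit:]]", "", x)

# Same idea with the Perl shorthand \D (any non-digit)
perl_version <- gsub("\\D", "", x, perl = TRUE)

stopifnot(identical(posix_version, perl_version))
print(posix_version)
```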
Is this what you want:

> x
 [1] "lema, rb 2%" "rb 2%" "rb 3%" "rb 4%" "rb 3%" "rb 2%,mineuse"
 [7] "rb" "rb" "rb 12" "rb" "rj 30%" "rb"
[13] "rb" "rb 25%" "rb" "rb" "rb" "rj, rb"
> gsub("[^0-9]*([0-9]*)[^0-9]*", "\\1", x)
 [1] "2" "2" "3" "4" "3" "2" "" "" "12" "" "30" "" "" "25" "" "" "" ""
>

On 7/30/07, GOUACHE David <D.GOUACHE at arvalisinstitutduvegetal.fr> wrote:
> What I wish to do is obtain a vector of the same length with just the numbers.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
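The capture-group pattern in Jim's reply can be read piece by piece. The sketch below annotates it and checks that, on the question's data, it agrees with the simpler "delete every non-digit" pattern used elsewhere in the thread; `captured` and `stripped` are illustrative names.

```r
x <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")

# [^0-9]*   leading non-digits (discarded)
# ([0-9]*)  a run of digits, captured as group 1
# [^0-9]*   trailing non-digits (discarded)
# "\\1" replaces each whole match with only the captured digits
captured <- gsub("[^0-9]*([0-9]*)[^0-9]*", "\\1", x)

# The simpler approach: delete every single non-digit character
stripped <- gsub("[^0-9]", "", x)

stopifnot(identical(captured, stripped))
```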
GOUACHE David wrote:
> or, instead of "", NA would be acceptable (actually it would almost be better for me)

> chv<-scan(what="character",sep=" ")
# then copy the text from your message to the clipboard, paste it into the R console and finish with an empty line
> chv
 [1] "lema, rb 2%" "rb 2%" "rb 3%" "rb 4%"
 [5] "rb 3%" "rb 2%,mineuse" "rb" "rb"
 [9] "rb 12" "rb" "rj 30%" "rb"
[13] "rb" "rb 25%" "rb" "rb"
[17] "rb" "rj, rb"

# actual replacements:
# replace non-digits with nothing
> chv.digits<-gsub("[^0-9]","",chv)
> chv.digits
 [1] "2" "2" "3" "4" "3" "2" "" "" "12" "" "30" "" "" "25" ""
[16] "" "" ""
# replace empty strings with NA
> chv.digits[chv.digits==""]<-NA
> chv.digits
 [1] "2" "2" "3" "4" "3" "2" NA NA "12" NA "30" NA NA "25" NA
[16] NA NA NA
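The two-step "strip, then assign NA" idea above can also be written without modifying the vector in place, using base R's `replace()`. A minimal sketch, assuming the data are in a vector named `x`; `digits` and `digits_na` are illustrative names.

```r
x <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")

# Step 1: delete every non-digit character
digits <- gsub("[^0-9]", "", x)

# Step 2: replace() returns a modified copy, leaving `digits` untouched;
# the logical NA is coerced to NA_character_
digits_na <- replace(digits, digits == "", NA)
digits_na
```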
On Mon, 2007-07-30 at 13:58 +0200, GOUACHE David wrote:
> What I wish to do is obtain a vector of the same length with just the numbers.

Try this:

> Vec
 [1] "lema, rb 2%" "rb 2%" "rb 3%" "rb 4%"
 [5] "rb 3%" "rb 2%,mineuse" "rb" "rb"
 [9] "rb 12" "rb" "rj 30%" "rb"
[13] "rb" "rb 25%" "rb" "rb"
[17] "rb" "rj, rb"
> gsub("[^0-9]", "", Vec)
 [1] "2" "2" "3" "4" "3" "2" "" "" "12" "" "30" "" ""
[14] "25" "" "" "" ""

The search pattern regex here is [^0-9], which matches any single character that is not (^) in the character range 0 through 9; gsub() replaces every such character with "".

See ?regex and/or http://www.regular-expressions.info/

HTH,

Marc Schwartz
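To make the negated character class concrete, this sketch splits one of the example strings into single characters and marks which ones `[^0-9]` matches; those are exactly the characters gsub() deletes. The names `chars`, `non_digit` and `kept` are illustrative.

```r
s <- "rb 2%,mineuse"

chars <- strsplit(s, "")[[1]]        # one element per character
non_digit <- grepl("[^0-9]", chars)  # TRUE where the class [^0-9] matches
chars[non_digit]                     # the characters gsub() removes

# Keeping only the unmatched characters reproduces gsub()'s result
kept <- paste(chars[!non_digit], collapse = "")
stopifnot(identical(kept, gsub("[^0-9]", "", s)))
```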
Dear David,

does the following work for you?

sVec <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
          "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
          "rb", "rb", "rb", "rj, rb")
reVec <- regexpr("[[:digit:]]+", sVec) # see ?regex for details on '[:digit:]' and '+'
# extract the first run of digits in each string; elements without digits
# get reVec = -1 and match.length = -1, so substr() returns ""
substr(sVec, start = reVec, stop = reVec + attr(reVec, "match.length") - 1) # see ?substr for details

Christian
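One practical difference between the regexpr()/substr() approach and the "delete all non-digits" approach shows up when a string contains two separate numbers: the first keeps only the first run of digits, the second glues the runs together. A sketch using a hypothetical input "rb 2%, rj 30%" (not in the original data) and a hypothetical helper `first_number()` that also returns NA when no digits are found.

```r
# Hypothetical helper: first run of digits per element, NA if none
first_number <- function(v) {
  m <- regexpr("[[:digit:]]+", v)
  out <- substr(v, start = m, stop = m + attr(m, "match.length") - 1)
  out[m == -1] <- NA  # regexpr() returns -1 where there is no match
  out
}

# Hypothetical input with two numbers in the first element
y <- c("rb 2%, rj 30%", "rb 12", "rb")

first_number(y)          # first run only
gsub("[^0-9]", "", y)    # all digits concatenated
```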
> gsub(" ", "", gsub("%", "", gsub("[a-z]", "", c("tr3","jh40%qs dqd"))))
[1] "3" "40"

Jacques VESLOT
INRA - Biostatistique & Processus Spatiaux
Site Agroparc
84914 Avignon Cedex 9, France
Tel: +33 (0) 4 32 72 21 58
Fax: +33 (0) 4 32 72 21 84
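Applied to the full vector from the question rather than the two test strings, the nested calls leave commas behind, because only lowercase letters, "%" and spaces are removed. A minimal sketch, assuming the data are in `x`; folding all unwanted characters into one bracket class (or negating the digit class, as in the other replies) handles this. The class `[a-z%, ]` is tailored to this data and assumes a locale where `[a-z]` means the plain lowercase letters.

```r
x <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")

# The nested version from the reply: commas survive
nested <- gsub(" ", "", gsub("%", "", gsub("[a-z]", "", x)))
nested[c(1, 6, 18)]

# One bracket class covering letters, percent sign, comma and space
one_class <- gsub("[a-z%, ]", "", x)
stopifnot(identical(one_class, gsub("[^0-9]", "", x)))
```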
I assume that if you want the "" components to be NA, then you really intend the result to be a numeric vector. The following replaces every run of non-digits with "" (thereby removing them) and then uses as.numeric to convert the result to numeric; as.numeric("") gives NA. Just omit the conversion if you want a character vector result:

s <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")
as.numeric(gsub("[^[:digit:]]+", "", s))

On 7/30/07, GOUACHE David <D.GOUACHE at arvalisinstitutduvegetal.fr> wrote:
> or, instead of "", NA would be acceptable (actually it would almost be better for me)
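A quick check of the numeric version: the `+` makes each run of non-digits a single match, which gives the same result as deleting them one character at a time, and the empty strings become NA on conversion. A sketch assuming the data are in `s`; `num` is an illustrative name.

```r
s <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")

# Runs of non-digits removed in one match each
stopifnot(identical(gsub("[^[:digit:]]+", "", s), gsub("[^[:digit:]]", "", s)))

num <- as.numeric(gsub("[^[:digit:]]+", "", s))
which(is.na(num))       # elements that contained no digits
num[!is.na(num)]        # the extracted numbers
```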
This might work:

> numOnly <- function(x) gsub("[^0-9]", "", x)
> numOnly("lema, rb 2%")
[1] "2"
> numOnly("rb")
[1] ""

Max

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of GOUACHE David
Sent: Monday, July 30, 2007 7:59 AM
To: r-help at stat.math.ethz.ch
Subject: [R] regular expressions : extracting numbers
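Because gsub() is itself vectorised, numOnly() can be applied to the whole vector in one call; there is no need to loop over the elements. A minimal sketch, assuming the data are in `x`; `whole` and `one_at_a_time` are illustrative names.

```r
numOnly <- function(x) gsub("[^0-9]", "", x)

x <- c("lema, rb 2%", "rb 2%", "rb 3%", "rb 4%", "rb 3%", "rb 2%,mineuse",
       "rb", "rb", "rb 12", "rb", "rj 30%", "rb", "rb", "rb 25%",
       "rb", "rb", "rb", "rj, rb")

# One call on the whole vector
whole <- numOnly(x)

# Element-by-element loop, for comparison only
one_at_a_time <- vapply(x, numOnly, character(1), USE.NAMES = FALSE)

stopifnot(identical(whole, one_at_a_time))
```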