Mauricio Cornejo
2012-Nov-21 21:41 UTC
[R] Problems understanding use of regular expression (in gsub) for manipulating currency
Hello,

After reading help file, various threads on this board, and other online tutorials, I've attempted to use gsub (using Perl-like syntax) to change a currency string into something that can be converted to numeric type using only one regular expression. Can anybody point out my error?

Note that

> x <- "\"$ 1,200,300,400.50\""

Tried the following in an attempt to arrive at "1200300400.50"

> gsub("(^[\\D]*)(([\\d]*)[,])*([\\d]*[.]*[\\d]*)([\\D]*)", "\\3\\4", x, perl=TRUE)
[1] "300400.50"

Note that "\d" matches a digit character and "\D" matches a non-digit character. Results group "\2" was intentionally omitted from the replacement pattern as it would have included commas. I was expecting multiple results for group "\3"

Many thanks,
Mauricio
David Winsemius
2012-Nov-22 01:36 UTC
[R] Problems understanding use of regular expression (in gsub) for manipulating currency
On Nov 21, 2012, at 1:41 PM, Mauricio Cornejo wrote:

> Hello,
>
> After reading help file, various threads on this board, and other online tutorials, I've attempted to use gsub (using Perl-like syntax) to change a currency string into something that can be converted to numeric type using only one regular expression. Can anybody point out my error? Note that
>
>> x <- "\"$ 1,200,300,400.50\""
>
> Tried the following in an attempt to arrive at "1200300400.50"
>
>> gsub("(^[\\D]*)(([\\d]*)[,])*([\\d]*[.]*[\\d]*)([\\D]*)", "\\3\\4", x, perl=TRUE)
> [1] "300400.50"
>
> Note that "\d" matches a digit character and "\D" matches a non-digit character.
> Results group "\2" was intentionally omitted from the replacement pattern as it would have included commas.

> gsub("[,\"]", "", gsub("^\\D*(\\d.*)", "\\1", x, perl=TRUE))
[1] "1200300400.50"

I have my doubts about the "\"..." construction. I suspect it stems from your not understanding the convention used in printing escape characters in R.

-- 
David Winsemius, MD
Alameda, CA, USA
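A note on why the pattern in the question returns "300400.50": a capture group that sits inside a repeated group is overwritten on each repetition, so "\3" ends up holding only the last piece it matched, not all of them. The sketch below replaces the whole match with one group at a time to show what each group holds at the end. The variable names `pat`, `g2`, `g3` and `g4` are introduced here for illustration only.

```r
x <- "\"$ 1,200,300,400.50\""
pat <- "(^[\\D]*)(([\\d]*)[,])*([\\d]*[.]*[\\d]*)([\\D]*)"

# Group 2 is repeated by the trailing '*', so groups 2 and 3
# keep only what they matched on the final repetition.
g2 <- sub(pat, "\\2", x, perl = TRUE)  # last "digits plus comma" piece
g3 <- sub(pat, "\\3", x, perl = TRUE)  # digits of that last piece
g4 <- sub(pat, "\\4", x, perl = TRUE)  # remaining digits and decimal part

print(c(g2, g3, g4))
```

The earlier pieces ("1" and "200") are matched but not retained, which is why deleting the unwanted characters tends to be simpler than trying to capture the wanted ones.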
Mauricio Cornejo
2012-Nov-22 03:27 UTC
[R] Problems understanding use of regular expression (in gsub) for manipulating currency
Arun, thanks for both of your suggestions. I played with your second idea some more and seem to have found a more general solution (if a character is neither a digit nor a period, replace it with an empty string).

x <- "\"$ 1,200,300,400.50\""
gsub("[^0-9|.]", "", x, perl=TRUE)
[1] "1200300400.50"

Regards,
Mauricio

________________________________
Cc: R help <r-help@r-project.org>
Sent: Wednesday, November 21, 2012 5:26 PM
Subject: Re: [R] Problems understanding use of regular expression (in gsub) for manipulating currency

HI,
One more method:

x <- "\"$ 1,200,300,400.50\""
gsub("[\",$ ]","",x)
#[1] "1200300400.50"

A.K.
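One detail about the character class "[^0-9|.]" may be worth checking: inside square brackets "|" is a literal pipe, not alternation, so the class also keeps any pipe characters. That makes no difference for the string in this thread, which contains none. A small sketch, using a hypothetical input `y` that does contain a pipe:

```r
y <- "$ 1,200|300.50"  # hypothetical input, not from the thread

# With '|' inside the class, the pipe counts as a character to keep.
with_pipe <- gsub("[^0-9|.]", "", y)

# Without it, only digits and periods survive.
without_pipe <- gsub("[^0-9.]", "", y)

print(c(with_pipe, without_pipe))
```

If pipes can never occur in the data the two patterns behave identically; "[^0-9.]" simply states the intent more directly.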
jim holtman
2012-Nov-22 03:27 UTC
[R] Problems understanding use of regular expression (in gsub) for manipulating currency
Here is another approach:

> x <- "\"$ 1,200,300,400.50\""
> x
[1] "\"$ 1,200,300,400.50\""
> gsub("[,$ \"]", "", x)
[1] "1200300400.50"
> as.numeric(gsub("[,$ \"]", "", x))
[1] 1200300401

On Wed, Nov 21, 2012 at 4:41 PM, Mauricio Cornejo <mauriciocornejo at yahoo.com> wrote:
> x <- "\"$ 1,200,300,400.50\""

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
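The printed value 1200300401 above is a display effect, not a loss of the cents: R prints 7 significant digits by default, so the number is rounded only for display. A minimal sketch, assuming the same `x` as in the thread; the names `v` and `s` are illustrative:

```r
x <- "\"$ 1,200,300,400.50\""
v <- as.numeric(gsub("[,$ \"]", "", x))

print(v)                 # default display, rounded to 7 significant digits
print(v, digits = 12)    # more digits reveal the fractional part

# Format with exactly two decimals, e.g. for writing the value back out.
s <- sprintf("%.2f", v)
print(s)
```

The stored number still carries the .50; only the default print method hides it. Setting options(digits = ...) changes the display for the whole session.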