Krishna Dagli/Krushna Dagli
2008-Nov-02 08:43 UTC
[R] R newbie: how to replace string/regular expression
Hello,

I am an R newbie and would like to know the correct and efficient method for doing string replacement.

I have a large data set in which I want to convert the characters "M", "b" and "K" (currency in Million, Billion and K) to millions. That is, replace 209.7B with (209.7 * 10e6), 100.00K with (100.00 * 1/100), and so on.

    d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

This works, in that it removes "b/B":

    gsub("(.*)(B$)", "\\1", d, ignore.case=T, perl=T)

but

    gsub("(.*)(B$)", as.numeric("\\1") * 10e6, d, ignore.case=T, perl=T)

does not work. I tried sprintf and other combinations with as.numeric, but they fail. How can I use \\1 and multiply it by 10e6?

The other solution is:

    location <- grep("M", d, ignore.case=T)
    y <- sub("M", "", d, ignore.case=T)
    y[location] <- y[location] * 10e6

Is the second solution faster, or is the combination of grep with the multiplication (if it works) faster? Or what is the most efficient method to do something like this in R?

Thanks and Regards
Krishna
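A minimal base-R sketch of why the first attempt in the question cannot work and how the second one might be repaired. It assumes only the vector `d` from the question; the names `bad_replacement` and `millions` are made up for illustration, and 1e6 is used for "million" because 10e6 is actually 1e7.

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# R evaluates as.numeric("\\1") before gsub() is ever called, so the
# replacement argument is already NA (with a coercion warning).
bad_replacement <- suppressWarnings(as.numeric("\\1") * 10e6)

# sub() returns character strings; convert them before multiplying.
location <- grep("M$", d, ignore.case = TRUE)
y <- sub("M$", "", d, ignore.case = TRUE)
millions <- as.numeric(y[location]) * 1e6
```

Both grep() and sub() are vectorised, so this runs over the whole vector without an explicit loop; only the "M" suffix is handled here.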
Gabor Grothendieck
2008-Nov-02 12:55 UTC
[R] R newbie: how to replace string/regular expression
Your gsub example is almost exactly what gsubfn in the gsubfn package does. gsubfn is like gsub except that the replacement string is a function:

> library(gsubfn)
> gsubfn("(.*)B$", ~ as.numeric(x) * 10e6, d, ignore.case = TRUE)
[1] "120.0M"    "11.01m"    "2.097e+09" "100.00k"   "50"

Also there are examples very similar to this:

1. at the end of section 2 of vignette("gsubfn")
2. in demo("gsubfn-si")

Also see the gsubfn home page: http://gsubfn.googlecode.com

Also note that if you want to return the values rather than transform and reinsert them, then strapply in the same package can do that.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
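For readers without the gsubfn package, the same "match, compute, reinsert" idea can be approximated in base R. This is only a sketch, not how gsubfn works internally; the names `hit` and `out` are hypothetical, and the 10e6 factor is kept from the reply above.

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# Find the elements that end in B (or b).
hit <- grepl("B$", d, ignore.case = TRUE)

# Strip the suffix, do the arithmetic on the number, and put the
# result back as a string; all other elements are left untouched.
out <- d
out[hit] <- as.character(
  as.numeric(sub("B$", "", d[hit], ignore.case = TRUE)) * 10e6
)
```

As with gsubfn, the result is still a character vector, so a final as.numeric() is needed once every suffix has been handled.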
Gabor Grothendieck
2008-Nov-02 14:52 UTC
[R] R newbie: how to replace string/regular expression
There was an error in your regexp which I did not correct. Here it is again, corrected to better illustrate the solution:

> gsubfn("(.*)B", ~ as.numeric(x) * 10e6, d, ignore.case = TRUE)
[1] "120.0M"    "11.01m"    "2.097e+09" "100.00k"   "50"
Charles C. Berry
2008-Nov-02 21:56 UTC
[R] R newbie: how to replace string/regular expression
Gabor,

Why not just this:

    expos <- list( B="e9", M="e6", m="e6", k="e3" )
    as.numeric( gsubfn("[[:alpha:]]", expos, d ) )

HTH,

Chuck

p.s. I am not sure why B goes with e6 or K with e-02 (in the original question), but Krishna can adjust the values accordingly.

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu              UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
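The lookup idea above can also be sketched in base R with numeric multipliers instead of exponent strings. This is an illustration under assumptions, not Chuck's code: the names `mult`, `suffix`, `number`, `scale_by` and `values` are hypothetical, and the factors (B = 1e9, M = 1e6, K = 1e3) follow his list rather than the values in the original question.

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00k", "50")

# Multiplier per (upper-cased) suffix; adjust the values as needed.
mult <- c(B = 1e9, M = 1e6, K = 1e3)

# Split each string into its trailing letters and its numeric part.
suffix <- toupper(sub("^[0-9.]+", "", d))
number <- as.numeric(sub("[[:alpha:]]+$", "", d))

# No suffix means a factor of 1; an unknown suffix yields NA.
scale_by <- ifelse(suffix == "", 1, mult[suffix])
values <- number * scale_by
```

Everything here is vectorised, which addresses the efficiency part of the question for a large data set.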
Gabor Grothendieck
2008-Nov-02 22:06 UTC
[R] R newbie: how to replace string/regular expression
I did provide a link to that solution already, but I also wanted to show how to do it in the same way that the code in the question was written.
I would read up on the 'gsub' command in R help. It does what you would like.

--
View this message in context: http://r.789695.n4.nabble.com/R-newbie-how-to-replace-string-regular-expression-tp873169p3390170.html
Sent from the R help mailing list archive at Nabble.com.