Hi,

I have what I hope is a simple text processing question in R.

I want to replace every instance of http:\\XXX.com with "WEBSITE".

When I try
sub('(^http://)(.com$)', 'WEBSITE', <filename>);
it only substitutes http:// and .com, so it looks like
WEBSITEXXXWEBSITE

How do I get it to match the pattern "http:// . . . . .com" and substitute the whole phrase?

Thanks in advance!

--
View this message in context: http://r.789695.n4.nabble.com/Replacing-a-string-tp4648368.html
Sent from the R help mailing list archive at Nabble.com.
You want to read the ?regexp page and try

gsub("^http://.*\\.com$", "WEBSITE", <filename>)

Michael

On Sun, Nov 4, 2012 at 4:09 AM, Allie818 <alice.ly at gmail.com> wrote:
> Hi,
>
> I have what I hope is a simple text processing question in R.
>
> I want to replace every instance of http:\\XXX.com with "WEBSITE"
>
> When I try
> sub('(^http://)(.com$)', 'WEBSITE', <filename>);,
> it only substitutes http:// and .com so it looks like
> WEBSITEXXXWEBSITE
>
> How do I get it to match the pattern
> "http:// . . . . .com" and substitute the whole phrase?
>
> Thanks in advance!
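As a rough illustration of the suggestion above, here is a small sketch. The sample vectors `urls` and `txt` are made up for this example (the original post does not show the actual input), and the second, unanchored pattern is an extra variant for URLs embedded in longer text, not something proposed in the thread.

```r
# Hypothetical sample data; the real input is not shown in the thread.
urls <- c("http://XXX.com", "http://example.com", "not a url")

# Anchored pattern from the reply: an element is replaced only when the
# whole string starts with http:// and ends with .com.
anchored <- gsub("^http://.*\\.com$", "WEBSITE", urls)
print(anchored)

# Assumed variant: if the URLs sit inside longer text, drop the anchors
# and stop the match at whitespace so each URL is replaced separately.
txt <- "see http://XXX.com and http://example.com for details"
embedded <- gsub("http://[^[:space:]]*\\.com", "WEBSITE", txt)
print(embedded)
```

The `.*` between `http://` and `\\.com` is what lets the pattern span the middle of the address; the doubled backslash makes the dot before `com` literal.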
Thanks so much Arun! It's the second case. Being able to extract is really powerful too. Thank you for sharing that as well!

On Nov 4, 2012, at 12:00 AM, "arun kirshna [via R]" <ml-node+s789695n4648372h32@n4.nabble.com> wrote:
> Hi,
>
> I am not sure how you want your output.
> Is it something like:
> "WEBSITEWEBSITEWEBSITE"
> #or
> just "WEBSITE" replacing the whole url
>
> I guess it is the former.
> url1<-"http:\\XXX.com"
> gsub(".*","WEBSITEWEBSITEWEBSITE",url1)
> #[1] "WEBSITEWEBSITEWEBSITE"
> #2nd case
> gsub(".*","WEBSITE",url1)
> #[1] "WEBSITE"
>
> But, if you wanted to extract the 1st (http), 2nd (XXX), and 3rd (com) components:
> gsub("\\\\","",gsub("(.*)\\:(.*)\\.(.*)","\\1 \\2 \\3",url1))
> #[1] "http XXX com"
> #or
> gsub("(.*)\\:\\\\(.*)\\.(.*)","\\1 \\2 \\3",url1)
> #[1] "http XXX com"
> #just the first component
> gsub("(.*)\\:.*","\\1",url1)
> #[1] "http"
> #second alone
> gsub(".*\\:\\\\(.*)\\..*","\\1",url1)
> #[1] "XXX"
> A.K.

--
View this message in context: http://r.789695.n4.nabble.com/Replacing-a-string-tp4648368p4648415.html
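The extraction idea quoted above uses a string with a backslash (`"http:\\XXX.com"` is `http:\XXX.com` once R processes the escape). A possible adaptation for the usual forward-slash form might look like the sketch below; the variable `url2` and the pattern are assumptions made for illustration, not code from the thread.

```r
# Hypothetical URL in the usual forward-slash form.
url2 <- "http://XXX.com"

# Three capture groups: scheme, middle part, and final suffix.
pattern <- "^([a-z]+)://(.*)\\.([a-z]+)$"

# Backreferences \\1, \\2, \\3 rebuild the string from the groups.
joined <- sub(pattern, "\\1 \\2 \\3", url2)
print(joined)

# regexec() plus regmatches() returns the full match followed by each group.
parts <- regmatches(url2, regexec(pattern, url2))[[1]]
scheme <- parts[2]
host <- parts[3]
suffix <- parts[4]
print(c(scheme, host, suffix))
```

Using `regexec()` keeps the components as separate vector elements, which may be handier than splitting a space-joined string afterwards.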