On Thu, Jun 19, 2008 at 2:17 PM, ppatel3026
<pratik.patel at us.rothschild.com> wrote:
> I would like to replace "\r\n" with "" in a character string, where "\r\n"
> exists only between < and >, how could I do that?
>
> Initial:
> characterString = "<XML><tag1 id=\"F\r\n2\"></t\r\nag1>\r\n<tag\r\n2></tag2></XML>"
>
> Result:
> characterString = "<XML><tag1 id=\"F2\"></tag1>\r\n<tag2></tag2></XML>"
>
> Tried with sub(below) but it only replaces the first instance and I am not
> sure how to pattern match so that it only replaces \r\n that exist within
> tags(< and >).
>
> sub("\r\n", "", charStream)

I assume you want to delete every \r and every \n inside tags, not
just the sequence \r\n. If it is only \r\n, modify the second regular
expression accordingly and the rest works the same way.

gsubfn, from the package of the same name, is like gsub except that
instead of replacing each match of the regular expression with a
fixed string, it passes each match to the function given as the
second argument and replaces the match with that function's output.
The function can alternatively be specified as a formula, as it is
here, in which case the right side of the formula is the function
body and the formal arguments are constructed from the free
variables, in this case just x. See the gsubfn home page at
http://gsubfn.googlecode.com .

characterString <- "<XML><tag1 id=\"F\r\n2\"></t\r\nag1>\r\n<tag\r\n2></tag2></XML>"
library(gsubfn)
gsubfn("<[^>]*>", ~ gsub("[\r\n]", "", x), characterString)
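
If installing gsubfn is not an option, the same idea (match each tag, then
clean only the matched text) can probably be sketched in base R with
gregexpr and regmatches<-. This is only an illustration, not part of gsubfn:
strip_in_tags and its drop argument are hypothetical names made up for this
example. The second call shows the \r\n-only variant mentioned above.

```r
# Hypothetical helper (not from gsubfn or base R): delete every match of
# `drop` that occurs inside a <...> tag, leaving text between tags untouched.
strip_in_tags <- function(s, drop = "[\r\n]") {
  m <- gregexpr("<[^>]*>", s)              # positions of every <...> tag
  regmatches(s, m) <- lapply(
    regmatches(s, m),                      # one vector of tags per input string
    function(tags) gsub(drop, "", tags)    # clean each matched tag
  )
  s
}

characterString <- "<XML><tag1 id=\"F\r\n2\"></t\r\nag1>\r\n<tag\r\n2></tag2></XML>"

# Remove every \r and every \n found inside tags.
result <- strip_in_tags(characterString)

# Remove only the two-character sequence \r\n inside tags.
result_crlf <- strip_in_tags(characterString, drop = "\r\n")
```

For the string in the question both calls give the same answer, because
every line break inside a tag is a full \r\n pair; the \r\n between
</tag1> and <tag2> sits outside any tag and is kept.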