All,

I am trying to post text from an XLS spreadsheet to my wiki and I need to remove any characters that are not UTF-8. Is there an easy gsub command that can do this?

(I previously sent this same email to r-sig-gui. That was a mistake and I apologize for the duplication.)

Thanks, Roger J. Bos
Prof Brian Ripley
2007-Oct-26 14:44 UTC
[R] How to remove non-UTF-8 characters from a string
That is not a well-defined concept. To define 'character' you need to know the encoding, since that determines how to split the bytes into characters. So only whole strings can be valid UTF-8 or not. You can say which bytes in a stream of bytes would be valid in UTF-8, but if not all of them are, then it would almost certainly be incorrect to interpret any of them as UTF-8.

You can find out whether a stream of bytes is valid in a UTF-8 locale by calling nchar(x, "c", allowNA=TRUE) and testing for NA elements in the result.

On Fri, 26 Oct 2007, Bos, Roger wrote:
> I am trying to post text from an XLS spread to my wiki and I need to
> remove any characters that are not UTF-8. Is there an easy gsub command
> that can do this?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
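The nchar() check described above can be wrapped in a small helper. This is a sketch, not code from the thread: is_valid_utf8() is a hypothetical name, the test strings are made up, and the last step assumes the spreadsheet text is really Latin-1, which you would need to confirm for your own data. In current versions of R, strings marked as "UTF-8" are validated as UTF-8 whatever the locale, while unmarked strings are checked against the current locale, which is why the test is only meaningful in a UTF-8 locale.

```r
# Hypothetical helper (not from the thread): TRUE where an element's bytes
# form a valid string, FALSE where they do not, NA for NA inputs.
# Strings marked "UTF-8" are validated as UTF-8; unmarked strings are
# checked against the current locale, so run this in a UTF-8 locale.
is_valid_utf8 <- function(x) {
  # type = "chars" counts characters; allowNA = TRUE makes nchar() return
  # NA for an invalid multibyte string instead of throwing an error.
  n <- nchar(x, type = "chars", allowNA = TRUE)
  ok <- !is.na(n)
  ok[is.na(x)] <- NA  # keep genuine missing values as NA
  ok
}

# Made-up test data, built from raw bytes so this script does not
# depend on the encoding of the file it is saved in.
utf8_bytes   <- as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9))  # "cafe" with e-acute, UTF-8
latin1_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9))        # same word, Latin-1

good <- rawToChar(utf8_bytes)
bad  <- rawToChar(latin1_bytes)
Encoding(good) <- "UTF-8"
Encoding(bad)  <- "UTF-8"  # a wrong label: byte 0xE9 alone is not valid UTF-8

x <- c("plain ascii", good, bad, NA)
print(is_valid_utf8(x))  # expected: TRUE TRUE FALSE NA

# Rather than deleting bytes, re-encode from the encoding the data really
# uses. Latin-1 is an ASSUMPTION here; check what your spreadsheet export uses.
fixed <- iconv(rawToChar(latin1_bytes), from = "latin1", to = "UTF-8")
print(is_valid_utf8(fixed))
```

The design follows the point made above: validity is judged per whole string, and an invalid string is repaired by converting from its true encoding rather than by stripping bytes with gsub. More recent versions of R also provide validUTF8() in base, which performs a byte-level UTF-8 check directly.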