I think you mean the control characters: there are other unprintable
characters (del, for example). The control characters are the range
[\001-\037]. E.g.
> test <- intToUtf8(1:40, multiple=TRUE)
> grepl("[\001-\037]", test)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE
If you want to include del, use "[\001-\037\177]". I have omitted nul
(\000) which cannot occur in R character strings.
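To remove the characters rather than just detect them, the same class can be handed to gsub(). A minimal sketch, in which the vector 'x' is made-up sample data and not taken from the csv file in question:

```r
## Sketch only: 'x' is hypothetical sample data.
x <- c("abc\001def", "tab\there", "bell\007", "del\177end", "clean")

## Control characters \001-\037 plus del (\177), as described above.
ctrl <- "[\001-\037\177]"

## Replace every match with the empty string.
cleaned <- gsub(ctrl, "", x)
print(cleaned)
## [1] "abcdef"  "tabhere" "bell"    "delend"  "clean"
```

Note that tab (\011), line feed (\012) and carriage return (\015) are in this range, so they are removed too. If they should be kept, a narrower class such as "[\001-\010\013\014\016-\037\177]" would leave them alone.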
You didn't give us the sessionInfo() output the posting guide asked
you for, so I am presuming you are not doing this in an unusual
locale: I wouldn't trust the regexp code in one of the stateful
locales used for Japanese.
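For reference, the locale a session is running in can be inspected as follows. This is only a sketch: it reports the character-type locale and whether it is multibyte, but it does not by itself say whether the encoding is a stateful one.

```r
## Report the character-type locale of the current session.
ctype <- Sys.getlocale("LC_CTYPE")
cat("LC_CTYPE:", ctype, "\n")

## l10n_info() says whether the locale is multibyte and whether it is UTF-8.
info <- l10n_info()
cat("multibyte:", info$MBCS, " UTF-8:", info[["UTF-8"]], "\n")
```

sessionInfo() includes the locale in its output as well, which is why the posting guide asks for it.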
On Wed, 25 Nov 2009, Steven Kang wrote:
> Hi all,
>
> I have a csv file containing words with *UNPRINTABLE ASCII* characters
> (described in the following table).
>
> Are there any viable method in eliminating these characters?
>
> I realise that *EXTENDED ASCII* characters (i.e , ?, ?, ?, ? etc) can be
> removed or replaced via *"gsub"* or *"gregexpr"* functions. But am not
> certain with the *UNPRINTABLE ASCII* characters.
>
> Your help in resolving this problem would be highly appreciated.
>
> Thanks
>
> Steven
>
> ASCII control characters (character code 0-31)
>
> The first 32 characters in the ASCII-table are unprintable control codes
> and are used to control peripherals such as printers.
>
> DEC  OCT  HEX  BIN       Symbol  Description
>   0  000  00   00000000  NUL     Null char
>   1  001  01   00000001  SOH     Start of Heading
>   2  002  02   00000010  STX     Start of Text
>   3  003  03   00000011  ETX     End of Text
>   4  004  04   00000100  EOT     End of Transmission
>   5  005  05   00000101  ENQ     Enquiry
>   6  006  06   00000110  ACK     Acknowledgment
>   7  007  07   00000111  BEL     Bell
>   8  010  08   00001000  BS      Back Space
>   9  011  09   00001001  HT      Horizontal Tab
>  10  012  0A   00001010  LF      Line Feed
>  11  013  0B   00001011  VT      Vertical Tab
>  12  014  0C   00001100  FF      Form Feed
>  13  015  0D   00001101  CR      Carriage Return
>  14  016  0E   00001110  SO      Shift Out / X-On
>  15  017  0F   00001111  SI      Shift In / X-Off
>  16  020  10   00010000  DLE     Data Line Escape
>  17  021  11   00010001  DC1     Device Control 1 (oft. XON)
>  18  022  12   00010010  DC2     Device Control 2
>  19  023  13   00010011  DC3     Device Control 3 (oft. XOFF)
>  20  024  14   00010100  DC4     Device Control 4
>  21  025  15   00010101  NAK     Negative Acknowledgement
>  22  026  16   00010110  SYN     Synchronous Idle
>  23  027  17   00010111  ETB     End of Transmit Block
>  24  030  18   00011000  CAN     Cancel
>  25  031  19   00011001  EM      End of Medium
>  26  032  1A   00011010  SUB     Substitute
>  27  033  1B   00011011  ESC     Escape
>  28  034  1C   00011100  FS      File Separator
>  29  035  1D   00011101  GS      Group Separator
>  30  036  1E   00011110  RS      Record Separator
>  31  037  1F   00011111  US      Unit Separator
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595