thr3ads.net - R help - [R] multiple separators in sep argument for read.table? [Apr 2008]

Johan Jackson

2008-Apr-19 06:19 UTC

[R] multiple separators in sep argument for read.table?

Hello,

Is there any way to add multiple separators in the sep= argument in
read.table? I would like to be able to create different columns if I see a
white space OR a "/".

Thanks in advance,

JJ

Prof Brian Ripley

2008-Apr-19 06:38 UTC

On Sat, 19 Apr 2008, Johan Jackson wrote:
> Hello,
>
> Is there any way to add multiple separators in the sep= argument in
> read.table? I would like to be able to create different columns if I see a
> white space OR a "/".

No.  read.table() uses scan(), and that requires 'sep' to be a single
character (if specified).

You can read your dataset with readLines(), change "/" to, say, "\t"
with gsub(), and then use read.table() on a textConnection() from the
resulting character vector.
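
A minimal sketch of those three steps, using a small in-memory character
vector in place of a real file. The sample lines, the file name in the
comment, and the names raw, fixed and dat are made up for illustration;
they are not from the original post.

```r
# Hypothetical sample lines; in practice: raw <- readLines("yourfile.txt")
raw <- c("a b/c",
         "1 2/3",
         "4 5/6")

# Turn every "/" into a TAB; fixed = TRUE treats "/" literally, not as a regex
fixed <- gsub("/", "\t", raw, fixed = TRUE)

# With the default sep = "", read.table() splits on any white space,
# so the original spaces and the newly inserted TABs both act as separators
con <- textConnection(fixed)
dat <- read.table(con, header = TRUE)
close(con)

print(dat)
```

Replacing "/" with a TAB rather than leaving sep at some other single
character works here because the default sep = "" already means "any
white space".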

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

(Ted Harding)

2008-Apr-19 07:52 UTC

On 19-Apr-08 06:19:09, Johan Jackson wrote:
> Hello,
> Is there any way to add multiple separators in the sep= argument
> in read.table? I would like to be able to create different columns
> if I see a white space OR a "/".
> 
> Thanks in advance,
> JJ

As well as Brian Ripley's suggestion for how to do it within R,
if you have access to the 'awk' program (as on all Unix/Linux
systems and, in principle, installable on Windows), then you can
pre-process the file outside of R along the following lines.

First, here is a test file "temp.txt":

R1C1 R1C2;R1C3
R2C1,R2C2 R2C3
R3C1,R3C2;R3C3

where each line has 3 fields, separated by any of " " or "," or ";",
and it is desired to obtain a purely comma-separated version of it.

awk '
  BEGIN{FS="[ ]|[;]|[,]";OFS=","};{$1=$1};{print $0}
' < temp.txt  > temp2.txt

produces a file temp2.txt with contents

R1C1,R1C2,R1C3
R2C1,R2C2,R2C3
R3C1,R3C2,R3C3

The logic is that the initialisation
  BEGIN{FS="[ ]|[;]|[,]";OFS=","}
sets up the Field Separator variable FS as a regular
expression which matches any one of " " ";" ","
and the Output Field Separator OFS to be ",".

$0 denotes the entire input line. The assignment "$1=$1" leaves
the first field unchanged, but any assignment to a field makes
awk rebuild the whole input line $0 by joining the fields with
OFS, so the separators in $0 all become ",".

Hence an 'awk' program to handle the case you describe
could be

awk '
  BEGIN{FS="[ ]|[/]";OFS=" "};{$1=$1};{print $0}
' < myrawfile  > myfinalfile

It gets slightly more interesting if your "white space"
separating two fields might be any number of consecutive
spaces or a TAB, say.

In that case something like

awk '
  BEGIN{FS="[ ][ ]*|[;]|[,]|[\t]";OFS=","};{$1=$1};{print $0}
' < myrawfile  > myfinalfile

might be needed. Here "[ ][ ]*" means "one space followed
by zero or more spaces", and "\t" is the notation for TAB.

If I change the test file above to

R1C1   R1C2;R1C3
R2C1,R2C2       R2C3
R3C1,R3C2;R3C3

where the long blank in the first line is 3 consecutive " ",
and the long blank in the second line is a single TAB,
then the second 'awk' program above generates exactly the
same output as before.
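
For readers who would rather stay inside R, roughly the same
normalisation can be sketched with gsub() and a regular expression.
This is an illustration rather than part of the awk approach above: the
names raw, csv and dat are made up, and the test file is typed in as a
character vector instead of being read with readLines().

```r
# The second test file from above, as a character vector
# (three spaces in line 1, a single TAB in line 2)
raw <- c("R1C1   R1C2;R1C3",
         "R2C1,R2C2\tR2C3",
         "R3C1,R3C2;R3C3")

# Replace a run of spaces/TABs, or a single ";" or ",", with one ","
csv <- gsub("[ \t]+|[;,]", ",", raw)

# The lines are now purely comma-separated, so a single-character sep works
con <- textConnection(csv)
dat <- read.table(con, sep = ",", stringsAsFactors = FALSE)
close(con)

print(csv)
```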

Just a thought! I'm always tempted to suggest that people
use 'awk' in conjunction with R, not only to deal with the
kind of relatively simple substitutions you describe, but
also for exploring and cleaning up the sort of mess that
people can send you after exporting a CSV file from an
Excel spreadsheet, etc. (It would go on for too long, to
give examples of this sort of thing.)

With best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-Apr-08                                       Time: 08:52:56
------------------------------ XFMail ------------------------------
