I have a commonly recurring problem and wondered if folks would share tips. I routinely get tab-delimited text files that I need to read in. In very many cases, I get:

> a <- read.table('junk.txt.txt',header=T,skip=10,sep="\t")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
        line 67 did not have 88 elements

I am typically able to go through the file and find a single quote or something like that causing the problem, but with a recent set of files I haven't been able to find such an issue. What can I do to get around this problem? I can use perl, also....

Thanks,
Sean
?readLines

I'm sure Perl will do nicely, but you can also use readLines() and then grep() or regexpr() on the result in R, as you would in Perl, to find where the problem lies. ?nchar can also help to find a non-printing character that may be messing you up. It's no fun, I know. Excel files can be a particular pain, especially in their handling of missings.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box
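A minimal sketch of that approach, assuming the file and field count from the original post ('junk.txt.txt', 88 tab-separated fields, so 87 tabs on a well-formed line):

lines <- readLines("junk.txt.txt")
## Lines containing a stray single quote, which read.table()
## treats as a quote character by default:
grep("'", lines)
## Lines whose tab count differs from the expected 87
## (the 10 skipped preamble lines will show up too; ignore them):
ntabs <- nchar(lines) - nchar(gsub("\t", "", lines, fixed = TRUE))
which(ntabs != 87)
## Lines containing non-printing characters other than tab:
grep("[^[:print:]\t]", lines)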
Hi Sean,

This is only a shot in the dark, but your description has reminded me of similar messes in files which have been exported from Excel. What I have often done in such cases, to check (e.g.) the numbers of fields in records (using 'awk' on Linux), is on the following lines:

awk 'BEGIN{FS="\t"} {print NF}' filename | sort -n | uniq

If there are varying numbers of fields, two or more different numbers will be printed instead of the single value there should be. If you know how many fields to expect (e.g. 88), then you can find the line numbers of the offending records with something like

awk 'BEGIN{FS="\t"} {if(NF!=88){print NR}}' filename

In data files with a lot of fields per line, doing it this kind of way is vastly superior to trying to spot the problem by eye -- it's extremely difficult to count 88 tab-separated fields on screen!

Hoping this helps! If not, supply further details and we'll see what we can think up.

Best wishes,
Ted.
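For anyone without awk to hand, a rough R translation of the same idea (a sketch; "filename" and the expected count of 88 are taken from the examples above):

nf <- sapply(strsplit(readLines("filename"), "\t", fixed = TRUE), length)
unique(nf)       # distinct field counts -- should be a single value
which(nf != 88)  # line numbers of the offenders, like awk's NR

One caveat: strsplit() does not count a trailing empty field, so a line that merely ends in a tab will look one field short here; count.fields() counts such fields.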
Maybe the 'fill' argument of read.table() is the solution. Its default value is FALSE in read.table(), so any line that does not have the same number of fields as the first (non-skipped) line will cause problems. If set to TRUE, as in read.delim() and read.csv(), lines with fewer fields get blank fields added at the end.

When exporting tab-delimited text files from Excel, lines with empty cells at the end often end up with fewer fields than the header line in the text file. Reading them with read.delim() fixes that.

If the problem is more complicated, you probably need to find the offending lines with count.fields() and correct them manually. You can find them (actually the line numbers) with something like

cf <- count.fields('data.txt', sep="\t")
which(cf != cf[1])

assuming that the first line has the correct number of fields.

Tilo
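A sketch of the fill approach applied to the call from the original post (the file name and other arguments are Sean's; quote="" is an extra, optional guard, since a stray single quote would otherwise start a quoted string and swallow the tabs inside it):

a <- read.table('junk.txt.txt', header=TRUE, skip=10, sep="\t",
                fill=TRUE, quote="")
## read.delim() already defaults to header=TRUE, sep="\t" and fill=TRUE:
a <- read.delim('junk.txt.txt', skip=10, quote="")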
In addition to other suggestions made, note also count.fields().

> cat("10 9 17 # First of 7 lines", "11 13 1 6", "9 14 16",
+     "12 15 14", "8 15 15", "9 13 12", "7 14 18",
+     file="oneBadRow.txt", sep="\n")
> nfields <- count.fields("oneBadRow.txt")
> nfields
[1] 3 4 3 3 3 3 3
> table(nfields)    ## Use with many records
nfields
3 4
6 1
> tab <- table(nfields)
> (1:length(nfields))[nfields == 4]
[1] 2
> readLines("oneBadRow.txt", n=-1)[2]
[1] "11 13 1 6"

Note the various option settings for count.fields().

John Maindonald            email: john.maindonald at anu.edu.au
phone: +61 2 (6125)3473    fax: +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27),
Australian National University, Canberra ACT 0200.
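The indexing step above can also be written with which(), which returns the offending line numbers directly (same nfields vector as in the transcript, so the result is identical):

> which(nfields == 4)
[1] 2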