Dear R-listers:

I want to import a reasonably big file (15797 rows x 257 columns) into a table. The file is tab delimited, with NA in every empty space. I have reproduced the read.table() calls I used below. I have read the R Data Import/Export FAQ and still couldn't solve my problem (I might have missed it, of course). I'm using R 2.0.1 on a Mac G4, OS X 10.3.7.

I can import the file, but one of the columns "invades" the other: if there is an empty space marked as NA in the first column, it gets the value of the second column. I tried to import four different files (details below) and I think the problem is with the number of columns (with fewer columns it works).

Workarounds:
a) I can separate my file into several files, import them, and then make one object in R.
b) Try to learn basic commands in awk or perl? Any advice on this?

Another question (much less important): I have a binary file in Splus for this object. I exported the object in Splus as it says in the FAQ (data.dump), but data.restore doesn't exist as a function. Is it because I'm using a Mac?

Details of what I did:

## a) Importing a shorter version of my file (58 columns). I get the
## "invading" behaviour and a column of row.names that I don't understand
## where it comes from. (UNIQID should be empty and 10006 should be in
## All.FB.Id.)

> AllFBImpFields <- read.table('AllFBAllFieldsNAShorter.txt', fill=T, header=T,
+                              row.names=paste('a',1:15797, sep=''),
+                              as.is=T, nrows=15797)
> AllFBImpFields[1:2,1:5]
   row.names UNIQID All.FB.Id All.FB.5 All.FB.4
a1      <NA>  10006      <NA>     <NA>     <NA>
a2      <NA>  10007      <NA>     <NA>     <NA>

## b) Importing only 5 cols of the previous file. It works: there is no
## "invasion" and the row.names column is not inserted.

> AllFB5Cols <- read.table('AllFB5Cols.txt', fill=T, header=T,
+                          row.names=paste('a',1:15797, sep=''),
+                          as.is=T, nrows=15797)
> AllFB5Cols[1:2,1:5]
   UNIQID All.FB.Id Symbol       FB.gn CG.name
a1   <NA>     10006    p53 FBgn0039044 CG10873
a2   <NA>     10007  Gr94a FBgn0041225 CG31280

## c) Importing a file with 4 rows and 58 columns: invasion behaviour, plus
## a warning that I don't get in a), although the file is the same for the
## first 4 rows.

> x4rowsAllCol <- read.table('AllFB4rowsAllCols.txt', fill=T, header=T,
+                            row.names=paste('a',1:4, sep=''),
+                            as.is=T, nrows=4)
Warning message:
incomplete final line found by readTableHeader on `AllFB4rowsAllCols.txt'
> x4rowsAllCol[1:2,1:5]
   row.names UNIQID All.FB.Id All.FB.5 All.FB.4
a1        NA  10006        NA       NA       NA
a2        NA  10007        NA       NA       NA

## d) Importing a file with 4 rows and 5 cols: the result is like b), but
## it gives the same warning as c)!

> x4rows5cols <- read.table('AllFB4rows5cols.txt', fill=T, header=T,
+                           row.names=paste('a',1:4, sep=''),
+                           as.is=T, nrows=4)
Warning message:
incomplete final line found by readTableHeader on `AllFB4rows5cols.txt'
> x4rows5cols[1:2,1:5]
   UNIQID All.FB.Id All.FB.5 All.FB.4 All.FB.3
a1     NA     10006       NA       NA       NA
a2     NA     10007       NA       NA       NA
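[A minimal sketch of workaround (a) done entirely within R, without splitting the source file: read.table() can skip columns via colClasses = "NULL", so a wide file can be read in column chunks and recombined with cbind(). The file name and the 130/127 split are illustrative, and sep = '\t' is assumed since the file is tab delimited.]

    ## read columns 1-130 only ("NULL" drops a column; NA means default conversion)
    left  <- read.table('AllFBAllFields.txt', header = TRUE, sep = '\t',
                        colClasses = c(rep(NA, 130), rep("NULL", 127)))
    ## read columns 131-257 only
    right <- read.table('AllFBAllFields.txt', header = TRUE, sep = '\t',
                        colClasses = c(rep("NULL", 130), rep(NA, 127)))
    AllFB <- cbind(left, right)   # recombine into one 15797 x 257 data frame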
On Wed, 2005-01-19 at 04:25 +0000, Tiago R Magalhaes wrote:

> Dear R-listers:
>
> I want to import a reasonably big file into a table (15797 x 257
> columns). The file is tab delimited with NA in every empty space.

Tiago,

Have you tried to use read.table() explicitly defining the field
delimiting character as a tab, to see if that changes anything?

Try the following:

AllFBImpFields <- read.table('AllFBAllFieldsNAShorter.txt',
                             header = TRUE,
                             row.names = paste('a', 1:15797, sep = ''),
                             as.is = TRUE,
                             sep = "\t")

I added the 'sep = "\t"' argument at the end.

Also, leave out the 'fill = TRUE', which can cause problems. You do not
need this unless your source file has a varying number of fields per
line.

Note that you do not need to specify the 'nrows' argument unless you
want something less than all of the rows. Using the combination of
'skip' and 'nrows', you can read a subset of rows from the middle of
the input file.

See if that helps. Usually when there are column alignment problems, it
is because the rows are not being consistently parsed into fields, which
is frequently the result of not having the proper delimiting character
specified.

The last thought is to be sure that a '#' is not in your data file. This
is interpreted as a comment character by default, which means that
anything after it on a row will be ignored.

HTH,

Marc Schwartz
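[A short sketch of the 'skip' + 'nrows' combination described above, together with comment.char = "" to neutralize the '#' issue from the last paragraph; the row numbers are arbitrary and the file name is carried over from the example above.]

    ## read the header line separately, then 100 rows from the middle
    hdr <- scan('AllFBAllFieldsNAShorter.txt', what = "", sep = "\t", nlines = 1)
    mid <- read.table('AllFBAllFieldsNAShorter.txt', sep = "\t", header = FALSE,
                      skip = 5001, nrows = 100,         # data rows 5001-5100 (1 header + 5000 rows skipped)
                      comment.char = "", as.is = TRUE)  # treat '#' as data, not a comment
    names(mid) <- hdr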
On Wed, 19 Jan 2005, Tiago R Magalhaes wrote:

> another question (much less important): I have a binary file in Splus
> for this object. I exported the object in Splus as it says in the FAQ
> (data.dump).

Whose FAQ? data.dump is not mentioned in the R FAQ.

> But data.restore doesn't exist as a function. Is it because I'm using
> a Mac?

It is in package foreign: please consult the `R Data Import/Export
Manual'. There are details you need to follow, including loading
package foreign.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
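[For concreteness, a minimal sketch of the steps described above, with a hypothetical file name; note that data.restore() re-creates the dumped objects in the workspace rather than returning them as a value.]

    library(foreign)              # provides data.restore()
    data.restore('dumpdata.sdd')  # reads a file written by data.dump() in S-PLUS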
Thanks very much Marc and Prof Ripley.

a) Using sep='\t' with read.table() helps somewhat, but there is still a
problem: I cannot get all the rows:

df <- read.table('file.txt', fill=T, header=T, sep='\t')
dim(df)   # 9543 195

while with the shorter file (11 cols) I get all the rows:

dim(df)   # 15797 11

I have looked at row 9544, where the file seems to stop being read, but I
cannot see an obvious reason for this in any of the columns. Any ideas
why? Maybe there is one column that is stopping the reading process, and
that column is not one of the 11 present in the smaller file.

b) fill=T is necessary: without fill=T I get an error:
"line 1892 did not have 195 elements"

c) Help page for read.table: I reread the help file for read.table and I
would suggest changing it. From what I think I am reading, the '\t'
should not be needed for my file to work, but it actually is. From the
help page:

    If 'sep = ""' (the default for 'read.table') the separator is
    "white space", that is one or more spaces, tabs or newlines.

d) I incorrectly mentioned the FAQ in relation to data.restore. Where I
actually saw data.restore mentioned was the `R Data Import/Export
Manual', which I read (even more than once...), failing to read the
first paragraph of the section, where it's stated that the foreign
package is needed.

It works! (with source):

in Splus 6.1, Windows 2000:   dump('file')
in R 2.0.1, Mac OS X 10.3.7:  source('file')

I get a list, where the first element is the data.frame I want; the
column names have 'value' added to them.
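[A hedged diagnostic sketch for the remaining early stop at row 9544, under the assumption that a stray quote character (' or ") or a '#' in the data is swallowing lines, a common cause of read.table() reading fewer rows than the file holds; count.fields() reports how many fields each raw line parses into. The file name and column count mirror the example above.]

    ## how many tab-separated fields does each line of the file have?
    n <- count.fields('file.txt', sep = '\t', quote = '', comment.char = '')
    table(n)          # ideally a single value: the true column count
    which(n != 195)   # lines that parse into the wrong number of fields

    ## re-read with quote and comment handling disabled
    df <- read.table('file.txt', sep = '\t', header = TRUE, fill = TRUE,
                     quote = '', comment.char = '', as.is = TRUE)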