thr3ads.net - R help - [R] Read.table problems [May 2009]

Steve Murray

2009-May-18 16:24 UTC

[R] Read.table problems

Dear all,

I have a file which I've converted from NetCDF (.nc) to text (.txt) using
ncdump in Unix (as I had problems using the ncdf package to do this). The first
few rows (as copied and pasted from the Unix console) of the file appear as
follows:

 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,


As you can see, there are a lot of NA values before the actual numeric values
start further down the dataset. My problem is that I'm having trouble
reading this file into R. I think the problem lies with the sep= argument,
although I may be wrong. I tried the following command at first, as the data
appear to be comma separated:
> read.table("test86.txt", skip=43, na.strings="-", header=FALSE, sep=",") -> test86  # skip=43 due to meta-data information being held in the initial rows
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 29 did not have 25 elements

I then tried sep=" ", followed by sep="" but received a
similar-type error message (although line 29 doesn't appear to be especially
different from the rest).

I subsequently tried using sep="\t" and then sep="\n". These both result in the data
being read in without an error message being displayed, although the data are
formatted as follows:
> head(test86)
                                                                            V1
1     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
2     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
3     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
4     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
5     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
6     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 

> dim(test86)
[1] 179899      1


Instead of one column, I'd expect there to be 720.


I think I'm getting something wrong relating to the sep= argument (or
possibly mis-using na.strings?). If anyone has any solutions to this then
I'd be very grateful to hear them.

Many thanks for any advice,

Steve

Marc Schwartz

2009-May-18 16:58 UTC

[R] Read.table problems

On May 18, 2009, at 11:24 AM, Steve Murray wrote:
>
> Dear all,
>
> I have a file which I've converted from NetCDF (.nc) to text (.txt)  
> using ncdump in Unix (as I had problems using the ncdf package to do  
> this). The first few rows (as copied and pasted from the Unix  
> console) of the file appear as follows:
>
> _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _,
>
>
> As you can see, there are a lot of NA values before the actual  
> numeric values start further down the dataset. My problem is that  
> I'm having trouble reading this file into R. I think the problem  
> lies with the sep= argument, although I may be wrong. I tried the  
> following command at first, as the data appear to be comma separated:
>
>> read.table("test86.txt", skip=43, na.strings="-", header=FALSE,
>> sep=",") -> test86  # skip =43 due to meta-data information being
>> held in the initial rows
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,  
> na.strings,  :
>  line 29 did not have 25 elements
>
> I then tried sep=" ", followed by sep="" but received a similar-type
> error message (although line 29 doesn't appear to be especially
> different from the rest).
>
> I subsequently tried using sep="\t" and then sep="\n". These both result  
> in the data being read in without an error message being displayed,  
> although the data are formatted as follows:
>
>> head(test86)
>                                                                           V1
> 1     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _, _,
> 2     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _, _,
> 3     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _, _,
> 4     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _, _,
> 5     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _, _,
> 6     _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,  
> _, _, _,
>
>
>> dim(test86)
> [1] 179899      1
>
>
> Instead of one column, I'd expect there to be 720.
>
>
> I think I'm getting something wrong relating to the sep= argument  
> (or possibly mis-using na.strings?). If anyone has any solutions to  
> this then I'd be very grateful to hear them.
>
> Many thanks for any advice,
>
> Steve

Two problems:

1. Your first line above has one more column/entry than the subsequent  
lines. If that is correct, you need to use the 'fill = TRUE' argument  
so that all subsequent rows are filled to have the same number of  
columns. If the above is due to a copy/paste error, then disregard this.

2. You are using a '-' (hyphen) as your 'na.strings' character, when
the data is using a '_' (underscore).

Additionally, I would use 'strip.white = TRUE', to aid in getting rid  
of extraneous white space around your fields/separators. That will  
also help with column separations.
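
The uneven field counts in point 1 can be checked before calling read.table(). The following is only a rough sketch: the sample lines are a hypothetical stand-in for the posted data (one line of 25 values, then lines of 24, each ending in a trailing comma), written to a temporary file so the example runs anywhere.

```r
# Hypothetical sample mimicking the layout in the post: the first line
# carries 25 values, the continuation lines carry 24, and every line
# ends with a trailing comma.
first_line <- paste0(" ", strrep("_, ", 25))
other_line <- paste0("    ", strrep("_, ", 24))
sample_lines <- c(first_line, rep(other_line, 3))

tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# count.fields() reports the number of separator-delimited fields on each
# line, so lines that disagree with the rest show up immediately.
n_fields <- count.fields(tmp, sep = ",")
print(n_fields)
print(table(n_fields))
```

On the real file, the same call with the file name and skip = 43 from the original post should show which lines differ from the count read.table() expects.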


Thus (on OSX) with the above data copied to the clipboard:

 > read.table(pipe("pbpaste"), na.strings = "_", sep = ",", fill = TRUE, strip.white = TRUE)
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
1  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
3  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
4  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
5  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
6  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
7  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
8  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
9  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
10 NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
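
For anyone not on OSX, the same call can be tried without the clipboard. This is a minimal sketch rather than the exact session above: the input is a hypothetical four-line sample built to match the posted layout, passed through read.table()'s text argument; the other arguments are the ones suggested in this reply.

```r
# Hypothetical stand-in for the pasted data: one 25-value line followed by
# three 24-value lines, each ending in a trailing comma as in the post.
sample_text <- paste(
  c(paste0(" ", strrep("_, ", 25)),
    rep(paste0("    ", strrep("_, ", 24)), 3)),
  collapse = "\n"
)

parsed <- read.table(text = sample_text,
                     header = FALSE,
                     sep = ",",
                     na.strings = "_",    # underscore, not hyphen
                     fill = TRUE,         # pad short rows with NA
                     strip.white = TRUE)  # trim blanks around each field

print(dim(parsed))
print(all(is.na(parsed)))
```

To apply this to the actual file, replace the text argument with "test86.txt" and add skip = 43 as in the original command. Note that read.table() determines the number of columns from the first five lines of input, so with fill = TRUE a longer line further down the file can still be split across rows.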



HTH,

Marc Schwartz
