thr3ads.net - R help - [R] Reading CSV file with unequal record length [Jul 2008]

Viswanathan Shankar

2008-Jul-02 18:55 UTC

[R] Reading CSV file with unequal record length

Hello,
I am having some difficulty reading a CSV file with unequal record lengths
in R. The data have 26 columns and no header, and were generated with the
following R call:
write.table(schat, "schat.csv", sep=",", col.names=FALSE, append = TRUE)

1.0,1.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.5,1.9,2.7,,,,
1.0,2.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.6,1.9,2.2,2.7,,,
1.0,3.0,0.0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,1.0,1.2,1.4,1.7,2.1,3.1,5.0,,,,,
1.0,4.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.7,0.7,0.9,1.0,1.2,1.4,1.7,2.2,3.0,,,,,
1.0,5.0,0.0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.4,1.6,1.9,2.4,3.3,,,,
1.0,6.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.1,1.3,1.7,2.1,3.4,,,,,
1.0,7.0,0.0,0.1,0.1,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.7,2.0,2.5,3.3,5.5,,,
1.0,8.0,0.0,0.1,0.1,0.2,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.6,0.7,0.8,0.9,1.0,1.2,1.3,1.5,1.7,2.0,2.3,2.8,4.2
1.0,9.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.4,1.6,1.9,2.2,2.9,4.2,,
1.0,10.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.8,1.0,1.3,1.6,2.4,3.6,6.0,,,,,,,

When I use the following call to read the data written above

schat_n <- data.frame(read.table("schat.csv", sep=",", header = FALSE, fill=TRUE))

the data are fine through record 7, but records 8 and 9 get wrapped: the
columns are limited to 23 and the remaining values are placed in an extra
record, giving 12 records instead of 10, as shown below (the first field
on each line is the row number):

1.0,1.0,1.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.5,1.9,2.7,NA
2.0,1.0,2.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.6,1.9,2.2,2.7
3.0,1.0,3.0,0.0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,1.0,1.2,1.4,1.7,2.1,3.1,5.0,NA,NA
4.0,1.0,4.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.7,0.7,0.9,1.0,1.2,1.4,1.7,2.2,3.0,NA,NA
5.0,1.0,5.0,0.0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.4,1.6,1.9,2.4,3.3,NA
6.0,1.0,6.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.1,1.3,1.7,2.1,3.4,NA,NA
7.0,1.0,7.0,0.0,0.1,0.1,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.7,2.0,2.5,3.3,5.5
8.0,1.0,8.0,0.0,0.1,0.1,0.2,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.6,0.7,0.8,0.9,1.0,1.2,1.3,1.5,1.7,2.0
9.0,2.3,2.8,4.2,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
10.0,1.0,9.0,0.0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.2,1.4,1.6,1.9,2.2,2.9
11.0,4.2,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
12.0,1.0,10.0,0.0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.8,1.0,1.3,1.6,2.4,3.6,6.0,NA,NA,NA,NA

I would like the dataset to be read as is, with 10 records and 26
columns. Any suggestions for fixing this would be greatly appreciated.

Thank you in advance.

Shankar




-- 

###############################################
University of North Carolina-Chapel Hill
Department of Biostatistics
3101 McGavran-Greenberg, CB#7420
Chapel Hill, North Carolina 27599-7420
Phone: 919-843-1532
###############################################

Marc Schwartz

2008-Jul-02 21:07 UTC

[R] Reading CSV file with unequal record length

on 07/02/2008 01:55 PM Viswanathan Shankar wrote:
> [...]

At least based upon the data that you posted above, I have no problem 
reading it:

DF <- read.table("clipboard", sep = ",")

 > DF
    V1 V2 V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18
1   1  1  0 0.1 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.4 0.5 0.6 0.7 0.8 0.9 1.0
2   1  2  0 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.5 0.5 0.6 0.7 0.8 0.9 1.1 1.2
3   1  3  0 0.1 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.7 0.8 1.0 1.2 1.4 1.7
4   1  4  0 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.5 0.6 0.7 0.7 0.9 1.0 1.2 1.4
5   1  5  0 0.1 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.2 1.4
6   1  6  0 0.1 0.1 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.7 0.8 0.9 1.1 1.3
7   1  7  0 0.1 0.1 0.2 0.3 0.3 0.4 0.5 0.5 0.6 0.7 0.8 0.9 1.1 1.2 1.4
8   1  8  0 0.1 0.1 0.2 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.6 0.7 0.8 0.9 1.0
9   1  9  0 0.1 0.1 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.2
10  1 10  0 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.5 0.6 0.8 1.0 1.3 1.6 2.4 3.6
    V19 V20 V21 V22 V23 V24 V25 V26
1  1.2 1.5 1.9 2.7  NA  NA  NA  NA
2  1.4 1.6 1.9 2.2 2.7  NA  NA  NA
3  2.1 3.1 5.0  NA  NA  NA  NA  NA
4  1.7 2.2 3.0  NA  NA  NA  NA  NA
5  1.6 1.9 2.4 3.3  NA  NA  NA  NA
6  1.7 2.1 3.4  NA  NA  NA  NA  NA
7  1.7 2.0 2.5 3.3 5.5  NA  NA  NA
8  1.2 1.3 1.5 1.7 2.0 2.3 2.8 4.2
9  1.4 1.6 1.9 2.2 2.9 4.2  NA  NA
10 6.0  NA  NA  NA  NA  NA  NA  NA


That you are using 'append = TRUE' in the write.table() call for your
actual data suggests that your actual source data file might contain
output from more than one object with differing structures, resulting in
mixed input formats, and that may be the problem.

If the CSV file should only contain data from one R object, don't use
'append = TRUE', or be absolutely sure that the multiple objects have
identical structures.
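
As a minimal sketch of how this can happen (the data frames a and b and the temporary file are made up for illustration, they are not from the original post), appending two objects with different numbers of columns yields a file with mixed record lengths, which count.fields() makes visible:

```r
# Hypothetical objects with differing structures (2 vs. 3 columns)
a <- data.frame(x = 1:2, y = 3:4)
b <- data.frame(x = 5:6, y = 7:8, z = 9:10)

tmp <- tempfile(fileext = ".csv")

# First write creates the file, second write appends to it
write.table(a, tmp, sep = ",", col.names = FALSE, row.names = FALSE)
write.table(b, tmp, sep = ",", col.names = FALSE, row.names = FALSE,
            append = TRUE)

# Number of comma-separated fields on each line of the file
counts <- count.fields(tmp, sep = ",")
print(counts)         # 2 2 3 3
print(table(counts))  # more than one distinct count means mixed formats
```

If table(counts) shows a single value for the real file, the records all have the same length and the append step is probably not the culprit.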

HTH,

Marc Schwartz

Peter Dalgaard

2008-Jul-02 21:20 UTC

[R] Reading CSV file with unequal record length

Viswanathan Shankar wrote:
> [...]

Hmmm, I can't reproduce this (old version of R?). Copying from your mail
gives

 > write.table(read.table("clipboard",sep=",",fill=TRUE),sep=",",col.names=F)
"1",1,1,0,0.1,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1,1.2,1.5,1.9,2.7,NA,NA,NA,NA
"2",1,2,0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.6,1.9,2.2,2.7,NA,NA,NA
"3",1,3,0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,1,1.2,1.4,1.7,2.1,3.1,5,NA,NA,NA,NA,NA
"4",1,4,0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.7,0.7,0.9,1,1.2,1.4,1.7,2.2,3,NA,NA,NA,NA,NA
"5",1,5,0,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1,1.2,1.4,1.6,1.9,2.4,3.3,NA,NA,NA,NA
"6",1,6,0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1.1,1.3,1.7,2.1,3.4,NA,NA,NA,NA,NA
"7",1,7,0,0.1,0.1,0.2,0.3,0.3,0.4,0.5,0.5,0.6,0.7,0.8,0.9,1.1,1.2,1.4,1.7,2,2.5,3.3,5.5,NA,NA,NA
"8",1,8,0,0.1,0.1,0.2,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.6,0.7,0.8,0.9,1,1.2,1.3,1.5,1.7,2,2.3,2.8,4.2
"9",1,9,0,0.1,0.1,0.1,0.2,0.2,0.3,0.4,0.4,0.5,0.6,0.7,0.8,0.9,1,1.2,1.4,1.6,1.9,2.2,2.9,4.2,NA,NA
"10",1,10,0,0.1,0.1,0.2,0.2,0.3,0.3,0.4,0.5,0.6,0.8,1,1.3,1.6,2.4,3.6,6,NA,NA,NA,NA,NA,NA,NA

and read.csv(......, header=FALSE) also works.

In general, the first five lines are used to determine the number of
fields per line, and in this case, these are all shorter than the 8th one.
However, the trailing commas _should_ give the right count.

Anyways, you might try col.names=paste("V", 1:26, sep="") in your
read.table call.
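
To illustrate the col.names suggestion on a made-up ragged file (not the poster's data; the file contents and the n_col variable are assumptions for the example), the column count can be taken from count.fields() and passed in explicitly, so that read.table() no longer relies on the first five lines:

```r
# Hypothetical ragged file: the longest line comes after the first five
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3", "1,2,3", "1,2,3", "1,2,3", "1,2,3", "1,2,3,4,5"), tmp)

# Widest record in the whole file
n_col <- max(count.fields(tmp, sep = ","))

# Supplying col.names fixes the number of columns up front;
# fill = TRUE pads the shorter records with NA
fixed <- read.table(tmp, sep = ",", header = FALSE, fill = TRUE,
                    col.names = paste("V", seq_len(n_col), sep = ""))
print(dim(fixed))
print(fixed)
```

For the file in the original post the width is already known to be 26, so col.names=paste("V", 1:26, sep="") can be used directly.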

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
