thr3ads.net - R help - [R] read.table & readLines behaviour? [Sep 2008]

J.delasHeras at ed.ac.uk

2008-Sep-23 09:19 UTC

[R] read.table & readLines behaviour?

Hi,


I have been using 'read.table' regularly to read tab-delimited text  
files with data. No problem, until now.
Now I have a file that appeared to have read fine, and the data inside  
looks correct (structure etc), except I only had 15000+ rows out of  
the expected 24000. Using 'readLines' instead, and breaking up the  
data by tabs, gives me the expected result.
I do not understand why this is happening and I can't find anything
obvious in the data to explain the behaviour...
Does anybody have an explanation? something to watch out for?

If I run this I get the incomplete set:
> oldprobesets<-read.table("All_norm_calls.txt",sep="\t",header=T,stringsAsFactors=F)
> dim(oldprobesets)
[1] 15733    11

but I get the right data if I use:
> probesets<-readLines("All_norm_calls.txt")
> tmp<-matrix(ncol=11,nrow=24000)
> for (i in 1:24000)
tmp[i,]<-unlist(strsplit(probesets[i+1],split="\t"))
> colnames(tmp)<-unlist(strsplit(probesets[1],split="\t"))
> probesets<-data.frame(tmp,stringsAsFactors=F)
> dim(probesets)
[1] 24000    11


Here's my sessionInfo output:
> sessionInfo()
R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United  
Kingdom.1252;LC_MONETARY=English_United  
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices datasets  tcltk     utils     methods
[8] base

other attached packages:
[1] limma_2.14.0   svSocket_0.9-5 svIO_0.9-5     R2HTML_1.59    svMisc_0.9-5
[6] svIDE_0.9-5

loaded via a namespace (and not attached):
[1] tools_2.7.0


Thanks!

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Peter Dalgaard

2008-Sep-23 11:29 UTC

[R] read.table & readLines behaviour?

J.delasHeras at ed.ac.uk wrote:
> Hi,
>
>
> I have been using 'read.table' regularly to read tab-delimited text
> files with data. No problem, until now.
> Now I have a file that appeared to have read fine, and the data inside
> looks correct (structure etc), except I only had 15000+ rows out of
> the expected 24000. Using 'readLines' instead, and breaking up the
> data by tabs, gives me the expected result.
> I do not understand why this is happening and I can't find anything
> obvious in the data to explain the behaviour...
> Does anybody have an explanation? something to watch out for?

Hmm:

- completely blank lines
- filling
- quotes

My bet would be on the last one. Does read.delim work better?

Also, just in case: Check length(probesets) after the readLines
call.
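
To illustrate the "quotes" suspicion, here is a minimal sketch, not taken from the original file. The file contents and the names path, bad, good and also_good are made up for the example. It relies on the documented defaults: read.table treats both " and ' as quote characters, while read.delim only treats " as one. An apostrophe inside a field can therefore open a quoted string that runs on over the following lines until the next apostrophe, and those lines are swallowed without an error.

```r
# Hypothetical miniature of the problem: a tab-delimited file in which two
# fields contain an apostrophe (made-up data, not the original file).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tname\tvalue",
  "1\tprobe_a\t0.1",
  "2\tprobe_b\t0.2",
  "3\tprobe_c\t0.3",
  "4\tprobe_d\t0.4",
  "5\t3' end\t0.5",
  "6\tprobe_f\t0.6",
  "7\t5' end\t0.7",
  "8\tprobe_h\t0.8"
), path)

# Default quote = "\"'": the apostrophe on data row 5 starts a quoted string
# that is only closed by the apostrophe on data row 7.
bad <- read.table(path, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# read.delim uses quote = "\"" only, so apostrophes are ordinary characters.
good <- read.delim(path, stringsAsFactors = FALSE)

# Alternatively, switch quoting off altogether in read.table.
also_good <- read.table(path, sep = "\t", header = TRUE, quote = "",
                        stringsAsFactors = FALSE)

# Compare the row counts: 'bad' should come out shorter than the others.
c(bad = nrow(bad), good = nrow(good), also_good = nrow(also_good))
```

If the real file shows the same pattern, passing quote = "" (or using read.delim) should bring back the missing rows.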

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907

Gabor Grothendieck

2008-Sep-23 11:34 UTC

[R] read.table & readLines behaviour?

Try looking at the result of count.fields to diagnose it.
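
A minimal sketch of that diagnosis follows, on made-up data rather than the original file. The file contents and the names path, n_default and n_noquote are assumptions for the example. count.fields uses the same default quote characters as read.table, so comparing its result with and without quoting shows which lines are affected.

```r
# Hypothetical tab-delimited file with apostrophes in two fields
# (made-up data, not the original file).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tname\tvalue",
  "1\tprobe_a\t0.1",
  "2\tprobe_b\t0.2",
  "3\tprobe_c\t0.3",
  "4\tprobe_d\t0.4",
  "5\t3' end\t0.5",
  "6\tprobe_f\t0.6",
  "7\t5' end\t0.7",
  "8\tprobe_h\t0.8"
), path)

# Fields per line with the default quote characters (" and ').
n_default <- count.fields(path, sep = "\t")

# Fields per line with quoting switched off.
n_noquote <- count.fields(path, sep = "\t", quote = "")

# A clean file gives the same count on every line. NA entries mark lines
# that were read as part of a quoted string spanning several lines.
table(n_default, useNA = "ifany")
table(n_noquote, useNA = "ifany")

# Line numbers worth inspecting by eye.
which(is.na(n_default) | n_default != n_noquote)
```

Looking at the flagged lines with readLines(path)[...] should show the character that starts the runaway quote.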
