Hi Peter,
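I'm not going to look at your large file on what for me is Friday evening,
but the usual cause of that kind of problem is a single or double quote in
the text. read.table treats it as the start of a quoted field, so everything
up to the next matching quote (possibly many lines later) ends up in one field.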
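If it is a quote, the usual fix is to turn quoting off with quote = "". As far
as I know, with sep = "\t" the default quote = "\"'" lets an apostrophe
anywhere in a text field (think "5' UTR") open a quoted string that runs on
across lines, swallowing rows until a matching quote turns up. Here's a minimal
sketch on a small made-up file (the file contents and the names tmp, bad and
good are just for illustration):

```r
# Made-up example data: row 2 contains an apostrophe ("5' UTR"),
# and row 4 contains another one that happens to close it.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdesc",
             "1\tplain",
             "2\t5' UTR region",
             "3\tplain",
             "4\t3' end"), tmp)

# Default quoting: the apostrophe in row 2 opens a quoted field that only
# closes at the apostrophe in row 4, so rows 3 and 4 are absorbed into row 2.
bad <- read.table(tmp, comment.char = "", sep = "\t", fill = TRUE,
                  header = TRUE)

# quote = "" disables quote processing, so every line becomes its own row.
good <- read.table(tmp, comment.char = "", sep = "\t", fill = TRUE,
                   header = TRUE, quote = "")

dim(bad)   # fewer rows than the file has
dim(good)  # 4 rows, 2 columns
```

For your file that would mean adding quote = "" to your original
read.table(bzfile(...)) call, assuming none of the fields are genuinely quoted.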
One way to diagnose the problem is to look at the lines of the text file
itself right around row 25952; there's always something there causing the
problem. I'd also look in R at the last row that was imported. Often you can
see the problem there as well.
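A rough sketch of both checks, again on a made-up file (tmp, counts,
first_bad and raw are illustrative names). count.fields() with read.table's
default quote characters records NA for a line where a quoted string is still
open at the end of the line, so the first NA points at the line where a stray
quote opens. readLines() shows the raw text there, and tail() shows the last
row read.table produced:

```r
# Made-up example file with a stray apostrophe on data row 2 (file line 3).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdesc",
             "1\tplain",
             "2\t5' UTR region",
             "3\tplain",
             "4\t3' end"), tmp)

# Count tab-separated fields per line using read.table's default quote
# characters. A line ending while a quoted string is still open gets NA,
# so the first NA marks where the trouble starts.
counts <- count.fields(tmp, sep = "\t", quote = "\"'", comment.char = "")
first_bad <- which(is.na(counts))[1]

# Look at the raw text around that line.
raw <- readLines(tmp)
print(raw[max(1, first_bad - 1):min(length(raw), first_bad + 1)])

# With quoting turned off, every line should have the same field count.
counts_noquote <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
print(table(counts_noquote))

# Inspect the last row that read.table actually imported.
annot <- read.table(tmp, comment.char = "", sep = "\t", fill = TRUE,
                    header = TRUE)
print(tail(annot, 1))
```

On the real file you would point count.fields() at
bzfile("ProbeInfo_Expression.txt.bz2") with sep = "\t", and look at the
readLines() output around line 25952.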
Sarah
On Fri, Jul 29, 2011 at 8:54 PM, Peter Langfelder
<peter.langfelder at gmail.com> wrote:
> Hi all,
>
> I encountered a problem when trying to read in an Illumina chip
> annotation file. The offending file is large, so I zipped it up and
> posted it at
>
> http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/tmp/ProbeInfo_Expression.txt.bz2
>
> Executing this:
>
> annot = read.table(bzfile("ProbeInfo_Expression.txt.bz2"),
>                 comment.char="", sep = "\t", fill = TRUE, header = TRUE);
>
> leads to
>
>> dim(annot)
> [1] 25952    28
>
> i.e. 25952 rows were read, but the file is some 48000 rows long.
>
> The file contains long text entries (up to several thousand
> characters) which appear to be the problem since stripping out those
> columns (outside of R) and re-reading gives the full 48k+ rows.
>
> My question is: why does read.table stop reading (without any
> warning or error)? Am I missing something in the documentation (I read
> it but didn't find anything)? Are there arguments I'm not setting right?
> I tried to google the problem but came up empty-handed.
>
> Session info:
>
>> sessionInfo()
> R version 2.11.1 Patched (2010-06-06 r52218)
> i686-pc-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
>  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
> Thanks,
>
> Peter
--
Sarah Goslee
http://www.functionaldiversity.org