Meinhard Ploner
2002-Feb-22  14:00 UTC
Summary: [R] read.table on Mac OS X, CARBON vs. DARWIN
Thanks a lot, James!!
The problem is fixed. In version 1.4.0 for Mac/Darwin (the latest version available for this system), the function read.table (which is also called from read.delim etc.) has the bug you explained.

Inserting the line
	nlines <- nlines+1
after
	lines <- c(lines, line)
removes this bug.
M.

On Friday, February 22, 2002, at 02:33 PM, james.holtman at convergys.com wrote:

> If you cannot get the latest 1.4.1, here is a patch (adds one line to
> read.table) that will fix it on your current system.
>
>> The 'read.table' function appears to be up to 10X slower in R 1.4.0 than
>> R 1.3.1 for some of the data sets I read in. I was comparing the source
>> code for the 2 versions and see that it was rewritten in R 1.4.0.
>>
>> I think I found out what part of the problem might be. I was comparing
>> R 1.3.1 and R 1.4.0 code and it appears that a statement is missing in
>> some of the code for R 1.4. This is the section of code at the beginning
>> of read.table. The loop starting with 'while (nlines < 5)' will read in
>> the entire file, because there is no increment of 'nlines' in the loop.
>> I traced through the code and this is what was happening. It then does a
>> 'pushBack' of the entire file. In tracing through the code, this is
>> where it appears to be taking the time. With the change noted below, the
>> speed was similar to R 1.3.1 and the results were the same.
>>
>> Here is the current code with what I think is the additional statement
>> needed:
>>
>> =================part of read.table=======
>>
>>     nlines <- 0
>>     lines <- NULL
>>     while (nlines < 5) {
>>         line <- readLines(file, 1, ok = TRUE)
>>         if (length(line) == 0)
>>             break
>>         if (blank.lines.skip && length(grep("^[ \\t]*$", line)))
>>             next
>>         if (length(comment.char) && nchar(comment.char)) {
>>             pattern <- paste("^[ \\t]*", substring(comment.char,
>>                 1, 1), sep = "")
>>             if (length(grep(pattern, line)))
>>                 next
>>         }
>>         lines <- c(lines, line)
>>         #
>>         #  additional line required
>>         #
>>         nlines <- nlines+1
>>     }
>
> Meinhard Ploner <meinhardploner at gmx.net> on 02/22/2002 03:17:34
>
> To:   james.holtman at convergys.com
> cc:
> Subject:  Re: [R] read.table on Mac OS X, CARBON vs. DARWIN
>
> Yes. Thanks a lot.
> I had the 1.4.0 because on Fink the latest version (1.4.1) is not
> available. However, I will download it from the CRAN.
> Meinhard
>
> On Thursday, February 21, 2002, at 10:29 PM,
> james.holtman at convergys.com wrote:
>
>> read.table did have a bug in it in 1.4.0. It was fixed in 1.4.1. Is
>> that what you are running with?
>
> --
>
> NOTICE: The information contained in this electronic mail transmission
> is intended by Convergys Corporation for the use of the named individual
> or entity to which it is directed and may contain information that is
> privileged or otherwise confidential. If you have received this
> electronic mail transmission in error, please delete it from your system
> without copying or forwarding it, and notify the sender of the error by
> reply email or by telephone (collect), so that the sender's address
> records can be corrected.
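
To see why the missing increment matters, the loop described above can be reduced to a small sketch. This is not the read.table source: peek_lines(), its increment switch, and the 1000-row input are hypothetical, and the blank-line and comment handling are left out. It only shows that without the increment the "first five lines" loop consumes the whole input.

```r
# Simplified stand-in for the preview loop at the top of read.table (R 1.4.0).
# peek_lines() and its arguments are hypothetical names, not part of R.
peek_lines <- function(con, increment = TRUE, max_lines = 5) {
    nlines <- 0
    lines <- NULL
    while (nlines < max_lines) {
        line <- readLines(con, 1, ok = TRUE)
        if (length(line) == 0)   # end of input reached
            break
        lines <- c(lines, line)
        if (increment)           # the statement missing in 1.4.0
            nlines <- nlines + 1
    }
    lines
}

input <- paste("row", 1:1000)    # made-up data standing in for a file

con <- textConnection(input)
buggy <- peek_lines(con, increment = FALSE)
close(con)

con <- textConnection(input)
fixed <- peek_lines(con, increment = TRUE)
close(con)

length(buggy)   # every line of the input was read
length(fixed)   # only the first max_lines lines were read
```

In the unpatched variant every line ends up in lines, which matches the report that the entire file is read and then pushed back.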
David R. Bickel
2002-Feb-23  00:02 UTC
Summary: [R] read.table on Mac OS X, CARBON vs. DARWIN
Adding that line didn't work for me. I get the same problem as before 
(version 1.4.0):
'temp' is a two-line text file with three tab-delimited columns.
UNDER DARWIN:
 > read.table('temp')
               V1   V2   V3
1 AFFX-BioB-5_at -214 -139
2 AFFX-BioB-M_at  -49  -11
 > read.table('temp',as.is=TRUE)
stack imbalance in internal type.convert, 26 then 25
stack imbalance in Internal, 25 then 24
stack imbalance in if, 19 then 18
stack imbalance in <-, 17 then 16
stack imbalance in {, 15 then 14
stack imbalance in for, 8 then 7
stack imbalance in {, 6 then 5
               V1   V2   V3
1 AFFX-BioB-5_at -214 -139
2 AFFX-BioB-M_at  -49  -11
Error: unprotect(): stack imbalance
UNDER CARBON:
 > read.table('temp')
               V1   V2   V3
1 AFFX-BioB-5_at -214 -139
2 AFFX-BioB-M_at  -49  -11
 > read.table('temp',as.is=TRUE)
               V1   V2   V3
1 AFFX-BioB-5_at -214 -139
2 AFFX-BioB-M_at  -49  -11
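
For anyone who wants to check their own build, the test above can be scripted roughly as follows. This is only a sketch: the file is recreated from the two rows shown in the output, and the path variable and the use of tempfile() are assumptions (the original file was simply called 'temp').

```r
# Recreate a two-line, tab-delimited file like 'temp' (contents taken from
# the output shown above; the temporary path is an assumption).
path <- tempfile()
writeLines(c("AFFX-BioB-5_at\t-214\t-139",
             "AFFX-BioB-M_at\t-49\t-11"), path)

plain <- read.table(path)                # default conversion
typed <- read.table(path, as.is = TRUE)  # the call that failed under Darwin

print(typed)
sapply(typed, class)                     # column classes after conversion

unlink(path)                             # remove the temporary file
```

On a build without the problem, both calls should return the same 2 x 3 table with no stack imbalance messages.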
On Friday, February 22, 2002, at 09:00 X, Meinhard Ploner wrote:
> Thanks a lot, James!!
> The problem is fixed. On the version 1.4.0 Mac/darwin (the latest 
> available version for this system) the function read.table (which is 
> called from read.delim etc., too) has the bug you explained.
>
> Inserting the row
> 	nlines <- nlines+1
> after
> 	 lines <- c(lines, line)
> removes this bug.
> M.
http://www.mcg.edu/research/biostat/bickel.html
David R. Bickel, PhD
Assistant Professor
Medical College of Georgia
Office of Biostatistics and Bioinformatics
1120 Fifteenth St., AE-3037
Augusta, GA 30912-4900
Tel.: 706-721-4697; Fax: 706-721-6294
E-mail: dbickel at mail.mcg.edu or bickel at prueba.info