Meinhard Ploner
2002-Feb-22 14:00 UTC
Summary: [R] read.table on Mac OS X, CARBON vs. DARWIN
Thanks a lot, James!!

The problem is fixed. In version 1.4.0 for Mac/Darwin (the latest version available for this system), the function read.table (which is also called from read.delim etc.) has the bug you explained.

Inserting the row
    nlines <- nlines+1
after
    lines <- c(lines, line)
removes this bug.

M.

On Friday, February 22, 2002, at 02:33 PM, james.holtman at convergys.com wrote:

> If you cannot get the latest 1.4.1, here is a patch (adds one line to
> read.table) that will fix it on your current system.
>
>> The 'read.table' function appears to be up to 10X slower in R 1.4.0 than
>> R 1.3.1 for some of the data sets I read in. I was comparing the source
>> code for the 2 versions and see that it was rewritten in R 1.4.0.
>>
>> I think I found out what part of the problem might be. I was comparing
>> R 1.3.1 and R 1.4.0 code and it appears that a statement is missing in
>> some of the code for R 1.4. This is the section of code at the beginning
>> of read.table. The loop starting with 'while (nlines < 5)' will read in
>> the entire file, because there is no increment of 'nlines' in the loop.
>> I traced through the code and this is what was happening. It then does a
>> 'pushBack' of the entire file. In tracing through the code, this is where
>> it appears to be taking the time. With the change noted below, the speed
>> was similar to R 1.3.1 and the results were the same.
>>
>> Here is the current code with what I think is the additional statement
>> needed:
>>
>> =================part of read.table=======
>>
>>     nlines <- 0
>>     lines <- NULL
>>     while (nlines < 5) {
>>         line <- readLines(file, 1, ok = TRUE)
>>         if (length(line) == 0)
>>             break
>>         if (blank.lines.skip && length(grep("^[ \\t]*$", line)))
>>             next
>>         if (length(comment.char) && nchar(comment.char)) {
>>             pattern <- paste("^[ \\t]*", substring(comment.char,
>>                 1, 1), sep = "")
>>             if (length(grep(pattern, line)))
>>                 next
>>         }
>>         lines <- c(lines, line)
>>         #
>>         # additional line required
>>         #
>>         nlines <- nlines+1
>>     }
>
> Meinhard Ploner <meinhardploner at gmx.net> on 02/22/2002 03:17:34
>
> To: james.holtman at convergys.com
> cc:
> Subject: Re: [R] read.table on Mac OS X, CARBON vs. DARWIN
>
> Yes. Thanks a lot.
> I had 1.4.0 because on Fink the latest version (1.4.1) is not
> available. However, I will download it from CRAN.
> Meinhard
>
> On Thursday, February 21, 2002, at 10:29 PM,
> james.holtman at convergys.com wrote:
>
>> read.table did have a bug in it in 1.4.0. It was fixed in 1.4.1. Is
>> that what you are running with?
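The effect of the missing increment can be sketched outside of read.table. The following is only an illustration under stated assumptions, not the actual R source: peek_lines() and its increment argument are hypothetical names, and a textConnection stands in for the file being read. Without the increment, the loop condition never changes and every line of the input is consumed instead of the first five.

```r
# Hypothetical helper (not part of R): collects up to `n` non-blank lines
# from an open connection, mimicking the loop at the start of read.table.
peek_lines <- function(con, n = 5, increment = TRUE) {
  nlines <- 0
  lines <- NULL
  while (nlines < n) {
    line <- readLines(con, 1, ok = TRUE)
    if (length(line) == 0)
      break                       # end of input reached
    if (length(grep("^[ \t]*$", line)))
      next                        # skip blank lines
    lines <- c(lines, line)
    if (increment)
      nlines <- nlines + 1        # the statement reported missing in 1.4.0
  }
  lines
}

input <- paste("row", 1:20)       # made-up data, 20 lines

con <- textConnection(input)
fixed <- peek_lines(con, increment = TRUE)
close(con)

con <- textConnection(input)
buggy <- peek_lines(con, increment = FALSE)
close(con)

length(fixed)   # 5: stops after the first five lines
length(buggy)   # 20: the whole input was read
```

With a large file, the second case is where the time would go, since everything read has to be pushed back afterwards.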
David R. Bickel
2002-Feb-23 00:02 UTC
Summary: [R] read.table on Mac OS X, CARBON vs. DARWIN
Adding that line didn't work for me. I get the same problem as before (version 1.4.0). 'temp' is a two-line text file with three tab-delimited columns.

UNDER DARWIN:

    > read.table('temp')
                  V1   V2   V3
    1 AFFX-BioB-5_at -214 -139
    2 AFFX-BioB-M_at  -49  -11
    > read.table('temp',as.is=TRUE)
    stack imbalance in internal type.convert, 26 then 25
    stack imbalance in Internal, 25 then 24
    stack imbalance in if, 19 then 18
    stack imbalance in <-, 17 then 16
    stack imbalance in {, 15 then 14
    stack imbalance in for, 8 then 7
    stack imbalance in {, 6 then 5
                  V1   V2   V3
    1 AFFX-BioB-5_at -214 -139
    2 AFFX-BioB-M_at  -49  -11
    Error: unprotect(): stack imbalance

UNDER CARBON:

    > read.table('temp')
                  V1   V2   V3
    1 AFFX-BioB-5_at -214 -139
    2 AFFX-BioB-M_at  -49  -11
    > read.table('temp',as.is=TRUE)
                  V1   V2   V3
    1 AFFX-BioB-5_at -214 -139
    2 AFFX-BioB-M_at  -49  -11

On Friday, February 22, 2002, at 09:00 X, Meinhard Ploner wrote:

> Thanks a lot, James!!
> The problem is fixed. On the version 1.4.0 Mac/darwin (the latest
> available version for this system) the function read.table (which is
> called from read.delim etc., too) has the bug you explained.
>
> Inserting the row
>     nlines <- nlines+1
> after
>     lines <- c(lines, line)
> removes this bug.
> M.

http://www.mcg.edu/research/biostat/bickel.html

David R. Bickel, PhD
Assistant Professor
Medical College of Georgia
Office of Biostatistics and Bioinformatics
1120 Fifteenth St., AE-3037
Augusta, GA 30912-4900
Tel.: 706-721-4697; Fax: 706-721-6294
E-mail: dbickel at mail.mcg.edu or bickel at prueba.info
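For anyone who wants to repeat the as.is test on another build, the report above can be turned into a small script. This is a sketch under assumptions: the file contents are reconstructed from the printed output, the columns are taken to be tab-delimited as described, and a temporary file replaces 'temp'. On a build without the problem, both calls should return the same two rows with no stack imbalance messages.

```r
# Recreate a two-line, tab-delimited file like the 'temp' file in the report
# (contents reconstructed from the printed data frames, so an assumption).
tmp <- tempfile()
writeLines(c("AFFX-BioB-5_at\t-214\t-139",
             "AFFX-BioB-M_at\t-49\t-11"), tmp)

default <- read.table(tmp)                # the call that worked on both builds
kept    <- read.table(tmp, as.is = TRUE)  # the call that failed under Darwin

# as.is = TRUE leaves the first column as character;
# the numeric columns are still converted.
str(kept)

unlink(tmp)                               # remove the temporary file
```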