james.holtman@convergys.com
2003-Apr-30 18:45 UTC
[R] textConnection taking a long time to open a big string
I was using 'textConnection' on a file of about 11,000 lines so that I could detect lines with incomplete data, delete them, and then read the remaining lines in with 'scan'. I am using R 1.7.0 on Windows. Below is the output from the script; the textConnection call alone took about 52 seconds.

Is there a limit on how large a text object can be to be used with 'textConnection'?

######## script output ################

> x.1 <- scan("/mpstat.ssgdbsv4.030430.txt",what='',sep='\n')
Read 11299 items
> str(x.1)
 chr [1:11299] "8.3155 32 71 4 1907 122 0 1130 105 167 216 0 3686 32 13 37 18" ...
> unix.time(x.in <- textConnection(x.1))  # this takes a long time
[1] 51.96  0.01 53.20    NA    NA
> sum(nchar(x.1))  # total number of characters in the vector
[1] 944525
> unix.time(x.c <- count.fields(x.in))  # this goes pretty fast
[1] 0.14 0.00 0.14   NA   NA
> table(x.c)  # detect incomplete lines
x.c
    3     6    17
    1     1 11297
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    7.0
year     2003
month    04
day      16
language R
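For reference, the workflow described above (count fields per line, drop incomplete lines, then parse) can be sketched without a text connection, because 'count.fields' accepts a file name directly. This is only a minimal sketch under stated assumptions: the sample data, the temporary files, and the names n.expected, n.fields, good and good.file are hypothetical, and the toy file has 4 fields per line where the real data has 17.

```r
# Hypothetical sample standing in for the mpstat file (4 fields expected per line).
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4", "5 6", "7 8 9 10", "11 12 13 14"), tmp)

n.expected <- 4   # would be 17 for the data in the post

# Count fields straight from the file. Blank-line skipping, comment and quote
# handling are switched off so that there is exactly one count per physical
# line, which keeps the counts aligned with readLines() below.
n.fields <- count.fields(tmp, quote = "", blank.lines.skip = FALSE,
                         comment.char = "")
table(n.fields)   # same check as table(x.c) in the script output

# Keep only the complete lines and write them to a second temporary file.
lines <- readLines(tmp)
good <- lines[n.fields == n.expected]
good.file <- tempfile(fileext = ".txt")
writeLines(good, good.file)

# Parse the cleaned file; the result is a list with one numeric vector per field.
x <- scan(good.file, what = rep(list(0), n.expected), quiet = TRUE)
```

Writing the cleaned lines back to a temporary file is one possible design choice here; it avoids building a connection from a large character vector, which is the step that was slow in the timings above.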
Thomas W Blackwell
2003-Apr-30 19:23 UTC
[R] textConnection taking a long time to open a big string
Two alternate ways to the same result:

(1) Read 17 numeric fields per record and let 'scan' fill short lines with NA:

  # supply the file name after 'file='
  x.1 <- scan(file=, what=rep(list(0),17), fill=TRUE, multi.line=FALSE)
  incomplete.lines <- seq(length(x.1[[17]]))[ is.na(x.1[[17]]) ]

(2) Read whole lines as strings and count the fields after splitting:

  # sep="\n" reads one string per line, as in the original script
  x.1 <- scan(file=, what='', sep="\n")
  # split on runs of blanks or tabs
  x.2 <- strsplit(x.1, "[\\t ]+")
  incomplete.lines <- seq(length(x.1))[ unlist(lapply(x.2, length)) < 17 ]

Please read the help for these functions.

HTH - tom blackwell - u michigan medical school - ann arbor -

On Wed, 30 Apr 2003 james.holtman at convergys.com wrote:

> I was using 'textConnection' to read in a file with about 11,000 lines so I
> could detect lines with incomplete data and delete them and then read them
> in with 'scan'. I am using 1.7.0 on Windows. Here is the output from the
> script and it was using 51 seconds just to do the textConnection.
>
> Is there a limit on how large a text object can be to be used with
> 'textConnection'?
>
> ######## script output ################
>
> > x.1 <- scan("/mpstat.ssgdbsv4.030430.txt",what='',sep='\n')
> Read 11299 items
> > str(x.1)
> chr [1:11299] "8.3155 32 71 4 1907 122 0 1130 105 167 216
> 0 3686 32 13 37 18" ...
> > unix.time(x.in <- textConnection(x.1)) # this takes a long time
> [1] 51.96 0.01 53.20 NA NA
> > sum(nchar(x.1)) # total number of characters in the vector
> [1] 944525
> > unix.time(x.c <- count.fields(x.in)) # this goes pretty fast
> [1] 0.14 0.00 0.14 NA NA
> > table(x.c) # detect incomplete lines
> x.c
> 3 6 17
> 1 1 11297
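As an illustration of the first alternative, here is a small worked run on toy data, extended with the step of dropping the incomplete records. It is a sketch only: the sample lines, the temporary file, and the names n.expected, keep and complete are hypothetical, and 4 fields stand in for the 17 in the real file. Note that a line whose last field is a genuine NA would also be flagged by this check.

```r
# Hypothetical sample: the second line is incomplete (2 of 4 fields).
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4", "5 6", "7 8 9 10"), tmp)

n.expected <- 4   # would be 17 for the data in the thread

# fill = TRUE pads short lines with NA; multi.line = FALSE keeps a record
# from continuing onto the next line.
x.1 <- scan(tmp, what = rep(list(0), n.expected),
            fill = TRUE, multi.line = FALSE, quiet = TRUE)

# A record is incomplete if its last field was padded with NA.
incomplete.lines <- which(is.na(x.1[[n.expected]]))

# Drop the incomplete records from every field. A logical index is used so
# that the case of no incomplete lines is handled correctly.
keep <- !is.na(x.1[[n.expected]])
complete <- lapply(x.1, function(field) field[keep])
```

With this input, incomplete.lines identifies the second record, and complete holds the two remaining records as a list of four numeric vectors.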