On Mon, 10 Nov 2003, Mathieu Drapeau wrote:

> Is it normal that it takes a very long time to generate a connection
> object on a big character vector?

Yes.

> This takes a very long time to process:
> lines <- readLines ("myBigFile.txt")
> data <- scan(textConnection(lines), sep = "\t")
>
> against this that is pretty short to process:
> data <- scan("myBigFile.txt", sep = "\t")
>
> Anyone has any clues how to efficiently do that because I need to use a
> textConnection on a big vector?

Why? There are better ways, even described on the help page!

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

--- On Mon 11/10, Mathieu Drapeau < mathieu.drapeau at bioneq.qc.ca > wrote:

> Is it normal that it takes a very long time to generate a
> connection object on a big character vector?

If the reason you need a text connection is that you are locating your
data via a tag like this:

   lines <- readLines( "input.txt" )   # lines is a vector of lines
   g <- grep( "start", lines )         # position of tag
   mydata <- read.table( textConnection(lines), skip=g[1], head=TRUE )

then you could simply read your data twice like this:

   lines <- readLines( "input.txt" )   # lines is a vector of lines
   g <- grep( "start", lines )         # position of tag
   mydata <- read.table( "input.txt", skip=g[1], head=TRUE )

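The read-twice idea above can be tried end to end on a small throwaway file. What follows is only a sketch under stated assumptions: the file contents, the tab separator, and the column names x and y are invented for illustration and do not come from the original poster's data. The tag "start" and the read.table() call mirror the suggestion above.

```r
# Hypothetical sample file written to a temporary location (illustration only).
path <- tempfile(fileext = ".txt")
writeLines(c("some preamble",
             "start",        # the tag that marks where the table begins
             "x\ty",         # header row (assumed column names)
             "1\t2",
             "3\t4"), path)

# Pass 1: read the raw lines only to locate the tag.
lines <- readLines(path)
g <- grep("start", lines)    # line number(s) that contain the tag

# Pass 2: let read.table() parse the file itself, skipping everything up to
# and including the tag line, so no textConnection is built on `lines`.
mydata <- read.table(path, skip = g[1], header = TRUE, sep = "\t")
print(mydata)

unlink(path)                 # remove the temporary file
```

The file is read twice, but both reads go through the ordinary file connection, which is the fast path reported in the original question.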
Is it normal that it takes a very long time to generate a connection
object on a big character vector?

This takes a very long time to process:

   lines <- readLines ("myBigFile.txt")
   data <- scan(textConnection(lines), sep = "\t")

against this that is pretty short to process:

   data <- scan("myBigFile.txt", sep = "\t")

Anyone has any clues how to efficiently do that because I need to use a
textConnection on a big vector?

Thank you,
Mathieu