gianni lavaredo
2012-Feb-26 14:03 UTC
[R] count how many row i have in a txt file in a directory
Dear Researchers,

I have a large TXT (X,Y,MyValue) file in a directory and I wish to import it row by row in a loop, keep only the data that fall inside a buffer (using inside.owin of spatstat) and delete the rest. The first step, before creating a row-by-row loop, is to know how many rows there are in the txt file without loading it into R, to avoid memory problems.

Does anyone know a specific function for this?

Thanks in advance for all suggestions,
Gianni
Hans Ekbrand
2012-Feb-26 15:55 UTC
[R] count how many row i have in a txt file in a directory
On Sun, Feb 26, 2012 at 03:03:58PM +0100, gianni lavaredo wrote:
> Dear Researchers,
>
> I have a large TXT (X,Y,MyValue) file in a directory and I wish to import
> it row by row in a loop, keep only the data that fall inside a buffer
> (using inside.owin of spatstat) and delete the rest. The first step,
> before creating a row-by-row loop, is to know how many rows there are in
> the txt file without loading it into R, to avoid memory problems.
>
> Does anyone know a specific function for this?

If the number of rows is so large that even three variables per row cause
memory problems, then looping through the file row by row will take a very
long time. Instead of looping row by row, I would split the text file into
chunks small enough for each chunk to be read into R, and operated on within
R, without memory problems.

I create a test file of 10,000,000 rows:

my.words <- replicate(10000, paste(LETTERS[sample.int(26, 10)], sep = "", collapse = ""))
my.df <- data.frame(x = rnorm(10000000), y = rnorm(10000000), my.val = rep(my.words, 1000))
write.csv(my.df, file = "testmem.csv")

Split the file into smaller chunks, say 1,000,000 rows each. I use the split
command from GNU coreutils:

$ split -l 1000000 testmem.csv

Then loop through the chunks:

for(file.name in c("xaa", "xab", ...)){   # ... stands for the rest of the chunk files
  chunk <- read.csv(file = file.name)
  ## match and add all the interesting rows to an object
}

Here's an example that, for each chunk, prints its third row:

for(file.name in c("xaa", "xab")){
  chunk <- read.csv(file = file.name)
  print(chunk[3, ])
}

With chunks of 1,000,000 rows, R needed about 250 MB of RAM to process this loop.
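To tie this back to the filtering step Gianni described, a minimal sketch of what the body of that loop could look like is below, using inside.owin() from spatstat to keep only the points that fall inside the buffer. The window object, the chunk-file pattern and the column layout (X, Y, MyValue with no header line) are illustrative assumptions, not details given in the thread.

library(spatstat)                                 # inside.owin() tests whether points fall inside a window

## hypothetical buffer window -- replace with the owin object you actually use
buffer <- owin(xrange = c(0, 100), yrange = c(0, 100))

chunk.files <- sort(list.files(pattern = "^x"))   # the chunk files written by 'split'
kept <- vector("list", length(chunk.files))

for (i in seq_along(chunk.files)) {
  ## assumed layout: three comma-separated columns X, Y, MyValue and no header line
  chunk <- read.csv(chunk.files[i], header = FALSE,
                    col.names = c("X", "Y", "MyValue"))
  inside <- inside.owin(chunk$X, chunk$Y, buffer) # logical vector, one entry per row
  kept[[i]] <- chunk[inside, ]                    # keep only the points in the buffer
}

result <- do.call(rbind, kept)                    # retained rows from all chunks

Only one full chunk is held in memory at a time; the retained rows accumulate in 'kept'.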
Rui Barradas
2012-Feb-26 17:39 UTC
[R] count how many row i have in a txt file in a directory
Hello,

> The first step, before creating a row-by-row loop, is to know how many rows
> there are in the txt file without loading it into R, to avoid memory
> problems.
>
> Does anyone know a specific function for this?

I don't believe there's a specific function. If you want to know how many rows
there are in a txt file, try this one.

numTextFileLines <- function(filename, header = FALSE, sep = ",", nrows = 5000){
    tc <- file(filename, open = "rt")
    on.exit(close(tc))
    if(header){
        # cnames: column names (not used)
        cnames <- read.table(file = tc, sep = sep, nrows = 1, stringsAsFactors = FALSE)
        # cnames <- as.character(cnames)
    }
    n <- 0
    while(TRUE){
        # read the next block; at end of file read.table raises an error
        x <- tryCatch(read.table(file = tc, sep = sep, nrows = nrows),
                      error = function(e) e)
        if(any(grepl("no lines available", unclass(x)))) break
        if(nrow(x) < nrows){
            n <- n + nrow(x)
            break
        }
        n <- n + nrows
    }
    n
}

# Make a data file
N <- 1e7 + 1
d <- data.frame(X = 1:N, Y = sample(10, N, TRUE), MyValue = rnorm(N))
write.table(d, file = "test.txt", row.names = FALSE, sep = ",")

# Count its lines, excluding the header, nrows = 5000 at a time
t1 <- system.time({
    nlines <- numTextFileLines("test.txt", header = TRUE)
})
cat(" Lines read:", nlines, "\n", "Last block:", nlines %% 5000, "\n")

# Clean-up
unlink("test.txt")

> I have a large TXT (X,Y,MyValue) file in a directory and I wish to import
> it row by row in a loop, keep only the data that fall inside a buffer
> (using inside.owin of spatstat) and delete the rest.

Maybe you don't need to count the number of rows in the file at all: you could
adapt the code above to process it in blocks. Something like

# Start of the function: the code is the same
if(any(grepl("no lines available", unclass(x)))) break
# Process 'x', row-wise
apply(x, 1, MyFunction)
# if(nrow(x) < nrows){ ... etc ...

Hope this helps,

Rui Barradas
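For what it's worth, here is one hedged sketch of that adaptation, reusing the block-reading skeleton above but keeping, instead of counting, the rows whose coordinates fall inside a spatstat window. The function name, the assumption that the file has a header row with columns X, Y, MyValue, and the 'window' argument are all illustrative, not something stated in the thread.

library(spatstat)   # for inside.owin()

filterTextFileBlocks <- function(filename, window, sep = ",", nrows = 5000){
    tc <- file(filename, open = "rt")
    on.exit(close(tc))
    # read (and reuse) the header line, assumed to hold X, Y, MyValue
    cnames <- as.character(read.table(file = tc, sep = sep, nrows = 1,
                                      stringsAsFactors = FALSE))
    kept <- list()
    while(TRUE){
        x <- tryCatch(read.table(file = tc, sep = sep, nrows = nrows,
                                 col.names = cnames),
                      error = function(e) e)
        if(any(grepl("no lines available", unclass(x)))) break
        # keep only the rows whose coordinates fall inside the window
        inside <- inside.owin(x$X, x$Y, window)
        kept[[length(kept) + 1]] <- x[inside, ]
        if(nrow(x) < nrows) break
    }
    do.call(rbind, kept)
}

# Hypothetical usage: 'buffer' is whatever owin object defines your buffer
# buffer <- owin(xrange = c(0, 100), yrange = c(0, 100))
# kept.points <- filterTextFileBlocks("test.txt", buffer)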
Hans Ekbrand
2012-Feb-26 21:10 UTC
[R] count how many row i have in a txt file in a directory
On Sun, Feb 26, 2012 at 05:06:42PM +0100, gianni lavaredo wrote:
> Thanks Hans.
>
> It's true that your idea improves the speed of the analysis compared with a
> row-by-row loop.
>
> Sorry if I ask these questions, it is to better understand and to make my
> code perform better:
>
> 1) the split command in GNU coreutils, $ split -l 1000000 testmem.csv
> I have never used this command. Is it possible to code it in R, or is it an
> external command?

External. split is - as I wrote - part of GNU coreutils.

> Do you have some links where I can study this command? Thanks

http://www.gnu.org/software/coreutils/

> 2) Is it possible to work with a txt file?

"txt file" is not a well defined concept; such a file could very well be a csv
file, see http://en.wikipedia.org/wiki/Comma-separated_values

?read.csv
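A small follow-up sketch, in case it helps: an external command such as split can still be launched from inside an R session with system(), and read.csv reads a comma-separated file regardless of whether its extension is .csv or .txt. The file names below are only placeholders, and the call assumes GNU coreutils is installed and on the PATH.

## run GNU split from within R (assumes the 'split' binary is available)
system("split -l 1000000 testmem.csv")

## a comma-separated "txt" file is read exactly like a csv file;
## the file name here is a placeholder
my.data <- read.csv("points.txt")   # equivalent to read.table(..., sep = ",", header = TRUE)
str(my.data)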