gianni lavaredo
2012-May-15 20:16 UTC
[R] Problem reading a large TXT file and splitting it into several files
Dear Researchers,

This is the first time I have tried to solve this problem. I have a TXT file with 1,408,452 rows. I want to split it into several files of at most 1,000,000 rows each, using the following procedure:

    # split into two files: one with 1,000,000 rows and one with 408,452 rows
    file <- "09G001_72975_7575_25_4025.txt"
    fileNames <- strsplit(as.character(file), ".", fixed = TRUE)
    fileNames.temp.1 <- unique(as.vector(do.call("rbind", fileNames)[, 1]))
    con <- file(file, open = "r")
    # n is the number of rows per output file
    n <- 1000000
    i <- 0
    while (length(readLines(con, n=n)) > 0 ) {
        i <- i + 1
        pv <- read.table(con,header=F,sep="\t", nrow=n)
        write.table(pv, file = paste(fileNames.temp.1,"_",i,".txt",sep = ""), sep = "\t")
    }
    close(con)

When I use n = 1,000,000, I get only "09G001_72975_7575_25_4025_1.txt" (with 1,000,000 rows) in the directory, and not "09G001_72975_7575_25_4025_2.txt" (with 408,452 rows). I don't understand where my bug is.

Furthermore, when I try, for example, to split into 3 files (where n is 469484, i.e. 1408452/3), I get this message:

    Error in read.table(con, header = F, sep = "\t", nrow = n) :
      no lines available in input

Thanks for all your help, and sorry for the disturbance.
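A note on the likely cause, with a minimal sketch. Each call to readLines(con, n=n) in the while condition consumes n lines from the open connection, so read.table() only sees whatever comes after them. With n = 469484 the second readLines() call uses up the last block, leaving nothing for read.table(), which would explain the "no lines available in input" error. One possible way around this is to read each chunk exactly once and write out the same lines. The function name split_file and its arguments are hypothetical, not part of the original post, and the demonstration uses a small temporary file rather than the real data.

```r
# Hypothetical helper (not from the original post): split a text file
# into pieces of at most n lines each, reading every chunk only once.
split_file <- function(path, n, out_dir = dirname(path)) {
  base <- sub("\\.[^.]*$", "", basename(path))  # file name without extension
  con <- file(path, open = "r")
  on.exit(close(con))                # close the connection even on error
  out <- character(0)
  i <- 0
  repeat {
    lines <- readLines(con, n = n)   # the only place lines are consumed
    if (length(lines) == 0) break    # end of file reached
    i <- i + 1
    out_file <- file.path(out_dir, paste0(base, "_", i, ".txt"))
    writeLines(lines, out_file)      # write exactly the lines just read
    out <- c(out, out_file)
  }
  out                                # paths of the files that were written
}

# Small demonstration: a temporary file with 25 lines, split in chunks of 10
tmp <- tempfile(fileext = ".txt")
writeLines(paste(1:25, letters[1:25], sep = "\t"), tmp)
parts <- split_file(tmp, n = 10)
# line counts of the three pieces: 10, 10 and 5
sapply(parts, function(f) length(readLines(f)))
```

Because the lines are copied as text, the tab-separated layout is kept unchanged and no row names or header are added. If each chunk really has to be parsed into a data frame first, calling read.table(text = lines, sep = "\t") on the chunk would be one option.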