thr3ads.net - R help - [R] Problem to resolve a step for reading a large TXT and, split in several file [May 2012]


Rui Barradas

2012-May-16 13:11 UTC

[R] Problem to resolve a step for reading a large TXT and, split in several file

Hello,

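Your bug is that on each pass through the loop you read twice and write
only once: the lines consumed by readLines() in the while condition are
discarded, and read.table() then reads the lines that follow. The file
pointer keeps moving forward, so read.table() fails with "no lines
available in input" when it is called after the last lines have already
been consumed.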
Use something like

while (length(pv <- readLines(con, n=n)) > 0 ) {  # note that this line changed.
     i <- i + 1
     write.table(pv, file = paste(fileNames.temp.1, "_", i, ".txt", sep = ""),
                 sep = "\t")
}

(or put the line with read.table where you have readLines.)
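
As a self-contained illustration of the single-read loop above, the sketch below builds a small temporary file and splits it into pieces of n lines. The demo file, the tiny n, and the names prefix, outfiles and chunk.sizes are assumptions made for the example. It also swaps write.table() for writeLines(), because writeLines() writes each line back unchanged, whereas write.table() on a character vector adds quotes, row names and a header by default.

```r
# Hypothetical demo input: 25 tab-separated lines in a temporary file.
infile <- tempfile(fileext = ".txt")
writeLines(sprintf("row\t%d", 1:25), infile)

n <- 10                                # lines per output file (tiny for the demo)
prefix <- sub("\\.txt$", "", infile)   # base name for the output files
outfiles <- character(0)

con <- file(infile, open = "r")
i <- 0
# Read once per pass: the result of readLines() is kept in pv and written out.
while (length(pv <- readLines(con, n = n)) > 0) {
  i <- i + 1
  outfile <- paste(prefix, "_", i, ".txt", sep = "")
  writeLines(pv, outfile)
  outfiles <- c(outfiles, outfile)
}
close(con)

# Number of lines in each piece
chunk.sizes <- sapply(outfiles, function(f) length(readLines(f)),
                      USE.NAMES = FALSE)
print(chunk.sizes)
```

With 25 input lines and n = 10 this should leave three files, the last one holding the 5 leftover lines.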

Anyway, I don't like it very much. If you know the number of lines in 
the input file, it would be much better to use integer division and 
modulus to determine how many times and how much to read.
Something like

n <- 1000000

passes <- number.of.lines.in.file %/% n
remaining <- number.of.lines.in.file %% n

for(i in seq_len(passes)){   # seq_len() gives an empty loop when passes is 0

     # [ ... read n lines at a time & process them... ]

}
if(remaining){
     n <- remaining

     # [ ...read what's left... ]
}
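
To make the arithmetic concrete, here is a rough sketch using the numbers from the question, followed by the same plan applied to a small temporary file. The demo file and the names demo.total, demo.n, sizes and chunks are invented for the illustration; only %/%, %% and readLines() come from the discussion above.

```r
# Arithmetic for the file in the question
number.of.lines.in.file <- 1408452
n <- 1000000
passes    <- number.of.lines.in.file %/% n   # number of full chunks
remaining <- number.of.lines.in.file %% n    # lines left over after them

# The same plan applied to a small hypothetical file of 25 lines
infile <- tempfile(fileext = ".txt")
writeLines(as.character(1:25), infile)
demo.total <- 25
demo.n     <- 10

# One entry per read: full chunks first, then the remainder if there is one
sizes <- c(rep(demo.n, demo.total %/% demo.n),
           if (demo.total %% demo.n) demo.total %% demo.n)

con <- file(infile, open = "r")
chunks <- lapply(sizes, function(k) readLines(con, n = k))
close(con)

print(c(passes = passes, remaining = remaining))
print(sapply(chunks, length))
```

For the 1408452-line file this gives one full pass of 1000000 lines and 408452 remaining lines, which matches the two files the original poster expected.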


If you do not know how many lines there are in the file, see
(package::function)

parser::nlines
R.utils::countLines
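
If neither package is at hand, the line count can also be obtained in base R by reading the file in blocks and summing the block lengths. This is only a sketch: the helper name count.lines and its block.size argument are made up here, and it is not how either of the packages above does it.

```r
# Hypothetical helper: count lines without holding the whole file in memory.
count.lines <- function(path, block.size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))          # close the connection even if an error occurs
  total <- 0
  # Keep reading blocks until readLines() returns nothing
  while ((k <- length(readLines(con, n = block.size))) > 0) {
    total <- total + k
  }
  total
}

# Small demo file with a known number of lines
tmp <- tempfile(fileext = ".txt")
writeLines(as.character(1:1234), tmp)
number.of.lines.in.file <- count.lines(tmp, block.size = 500L)
print(number.of.lines.in.file)
```

The result can then be fed into the %/% and %% calculation in place of a hard-coded line count.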

Hope this helps,

Rui Barradas


On 16-05-2012 11:00, r-help-request at r-project.org wrote:
> Date: Tue, 15 May 2012 22:16:42 +0200
> From: gianni lavaredo<gianni.lavaredo at gmail.com>
> To:r-help at r-project.org
> Subject: [R] Problem to resolve a step for reading a large TXT and
> 	split in several file
> Message-ID:
> 	<CAJ6JbR-YwgjsFu8o0UnvET6M8p8WvP7YbosXw5nRdz48woDsrw at mail.gmail.com>
> Content-Type: text/plain
>
> Dear Researchers,
>
> It's the first time I am trying to resolve this problem. I have a TXT file
> with 1408452 rows. I wish to split it into several files, each with
> 1,000,000 rows, using the following procedure:
>
> # split in two file one with 1,000,000 of rows and one with 408,452 of rows
>
> file<- "09G001_72975_7575_25_4025.txt"
> fileNames<- strsplit(as.character(file), ".", fixed = TRUE)
> fileNames.temp.1<- unique(as.vector(do.call("rbind", fileNames)[, 1]))
>
> con<- file(file, open = "r")
> # n is the number of row
> n<- 1000000
> i<- 0
> while (length(readLines(con, n=n))>  0 ) {
>      i<- i + 1
>      pv<- read.table(con,header=F,sep="\t", nrow=n)
>      write.table(pv, file = paste(fileNames.temp.1,"_",i,".txt",sep = ""),
>                  sep = "\t")
> }
> close(con)
>
>
> When I use 1,000,000 I get only
> "09G001_72975_7575_25_4025_1.txt" (with 1,000,000 rows) in the directory and not
> "09G001_72975_7575_25_4025_2.txt" (with 408,452 rows). I don't understand where
> my bug is.
>
> Furthermore, when I try, for example, to split into 3 files (where n is 469484,
> i.e. 1408452/3), I get this message:
>
> *Error in read.table(con, header = F, sep = "\t", nrow = n) :
>    no lines available in input*
>
> Thanks for all help and sorry for the disturbance
>
