Hi,

I am currently working on reading large files into R. My files are text documents with four columns and around 10 million lines. Each line is set up as:

string|integer|string|integer

I have been trying to use read.table to read in the file, but I think I am reading too much into memory and the application quits.

I want to be able to analyze the entire text document at once. I have thought about reading in the file line by line, but I still want to store all the information together. I have also thought about writing each line of the file to a matrix, but I cannot seem to figure it out.

Any help would be great.

Thanks,
Amy
How big is the entire text file? What is the length of an average line? Have you tried using 'scan' to read in the data? How much memory do you have? Are you paging?

Here is a quick test I did with a file of about 5M lines (12|this is some text|12345|more test):

> system.time(x <- scan('/tempxx.txt', what=list(0,'',0,''), sep="|"))
Read 4460544 records
   user  system elapsed
  50.70    1.04   54.16
> str(x)
List of 4
 $ : num [1:4460544] 12 12 12 12 12 12 12 12 12 12 ...
 $ : chr [1:4460544] "this is some text" "this is some text" "this is some text" "this is some text" ...
 $ : num [1:4460544] 12345 12345 12345 12345 12345 ...
 $ : chr [1:4460544] "more test" "more test" "more test" "more test" ...
> object.size(x)
107053288 bytes

It took less than a minute to read in, so some more details might help to evaluate what your problem is.

On Wed, Jul 8, 2009 at 4:00 PM, Amy Wesolowski <amywesolowski at gmail.com> wrote:
> Hi,
>
> I am currently working on reading large files into R. My files are text
> documents with four columns and around 10 million lines.
> Each line is set up as:
> string|integer|string|integer
>
> I have been trying to use read.table to read in the file, but I think I am
> reading too much into memory and the application quits.
>
> I want to be able to analyze the entire text document at once.
> I have thought about reading in the file, line by line, but I still want to
> store all the information together. I have also thought about writing each
> line of the file to a matrix, but I cannot seem to figure it out.
>
> Any help would be great.
> Thanks,
> Amy
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
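For the layout in the original question (string|integer|string|integer), the scan() approach might be adapted roughly as follows. This is only a sketch: the small sample file and the column names (name1, n1, name2, n2) are made up for illustration, and the real file path would replace the temporary one.

```r
# Hypothetical three-line sample standing in for the real 10M-line file
tmp <- tempfile(fileext = ".txt")
writeLines(c("alpha|1|beta|10",
             "gamma|2|delta|20",
             "epsilon|3|zeta|30"), tmp)

# 'what' supplies one template per column, so scan() knows each column's type
# up front; the list names (assumed here) become the names of the result
x <- scan(tmp,
          what = list(name1 = character(), n1 = integer(),
                      name2 = character(), n2 = integer()),
          sep = "|", quiet = TRUE)

# Combine the four vectors into one data frame, keeping strings as character
df <- data.frame(x, stringsAsFactors = FALSE)
str(df)

unlink(tmp)  # remove the sample file
```

Declaring the integer columns as integer() rather than 0 stores them as integers instead of doubles, which should use less memory per value on a file this size.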
You could try something like this:

  library(sqldf)
  DF <- read.csv.sql("myfile.txt", sep = "|", header = FALSE)

or possibly this, which is the same except that instead of using an "in memory" database it uses an external database:

  DF <- read.csv.sql("myfile.txt", sep = "|", header = FALSE, dbname = "temp.db")

In both cases it creates the database automatically and then destroys it automatically. You may need to adjust the arguments depending on what your data looks like. Since it does not use read.table underneath, any limitations of read.table would not apply.

You might want to test it out with the first few rows to get the arguments right and then, if it seems to work, try it with the full data.
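One way to try the arguments on just the first few rows might be to pass a LIMIT clause through the sql argument of read.csv.sql, where the data is referred to as the table "file". The sketch below assumes the sqldf package is installed and uses a made-up 20-line sample file in place of myfile.txt.

```r
# Hypothetical 20-line sample standing in for myfile.txt
sample_file <- tempfile(fileext = ".txt")
writeLines(sprintf("name%d|%d|label%d|%d", 1:20, 1:20, 1:20, 101:120),
           sample_file)

# Only run the check if sqldf is available on this machine
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # Fetch only the first 5 rows to check that sep and header are right
  first_rows <- read.csv.sql(sample_file,
                             sql = "select * from file limit 5",
                             sep = "|", header = FALSE)
  str(first_rows)
}

unlink(sample_file)  # remove the sample file
```

If the column types and the number of columns look right in the str() output, the same call without the limit clause can be tried on the full file.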