Hello. I am pretty new to R, but if I understand it correctly, when R reads data with something like "d <- read.table("/dev/stdin")", it reads the entire data set first and only then starts processing. Is there any way I can tweak R so that it starts processing as the data comes in, rather than loading everything into memory at once? The reason I ask is that we have a case where the data could get quite large (it could reach terabytes), and all I need are some basic statistical summaries (min, max, mean, ...). I have heard about a PostgreSQL extension that allows R to act on a portion of the PostgreSQL data instead of reading it all in. Can I do something similar with other data sources such as stdin?

Thanks,
Soichi
Hi,

On Wed, 21 Jul 2004, Hayashi Soichi - shayas wrote:
> Is there any way I can tweak R around so that it will start processing as
> data comes and not load everything on memory at once? The reason for this is

Have you read through the R Data Import/Export manual? There are several other ways; to name a few: scan() and RODBC -- I haven't used the latter myself, though.

Cheers,

Kevin

--------------------------------
Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Building 27, Room 1004
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200
Australia

Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7411
Ph (M): +61-40-451-8301
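Following up on the RODBC suggestion: since only min, max, and mean are needed, one option is to let the database do the aggregation and pull back just the one-row result, so R never has to hold the raw data. A rough sketch only -- the DSN name "pgdsn" and the table/column names are made up for illustration:

  library(RODBC)
  ch <- odbcConnect("pgdsn")     # DSN pointing at the PostgreSQL database
  stats <- sqlQuery(ch,
      "SELECT MIN(x) AS min, MAX(x) AS max, AVG(x) AS mean FROM mytable")
  odbcClose(ch)
  stats                          # a one-row data frame with the summaries

This keeps the terabytes on the database side; only the aggregates cross into R.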
On Wed, 21 Jul 2004 09:05:18 -0500, Hayashi Soichi - shayas <Soichi.Hayashi at acxiom.com> wrote:
> Is there any way I can tweak R around so that it will start processing as
> data comes and not load everything on memory at once?

See the ?connections help topic. You can open a file and then read lines from it in groups. For example:

  f <- file('MyData')
  open(f)
  read.table(f, nrows=100)
  read.table(f, nrows=100)
  ...
  close(f)

and so on. Most functions in R won't know what to do with data handled in this way; you'll have to do a lot of work to do your calculations in pieces.

Duncan Murdoch
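To make the "calculations in pieces" idea concrete for the original question (min, max, mean), here is a minimal sketch that reads one numeric column in blocks with scan() on an open connection and keeps only running totals in memory. The file name "MyData" and the chunk size are placeholders; it assumes whitespace-separated numeric values with no header:

  f <- file("MyData", open = "r")
  n <- 0; total <- 0; lo <- Inf; hi <- -Inf
  repeat {
    chunk <- scan(f, what = double(), nmax = 10000, quiet = TRUE)
    if (length(chunk) == 0) break        # no more data
    n     <- n + length(chunk)           # running count
    total <- total + sum(chunk)          # running sum, for the mean
    lo    <- min(lo, chunk)              # running minimum
    hi    <- max(hi, chunk)              # running maximum
  }
  close(f)
  c(min = lo, max = hi, mean = total / n)

Because each call to scan() picks up where the previous one left off on the open connection, memory use stays bounded by the chunk size no matter how large the input is.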
Hello,

I guess it depends on what you call large, what you want to do, and how much memory you have. I've used R on 25,000 cases of 50 variables (mostly factors) for exploratory frequencies and plotting and found it fast (on a T23 IBM Thinkpad, 512MB RAM, 1GHz processor).

Regards,
Nigel