Hello,
Do you need /all/ the data in memory at one time? Is your goal to
divide the data (e.g., according to some factor /or/ some function of
the columns of the data set), analyze the divisions, and then
possibly combine the results?
If so, you might consider using Rhipe. We have used it to run analyses
(e.g., estimating regression parameters, applying algorithms) across
subsets of data, where the subsets are created according to some condition.
Using this approach (and a cluster of 8 machines with 72 cores), we
successfully analyzed data sets ranging from 14 GB to ~140 GB.
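To make the divide / analyze / combine idea concrete, here is a minimal single-machine sketch in base R. It is not Rhipe code and does not use Rhipe's API; the data frame dat and its columns region, x and y are made up purely for illustration.

```r
# Toy data standing in for a large data set (hypothetical columns)
set.seed(1)
dat <- data.frame(
  region = rep(c("east", "west", "north"), each = 50),
  x      = rnorm(150)
)
dat$y <- 2 * dat$x + rnorm(150, sd = 0.1)

# Divide: one data frame per value of the grouping column
pieces <- split(dat, dat$region)

# Analyze: fit the same regression to each piece independently
fits <- lapply(pieces, function(d) coef(lm(y ~ x, data = d)))

# Combine: one row of coefficients per division
result <- do.call(rbind, fits)
print(result)
```

Each piece is handled independently of the others, which is what allows the work to be spread across several machines.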
This all assumes that your divisions are suitably small. I notice
you mention that each region is 10-20 GB and that you want to compute on
/all/ of it, i.e., you need all of it in memory. If so, Rhipe cannot help you.
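If, on the other hand, the pieces can be processed one at a time, a fixed-width file can be read in chunks from a connection so that only one chunk is in memory at any moment. The sketch below is only an illustration under assumed names: the three-column layout, the demo file, and the helper parse_chunk are hypothetical and are not the layout from the quoted message.

```r
# Hypothetical layout: a 3-character key followed by two 6-character numbers
widths <- c(3, 6, 6)
starts <- cumsum(c(1, head(widths, -1)))
ends   <- cumsum(widths)

# Write a small demo file so the sketch is self-contained
demo_file <- tempfile(fileext = ".dat")
writeLines(sprintf("%-3s%6.2f%6.2f",
                   c("A01", "A01", "B02", "B02", "B02"),
                   c(1, 2, 3, 4, 5),
                   c(10, 20, 30, 40, 50)),
           demo_file)

# Split each line into fields by character position
parse_chunk <- function(lines) {
  fields <- lapply(seq_along(widths),
                   function(i) substring(lines, starts[i], ends[i]))
  data.frame(key = fields[[1]],
             v1  = as.numeric(fields[[2]]),
             v2  = as.numeric(fields[[3]]),
             stringsAsFactors = FALSE)
}

con <- file(demo_file, open = "r")
chunk_size <- 2   # tiny on purpose; real data would use a much larger value
totals <- NULL
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  chunk  <- parse_chunk(lines)
  # Summarize this chunk, then keep only the summary
  part   <- aggregate(cbind(v1, v2) ~ key, data = chunk, FUN = sum)
  totals <- rbind(totals, part)
}
close(con)

# Combine the per-chunk summaries into one result per key
totals <- aggregate(cbind(v1, v2) ~ key, data = totals, FUN = sum)
print(totals)
```

This only works when the per-chunk results can be combined afterwards (sums, counts, and the like); it does not help if the computation truly needs every record at once.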
Regards
Saptarshi
On Thu, Feb 4, 2010 at 8:27 PM, Vadlamani, Satish {FLNA}
<SATISH.VADLAMANI at fritolay.com> wrote:
> Folks:
> I am trying to read in a large file. Definition of large is:
> Number of lines: 333,250
> Size: 850 MB
>
> The machine is a dual-core Intel with 4 GB RAM and nothing else running on
> it. I read the previous threads on read.fwf and did not see any conclusive
> statements on how to read fast. Example record and R code given below. I was
> hoping to purchase a better machine and do analysis with larger datasets, but
> these preliminary results do not look good.
>
> Does anyone have any experience with large files (> 1 GB) and using them
> with Revolution-R?
>
>
> Thanks.
>
> Satish
>
> Example Code
> # Widths and names of the 18 key columns
> key_vec <- c(1,3,3,4,2,8,8,2,2,3,2,2,1,3,3,3,3,9)
> key_names <- c("allgeo","area1","zone","dist","ccust1","whse","bindc",
>                "ccust2","account","area2","ccust3","customer","allprod",
>                "cat","bu","class","size","bdc")
> key_info <- data.frame(key_vec, key_names)
> # Note: sas_time is not defined in this snippet
> col_names <- c(key_names, sas_time$week)
> # 209 numeric buckets, each 12 characters wide
> num_buckets <- rep(12, 209)
> width_vec <- c(key_vec, num_buckets)
> col_classes <- c(rep("factor",18), rep("numeric",209))
> # Trial run on the first 100 records (commented out):
> #threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat", widths=width_vec,
> #                           header=FALSE, colClasses=col_classes, n=100)
> threewkoutstat <- read.fwf(file="3wkoutstatfcst_file02.dat", widths=width_vec,
>                            header=FALSE, colClasses=col_classes)
> names(threewkoutstat) <- col_names
>
> Example record (only one record pasted below; the 209 numeric fields are
> each 12 characters wide and right-justified)
> A004001003799000049250000492599990049999A001002002015002015009        0.00
> [fields 2-121: 0.00; fields 122-124: 0.60; field 125: 0.70; fields 126-209: 0.00]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>