Mike Marchywka
2011-Jun-11 12:42 UTC
[Rd] arbitrary size data frame or other structs, curious about issues involved.
We keep getting questions on r-help about memory limits, and I was curious what issues are involved in making common classes like data.frame work with disk storage and intelligent swapping. That is, you can always rely on the OS for virtual memory, but in theory it should be possible to make a data structure that knows which pieces you will access next and keeps those somewhere fast. Of course algorithms "should" act locally and be block oriented, but in any case they could communicate upcoming access patterns to the data structure, look a few ms into the future, and have the right data prefetched.

I know things like "bigmemory" exist, but perhaps one issue is that they cannot simply drop in for data.frame -- or do they already solve all of these problems? Is memory management just a non-issue, or is there something that needs to be done to make large data structures work well?

Thanks.

-------------------
415-264-8477
marchywka at phluant.com
Online Advertising and Analytics for Mobile
http://www.phluant.com
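For concreteness, here is a minimal sketch of the kind of disk-backed structure being asked about, using bigmemory's file-backed matrices (this assumes the bigmemory package is installed; the file names and dimensions are purely illustrative):

    library(bigmemory)

    ## Data live in a binary file on disk; only the pieces you index
    ## are pulled into RAM as ordinary R objects.
    x <- filebacked.big.matrix(nrow = 1e6, ncol = 10, type = "double",
                               backingfile = "data.bin",
                               descriptorfile = "data.desc")

    x[, 1] <- rnorm(1e6)          # write one column
    block <- x[1:1000, ]          # read a block; returns a regular matrix
    mean(x[, 1])                  # subset first, then use ordinary functions

Note that x here is not a data.frame, which is part of the "drop in" question.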
Jay Emerson
2011-Jun-20 19:12 UTC
[Rd] arbitrary size data frame or other structs, curious about issues involved.
Mike,

Neither bigmemory nor ff is a "drop in" solution -- though useful, they are primarily for data storage and management, and for allowing convenient access to subsets of the data. Direct analysis of the full objects via most R functions is not possible. There are many issues that could be discussed here (and have been, previously), including the use of 32-bit integer indexing. There is a nice section, "Future Directions", in the R Internals manual that you might want to look at.

Jay

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay
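To make the "not a drop in" point concrete, a rough sketch (again assuming bigmemory is installed; the object names are illustrative):

    library(bigmemory)

    x <- big.matrix(nrow = 1e4, ncol = 5, type = "double", init = 0)

    is.data.frame(x)                 # FALSE: x is an external pointer, not a data.frame
    ## lm(y ~ ., data = x)           # would fail; most R functions expect ordinary objects
    d <- as.data.frame(x[1:1000, ])  # pull a subset into RAM, then analyze that

    ## The 32-bit integer indexing Jay mentions: an ordinary R vector
    ## cannot be indexed beyond .Machine$integer.max elements.
    .Machine$integer.max             # 2147483647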