Hello

I need to build a histogram from data (numbers in the [0,1] interval)
stored in a number of different files. The total amount of data is very
large, so I can't load everything to memory and then simply call hist().
Since what I actually need are the histogram counts, I'm currently doing
it like this:

  breaks <- seq(0, 1, by = 0.01)
  files <- list.files(pattern = "some pattern")
  counts <- 0
  for (file in files) {
    data <- scan(file, quiet = T)
    h <- hist(data, plot = F, breaks = breaks)
    counts <- counts + h$counts
  }
  # and then work with `counts' here

Is there a more efficient and/or idiomatic way to do this?

Thanks,
Andre
On 6/11/2008, at 2:01 PM, Andre Nathan wrote:

> Is there a more efficient and/or idiomatic way to do this?

No.

cheers,
Rolf Turner
You can eliminate the loop like this (untested):

  cnt <- function(file) {
    data <- scan(file, quiet = TRUE)
    hist(data, plot = FALSE, breaks = breaks)$counts
  }
  Reduce("+", lapply(files, cnt))

Note the lapply(): sapply() would simplify the equal-length count vectors
into a matrix, and Reduce() would then add up the individual elements into
a single total instead of summing the vectors elementwise.

On Wed, Nov 5, 2008 at 8:01 PM, Andre Nathan <andre at digirati.com.br> wrote:

> Is there a more efficient and/or idiomatic way to do this?
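
For anyone who wants to try the loop-free accumulation, here is a minimal sketch that can be run as-is. The temporary files, the sample sizes, and passing `breaks` as an explicit argument to cnt() are assumptions made for illustration and are not part of the original posts. It uses lapply() so that Reduce() receives a list with one count vector per file, and it compares the result with hist() on the pooled data, which is only feasible here because the demo data are small.

```r
# Hypothetical demo data: three small temporary files of numbers in [0, 1].
set.seed(1)
files <- replicate(3, tempfile(fileext = ".txt"))
for (f in files) write(runif(1000), file = f, ncolumns = 1)

breaks <- seq(0, 1, by = 0.01)

# Counts for one file; `breaks` is passed explicitly (an assumption of this
# sketch) instead of being looked up in the global environment.
cnt <- function(file, breaks) {
  data <- scan(file, quiet = TRUE)
  hist(data, plot = FALSE, breaks = breaks)$counts
}

# lapply() returns a list of count vectors, so Reduce() adds them elementwise.
counts <- Reduce(`+`, lapply(files, cnt, breaks = breaks))

# Sanity check against the histogram of all the data read at once.
pooled <- unlist(lapply(files, scan, quiet = TRUE))
reference <- hist(pooled, plot = FALSE, breaks = breaks)$counts
stopifnot(all(counts == reference))

# Remove the temporary files.
unlink(files)
```

Memory use stays at roughly one file's worth of data per call to cnt(), the same as in the explicit for loop, so the choice between the two is mostly a matter of style.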