thr3ads.net - R help - [R] Incrementally building histograms [Nov 2008]

Andre Nathan

2008-Nov-06 01:01 UTC

[R] Incrementally building histograms

Hello

I need to build a histogram from data (numbers in the [0,1] interval)
stored in a number of different files. The total amount of data is very
large, so I can't load everything into memory and then simply call hist().
Since what I actually need are the histogram counts, I'm currently doing
it like this:

breaks <- seq(0, 1, by = 0.01)
files <- list.files(pattern = "some pattern")
counts <- 0
for (file in files) {
  data <- scan(file, quiet = T)
  h <- hist(data, plot = F, breaks = breaks)
  counts <- counts + h$counts
}
# and then work with `counts' here

Is there a more efficient and/or idiomatic way to do this?

Thanks,
Andre
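
If a single file is itself too large for one scan() call, the same accumulation can be done chunk by chunk on an open connection. A minimal sketch, assuming one number per line with no blank lines; `count_file`, `path` and `chunk_size` are hypothetical names, not part of the original post:

```r
breaks <- seq(0, 1, by = 0.01)

# Hypothetical helper: accumulate histogram counts for one file while
# holding at most `chunk_size` values in memory at a time.
# Assumes one number per line and no blank lines.
count_file <- function(path, breaks, chunk_size = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))                      # always release the connection
  counts <- numeric(length(breaks) - 1)    # one slot per bin
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break          # end of file reached
    x <- as.numeric(lines)
    counts <- counts + hist(x, breaks = breaks, plot = FALSE)$counts
  }
  counts
}
```

Calling `count_file(f, breaks)` for each file and adding the results gives the same totals as the loop above, since bin counts over disjoint chunks simply add up.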

Rolf Turner

2008-Nov-06 02:10 UTC

[R] Incrementally building histograms

On 6/11/2008, at 2:01 PM, Andre Nathan wrote:
> Hello
>
> I need to build a histogram from data (numbers in the [0,1] interval)
> stored in a number of different files. The total amount of data is very
> large, so I can't load everything to memory and then simply call hist().
> Since what I actually need are the histogram counts, I'm currently doing
> it like this:
>
> breaks <- seq(0, 1, by = 0.01)
> files <- list.files(pattern = "some pattern")
> counts <- 0
> for (file in files) {
>   data <- scan(file, quiet = T)
>   h <- hist(data, plot = F, breaks = breaks)
>   counts <- counts + h$counts
> }
> # and then work with `counts' here
>
> Is there a more efficient and/or idiomatic way to do this?
	No.

		cheers,

			Rolf Turner

Gabor Grothendieck

2008-Nov-06 02:16 UTC

[R] Incrementally building histograms

You can eliminate the loop like this (untested):

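# Return the per-bin counts for one file; `breaks` and `files` are as
# defined in the original post.
cnt <- function(file) {
  data <- scan(file, quiet = TRUE)
  hist(data, plot = FALSE, breaks = breaks)$counts
}
# lapply() keeps one count vector per file, so Reduce() adds them bin by bin;
# sapply() would return a matrix, which Reduce("+", ...) collapses to one total.
Reduce("+", lapply(files, cnt))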
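
One thing worth checking with the loop-free form is the shape of the intermediate result: sapply() returns a bins-by-files matrix, so per-bin totals come from rowSums() on that matrix, or from Reduce() over a list built with lapply(). A small self-contained check, assuming made-up data in temporary files (the file names and contents are hypothetical stand-ins for the real input):

```r
breaks <- seq(0, 1, by = 0.01)

# Hypothetical stand-in for the real input: three small temporary files.
set.seed(1)
files <- tempfile(pattern = paste0("part", 1:3, "_"), fileext = ".txt")
for (f in files) writeLines(as.character(runif(1000)), f)

# Per-bin counts for one file.
cnt <- function(file) {
  data <- scan(file, quiet = TRUE)
  hist(data, plot = FALSE, breaks = breaks)$counts
}

per_file <- sapply(files, cnt)                    # matrix: one row per bin, one column per file
counts_rowsums <- rowSums(per_file)               # per-bin totals across files
counts_reduce <- Reduce("+", lapply(files, cnt))  # same totals from a list of vectors

# Both routes should yield one count per bin.
stopifnot(length(counts_rowsums) == length(breaks) - 1)
```

Either form holds only the small count vectors for all files at once; the raw data from each file can be garbage-collected once cnt() returns.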

