Dear All, I have a massive dataset from which I would like to draw a histogram. Any ideas on how to accomplish this? Thanks in advance, Paul
Hi:

I would suggest that you avoid the histogram and make a density plot instead. It would be more informative and probably require a lot less time and ink. If you're married to the histogram concept, try taking a random sample of about 10000 observations and get a histogram of that instead. The result shouldn't be much different from that of the entire sample. To test this hypothesis, take several random samples of size 10000 and compare the histograms. If they're not much different in shape, it's likely that the histogram of the full sample is close to the same. If there are noticeable differences, try 50000 or 100000 instead (rinse and repeat).

HTH,
Dennis

On Fri, Jul 15, 2011 at 4:21 AM, Paul Smith <phhs80 at gmail.com> wrote:
> I have a massive dataset from which I would like to draw a histogram.
> Any ideas on how to accomplish this?
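A minimal sketch of the sampling check suggested above, assuming the data already sit in a numeric vector. The simulated `x`, the number of samples (4) and the 30-ish bins are illustrative choices, not part of the original advice:

```r
set.seed(1)                 # only to make the illustration reproducible
x <- rnorm(1e6)             # stand-in for the real (large) numeric vector

# Common breaks, so that the histograms of the samples are directly comparable.
breaks <- pretty(range(x), n = 30)

# Several independent random samples of size 10000.
n_samp  <- 10000
samples <- replicate(4, sample(x, n_samp), simplify = FALSE)

# One histogram per sample, on a density scale, in a 2 x 2 grid.
op <- par(mfrow = c(2, 2))
for (s in samples) {
  hist(s, breaks = breaks, freq = FALSE, main = "Random sample of 10000")
}
par(op)

# The density-plot alternative, here for the first sample.
plot(density(samples[[1]]), main = "Kernel density estimate")
```

If the four panels look alike, a larger sample is unlikely to change the picture much; if they differ visibly, increase `n_samp` and repeat.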
Hello,

I assume you have imported the dataset. You can use hist() from the graphics package, which ships with base R. A tricky part is that freq=TRUE (the default when the breaks are equally spaced) plots frequencies and freq=FALSE plots probability densities, not the percent of observations in each histogram cell. You can sum the counts and calculate the percent before plotting.

    # compute the histogram without drawing it
    hist1 <- hist(varname, plot=FALSE)
    # convert the cell counts to percent of the total
    total <- sum(hist1$counts)
    hist1$counts <- hist1$counts/total*100
    plot(hist1, main=paste("Histogram of", deparse(substitute(varname))),
         xlab=deparse(substitute(varname)), ylab="Percent")

Also, if you are new to R, there are very useful manuals and guides at http://cran.r-project.org/manuals.html . You can look up documentation from within R, for example ?hist for the hist function.

Regards,
Kyaw Sint (Joe)
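Note that deparse(substitute(varname)) typed at the top level simply yields the string "varname"; it only picks up the caller's variable name when used inside a function. A possible way to wrap the percent histogram in a function is sketched below. The name `percent_hist` and the example vector `waiting_time` are made up for illustration:

```r
# Hypothetical helper: histogram with percent on the y axis.
percent_hist <- function(x, ...) {
  label <- deparse(substitute(x))             # variable name as typed by the caller
  h <- hist(x, plot = FALSE, ...)             # compute the cells without drawing
  h$counts <- h$counts / sum(h$counts) * 100  # counts -> percent of the total
  plot(h, main = paste("Histogram of", label), xlab = label, ylab = "Percent")
  invisible(h)                                # return the modified object quietly
}

set.seed(1)
waiting_time <- rexp(5000)                    # made-up example data
h <- percent_hist(waiting_time)               # axis label reads "waiting_time"
```

Extra arguments such as `breaks` are passed on to hist() through `...`.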
On Mon, Jul 18, 2011 at 9:11 PM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>> [snip] I guess that I must have a data frame to plot a histogram.
>
> Not at all!
>
> ## a *vector* of 100 million observations
> x <- rnorm(10^8)
> ## a histogram for it (see attached for the result from my system)
> hist(x)
>
> No data frame required. I would not try this straight in anything but
> traditional graphics for a 100 million observation vector, but if you
> wanted it made in ggplot2 or something, you could prebin the data and
> THEN plot bars corresponding to the bins.

Thanks, Joshua, for your answer. True: a vector is enough to supply data for hist(). But my point is: can a histogram be drawn without having all the data in the computer's memory? You partially answer this question by suggesting that I prebin the data. Can this prebinning be done transparently, chunk by chunk, underneath?

Paul
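One way chunk-by-chunk prebinning might look in base R is sketched below. It is not something proposed in the thread, only an illustration under stated assumptions: the data live in a text file with one number per line, the breaks are fixed in advance, equally spaced, and span the whole range of the data (hist() stops with an error otherwise). The function name `chunked_hist`, the file and the chunk size are all hypothetical:

```r
# Hypothetical helper: accumulate histogram counts over chunks of a file.
chunked_hist <- function(path, breaks, chunk_size = 100000L) {
  counts <- numeric(length(breaks) - 1)
  con <- file(path, open = "r")
  on.exit(close(con))
  repeat {
    lines <- readLines(con, n = chunk_size)   # next chunk; empty at end of file
    if (length(lines) == 0) break
    chunk  <- as.numeric(lines)
    counts <- counts + hist(chunk, breaks = breaks, plot = FALSE)$counts
  }
  # Assemble an object that plot() treats like the result of hist().
  h <- list(breaks   = breaks,
            counts   = counts,
            density  = counts / (sum(counts) * diff(breaks)),
            mids     = (head(breaks, -1) + tail(breaks, -1)) / 2,
            xname    = basename(path),
            equidist = TRUE)                  # assumes equally spaced breaks
  class(h) <- "histogram"
  h
}

# Demonstration on a temporary file (simulated data, small enough to check).
set.seed(1)
n   <- 120000
tmp <- tempfile(fileext = ".txt")
writeLines(as.character(rnorm(n)), tmp)

breaks <- seq(-8, 8, by = 0.25)
h <- chunked_hist(tmp, breaks, chunk_size = 50000L)
plot(h, main = "Histogram built chunk by chunk", xlab = "value")
```

Only one chunk and the vector of counts are held in memory at any time. If the range of the data is not known beforehand, a first pass over the file that tracks the running minimum and maximum could supply the breaks.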