Hi,

I have a large amount of data that I would like to histogram, plot and analyse in R. Reading the raw data into R is practically impossible, so I have written a program that bins the data, and I now have a list of counts for each bin. Is it possible to import these counts into R and use hist() so that I can, for instance, plot the probability density? I have looked at the help page for hist(), but couldn't find anything related to this there.

Regards,

Nicky Chorley
Nick Chorley-3 wrote:
> I have a large amount of data that I would like to histogram, plot and
> analyse in R. Reading the raw data into R is practically impossible, so I
> have written a program that bins the data, and I now have a list of counts
> for each bin. Is it possible to import these counts into R and use hist()
> so that I can, for instance, plot the probability density? I have looked
> at the help page for hist(), but couldn't find anything related to this
> there.

Hi!

Why do you think it is impossible to import the data into R? It can handle rather large data volumes, especially on Linux. See help("Memory-limits").

You can plot something that looks like a histogram using barplot() or plot(..., type="h").

You can also create the "histogram" class object manually. For example:

[ Import the bin counts. Presumably this is a table with two columns giving the bin borders and the counts; let's store it in ncounts. ]

    > hst <- hist(rnorm(nrow(ncounts)), plot=FALSE)
    > str(hst)   # actually I called hist(rnorm(100))
    List of 7
     $ breaks     : num [1:10] -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
     $ counts     : int [1:9] 3 6 12 9 24 19 14 9 4
     $ intensities: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
     $ density    : num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
     $ mids       : num [1:9] -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75
     $ xname      : chr "rnorm(100)"
     $ equidist   : logi TRUE
     - attr(*, "class")= chr "histogram"
    > hst$breaks <- [ your breaks ]
    > hst$counts <- [ your counts ]
    > hst$intensities <- [ your intensities ]

Studying the hist.default() sources will help you to understand how every list element is created.

--
View this message in context: http://www.nabble.com/Possible-to-%22import%22-histograms-in-R--tf4271809.html#a12158586
Sent from the R help mailing list archive at Nabble.com.
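As a rough illustration of the barplot() / plot(..., type="h") suggestion above: the probability density can be computed directly from pre-binned counts, as the counts divided by (total count times bin width), and then drawn as spikes at the bin midpoints. This is only a sketch. The breaks and counts below are made-up values standing in for the imported table, and all variable names are assumptions.

```r
# Made-up bin borders and counts standing in for the imported table
breaks <- seq(0, 10, by = 0.5)           # 21 borders define 20 bins
counts <- c(2, 5, 9, 14, 22, 30, 41, 50, 55, 58,
            54, 49, 40, 31, 23, 15, 10, 6, 3, 1)

widths  <- diff(breaks)                  # width of each bin
mids    <- breaks[-length(breaks)] + widths / 2
density <- counts / (sum(counts) * widths)

# The bar areas of a density histogram should sum to 1
stopifnot(isTRUE(all.equal(sum(density * widths), 1)))

# Spike plot of the density at the bin midpoints
plot(mids, density, type = "h", lwd = 4,
     xlab = "x", ylab = "Density")
```

The same density vector could equally be passed to barplot(); the spike plot is used here only because it keeps a numeric x axis.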
On 15-Aug-07 08:30:08, Nick Chorley wrote:
> Hi,
>
> I have a large amount of data that I would like to histogram,
> plot and analyse in R. Reading the raw data into R is
> practically impossible, so I have written a program that bins
> the data, and I now have a list of counts for each bin. Is it
> possible to import these counts into R and use hist() so that
> I can, for instance, plot the probability density? I have
> looked at the help page for hist(), but couldn't find anything
> related to this there.
>
> Regards,
>
> Nicky Chorley

Presumably you now have (or can readily generate) files on your system whose contents are (or are equivalent to) something like:

    brkpts.dat
      0.0
      0.5
      1.0
      ....
      9.5
      10.0

    counts.dat
      10
      7
      38
      ....
      7
      0

where there is one more line in brkpts.dat than in counts.dat.

Now simply read both files into R, creating variables 'brkpts' and 'counts'.

Now create a histogram template (any silly old data will do):

    H1 <- hist(c(1,2))

Next, attach your variables to it:

    H1$breaks <- brkpts
    H1$counts <- counts

and you have your histogram in R. You can also use the data in the variables 'brkpts' and 'counts' to feed into any other procedure which can accept data in this form.

Example (simulating the above in R):

    brkpts <- 0.5*(0:20)
    counts <- rpois(20,7.5)
    H1 <- hist(c(1,2))
    H1$breaks <- brkpts
    H1$counts <- counts
    plot(H1)

Or, if you want a more "realistic-looking" one, follow on with:

    midpts <- (brkpts[1:20]+brkpts[2:21])/2
    counts <- rpois(20,100*dnorm(midpts,mean=5,sd=3))
    H1$breaks <- brkpts
    H1$counts <- counts
    plot(H1)

In other words, you've already done R's work for it with your program which bins the data. All you need to do in R is to get these results into the right place.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07                                       Time: 10:38:05
------------------------------ XFMail ------------------------------
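A minimal sketch of the "read both files into R" step, assuming each file holds one number per line as in the layout above. The files here are temporary files written purely for illustration (the real brkpts.dat and counts.dat would be read with the same scan() calls), and the length check mirrors the remark that there is one more break than counts.

```r
# Write two small illustrative files; real data would already be on disk
brk_file <- tempfile(fileext = ".dat")
cnt_file <- tempfile(fileext = ".dat")
writeLines(as.character(0.5 * (0:20)), brk_file)
set.seed(1)                                   # reproducible fake counts
writeLines(as.character(rpois(20, 7.5)), cnt_file)

# scan() reads whitespace-separated numbers into a numeric vector
brkpts <- scan(brk_file, quiet = TRUE)
counts <- scan(cnt_file, quiet = TRUE)

# One more break than counts, and breaks must be strictly increasing
stopifnot(length(brkpts) == length(counts) + 1,
          all(diff(brkpts) > 0))

# Fill the template; plot = FALSE avoids drawing the dummy data
H1 <- hist(c(1, 2), plot = FALSE)
H1$breaks <- brkpts
H1$counts <- counts
H1$mids   <- (brkpts[-1] + brkpts[-length(brkpts)]) / 2
H1$xname  <- "binned data"
plot(H1)   # equal-width bins, so this draws the counts
```

Note that H1$density still holds the template's values in this sketch, so the object is only suitable for plotting counts. Plotting with freq = FALSE would need the density recomputed from the counts and bin widths.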
Hello Nick,

Wednesday, August 15, 2007, 1:18:34 PM, you wrote:

NC> On 15/08/07, Vladimir Eremeev <wl2776 at gmail.com> wrote:
NC>> Nick Chorley-3 wrote:
NC>>> I have a large amount of data that I would like to histogram, plot and
NC>>> analyse in R. Reading the raw data into R is practically impossible, so
NC>>> I have written a program that bins the data, and I now have a list of
NC>>> counts for each bin. Is it possible to import these counts into R and
NC>>> use hist() so that I can, for instance, plot the probability density?
NC>>> I have looked at the help page for hist(), but couldn't find anything
NC>>> related to this there.

NC>> Hi! Why do you think it is impossible to import the data into R?
NC>> It can handle rather large data volumes, especially on Linux. See
NC>> help("Memory-limits").

NC> My data file is 4.8 GB!

NC>> You can plot something that looks like a histogram using barplot() or
NC>> plot(..., type="h").

NC> The problem with those is that I can't plot the probability density.

NC>> You can also create the "histogram" class object manually.
NC>> For example:
NC>> [ Import the bin counts. Presumably this is a table with two columns
NC>> giving the bin borders and the counts; let's store it in ncounts. ]

NC> Yes, that's what I have.

NC>> > hst <- hist(rnorm(nrow(ncounts)), plot=FALSE)
NC>> > str(hst)   # actually I called hist(rnorm(100))
NC>> List of 7
NC>>  $ breaks     : num [1:10] -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
NC>>  $ counts     : int [1:9] 3 6 12 9 24 19 14 9 4
NC>>  $ intensities: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
NC>>  $ density    : num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
NC>>  $ mids       : num [1:9] -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75
NC>>  $ xname      : chr "rnorm(100)"
NC>>  $ equidist   : logi TRUE
NC>>  - attr(*, "class")= chr "histogram"
NC>> > hst$breaks <- [ your breaks ]
NC>> > hst$counts <- [ your counts ]
NC>> > hst$intensities <- [ your intensities ]

NC> My data isn't normally distributed, so I tried rexp() rather
NC> than rnorm(), but it's not looking like it should

Which random generator you call doesn't matter, since it is used only to create a numeric vector for hist(). The call to hist() just creates a dummy structure, which you must then fill with your own data, replacing the values it returned. hist(1:100) would work just as well, as would any other numeric vector.

If the result doesn't look like it should, then you have probably altered the list returned by hist() incorrectly or incompletely.

Actually, you can create this structure from scratch:

    hst <- list(breaks      = [your breaks],
                counts      = [your counts],
                intensities = [your intensities],
                density     = [your density],
                mids        = [your mids],
                xname       = "hist(of your data)",
                equidist    = TRUE [or FALSE])
    attr(hst, "class") <- "histogram"

NC>> Studying the hist.default() sources will help you to understand how every
NC>> list element is created.

Type hist.default (without parentheses) at the R prompt, and it will display the source of this function. You can also use dump("hist.default", file="hist_default.R") to save it to a text file; note that dump() expects the object name as a character string.

--
Best regards,
Vladimir
mailto:wl2776 at gmail.com
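The from-scratch construction described above could be wrapped in a small helper along the following lines. This is a sketch, not part of hist(): the function name make_histogram and its arguments are hypothetical, the density is taken to be counts / (sum(counts) * bin width), and the intensities element is left out on the assumption that plot() only needs the fields filled in here.

```r
# Hypothetical helper: build a "histogram" object from pre-binned data
make_histogram <- function(breaks, counts, xname = "binned data") {
  stopifnot(length(breaks) == length(counts) + 1,
            all(diff(breaks) > 0), all(counts >= 0))
  widths <- diff(breaks)
  structure(
    list(breaks   = breaks,
         counts   = counts,
         density  = counts / (sum(counts) * widths),
         mids     = breaks[-length(breaks)] + widths / 2,
         xname    = xname,
         # TRUE when all bins have (numerically) the same width
         equidist = diff(range(widths)) < 1e-7 * mean(widths)),
    class = "histogram")
}

# Unequal bin widths, made up for illustration
hst <- make_histogram(breaks = c(0, 1, 2, 4, 8),
                      counts = c(10, 30, 40, 20))
plot(hst, freq = FALSE)   # draws the probability density
```

Computing density and mids inside the helper keeps them consistent with the supplied breaks and counts, which is the usual source of a plot that "doesn't look like it should" when only some fields of a template are overwritten.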
On 15/08/07, Vladimir Eremeev <wl2776 at gmail.com> wrote:
> Nick Chorley-3 wrote:
>> I have a large amount of data that I would like to histogram, plot and
>> analyse in R. Reading the raw data into R is practically impossible, so I
>> have written a program that bins the data, and I now have a list of counts
>> for each bin. Is it possible to import these counts into R and use hist()
>> so that I can, for instance, plot the probability density? I have looked
>> at the help page for hist(), but couldn't find anything related to this
>> there.
>
> Hi! Why do you think it is impossible to import the data into R?
> It can handle rather large data volumes, especially on Linux. See
> help("Memory-limits").

My data file is 4.8 GB!

> You can plot something that looks like a histogram using barplot() or
> plot(..., type="h").

The problem with those is that I can't plot the probability density.

> You can also create the "histogram" class object manually.
> For example:
> [ Import the bin counts. Presumably this is a table with two columns
> giving the bin borders and the counts; let's store it in ncounts. ]

Yes, that's what I have.

> > hst <- hist(rnorm(nrow(ncounts)), plot=FALSE)
> > str(hst)   # actually I called hist(rnorm(100))
> List of 7
>  $ breaks     : num [1:10] -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
>  $ counts     : int [1:9] 3 6 12 9 24 19 14 9 4
>  $ intensities: num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
>  $ density    : num [1:9] 0.06 0.12 0.24 0.18 0.48 ...
>  $ mids       : num [1:9] -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75
>  $ xname      : chr "rnorm(100)"
>  $ equidist   : logi TRUE
>  - attr(*, "class")= chr "histogram"
> > hst$breaks <- [ your breaks ]
> > hst$counts <- [ your counts ]
> > hst$intensities <- [ your intensities ]

My data isn't normally distributed, so I tried rexp() rather than rnorm(), but it's not looking like it should.

> Studying the hist.default() sources will help you to understand how every
> list element is created.