Hello:

I'm dealing with an issue and I'm not sure of the best way to approach it. I've got a very large (10G+) dataset that I'm trying to create a histogram for. I can't use hist directly because I cannot create an R vector of size greater than 2.2G. I considered condensing the data before loading it into R and just plotting the frequencies as a barplot; unfortunately, barplot does not support plotting the values according to a set of x-axis positions.

What I have is something similar to:

ys <- c(12,3,7,22,10)
xs <- c(1,30,35,39,60)

and I'd like the bars (ys) to appear at the positions described by xs. I can get this to work on smaller sets by filling in zero values for the missing ys over the entire range of xs, but in my case this would again create a vector too large for R.

Is there another way to use the two vectors to create a simulated frequency histogram? Is there a way to create a histogram object (as returned by hist) from the condensed data so that plot would handle it correctly?

Thanks in advance,

Jesse
Perhaps

plot(xs, ys, type = "h", lwd = 3)

will work? I'm not sure that a direct call to hist(..., plot = FALSE) will get around the data-size problem. If you type

getAnywhere(hist.default)

you can see the code that hist() runs; perhaps you can extract the working bits you need.

Michael

On Fri, Nov 4, 2011 at 2:04 PM, Jesse Brown <jesse.r.brown at lmco.com> wrote:
> [...]
> Is there another way to use the two vectors to create a simulated frequency
> histogram? Is there a way to create a histogram object (as returned by hist)
> from the condensed data so that plot would handle it correctly?
>
> Thanks in advance,
>
> Jesse
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
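A minimal sketch of the type = "h" suggestion, using the example vectors from the original post. The lend, ylim, xlab and ylab settings are additions for illustration only: lend = "butt" squares off the line ends so the spikes look more like bars, and the axis labels are placeholder names.

```r
# Condensed data from the original post: counts (ys) at positions (xs)
ys <- c(12, 3, 7, 22, 10)
xs <- c(1, 30, 35, 39, 60)

# type = "h" draws one vertical line per (x, y) pair, from 0 up to y,
# so no zero-filled vector over the whole range of xs is needed.
plot(xs, ys, type = "h", lwd = 3, lend = "butt",
     ylim = c(0, max(ys)),
     xlab = "value", ylab = "frequency")
```

Because only the non-empty positions are stored, the memory needed grows with the number of distinct positions rather than with the width of the range.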
On 04/11/2011 2:04 PM, Jesse Brown wrote:
> [...]
> Is there another way to use the two vectors to create a simulated
> frequency histogram? Is there a way to create a histogram object (as
> returned by hist) from the condensed data so that plot would handle it
> correctly?

Follow your own last suggestion. Take a small subset of your data, and calculate

x <- hist(data, plot=FALSE)

str(x) will show you the structure of the object in x. Modify the entries to reflect your full dataset, and then plot(x) will show it.

Duncan Murdoch
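One way this could look in practice, as a sketch rather than a definitive recipe. It assumes the condensed counts are to be summed into fixed-width bins; the bin edges in breaks are a hypothetical choice, and the tiny xs vector stands in for the "small subset" used to build the template object.

```r
# Condensed data from the original post: counts (ys) at positions (xs)
ys <- c(12, 3, 7, 22, 10)
xs <- c(1, 30, 35, 39, 60)

# Hypothetical bin edges; choose these to suit the real data
breaks <- seq(0, 60, by = 10)

# Build a template "histogram" object from a small stand-in sample
h <- hist(xs, breaks = breaks, plot = FALSE)
str(h)  # shows the components: breaks, counts, density, mids, ...

# Sum the condensed counts that fall into each bin, using the same
# interval convention as hist() (right-closed, lowest value included)
bin <- cut(xs, breaks = breaks, include.lowest = TRUE, right = TRUE)
counts <- as.vector(tapply(ys, bin, sum))
counts[is.na(counts)] <- 0  # bins that received no data

# Overwrite the template entries so they reflect the full dataset
h$counts  <- counts
h$density <- counts / (sum(counts) * diff(breaks))
h$xname   <- "condensed data"

plot(h)
```

The mids and breaks components are already correct because the template was built with the same breaks, so only counts and density need replacing.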
On 11/05/2011 05:04 AM, Jesse Brown wrote:
> [...]
> and I'd like the bars (ys) to appear at the positions described by xs. I
> can get this to work on smaller sets by filling zero values in for
> missing ys for the entire range of xs but in my case this would again
> create a vector too large for R.
> [...]

Hi Jesse,
I think that barp (in the plotrix package) will get you out of trouble.

Jim
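A sketch of how the barp suggestion might be used, assuming the plotrix package is installed and that barp's x argument is what places the bars horizontally. The width value and axis labels are illustrative choices, not part of the original advice.

```r
# Condensed data from the original post: counts (ys) at positions (xs)
ys <- c(12, 3, 7, 22, 10)
xs <- c(1, 30, 35, 39, 60)

# Only attempt the plot if plotrix is available
if (requireNamespace("plotrix", quietly = TRUE)) {
  # x gives the horizontal position of each bar; width is the
  # half-width of a bar in x-axis units (illustrative value)
  plotrix::barp(ys, x = xs, width = 1,
                xlab = "value", ylab = "frequency")
} else {
  message("plotrix is not installed; try install.packages(\"plotrix\")")
}
```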