stephanschlueter@gmx.de
2004-Jun-02 11:34 UTC
[Rd] a fault in the "hist" - function (PR#6931)
Full_Name: Stephan Schlueter
Version: 1.9.0
OS:
Submission from: (NULL) (217.184.109.24)

During my studies, I found a fault in the hist() function: if a vector x contains values around zero and also values larger than 10,000,000, the histogram data are shifted by -max(x)/10,000,000. See my example:

    x <- runif(10000)
    hist(x, breaks = seq(-3, 3, 0.1), prob = TRUE)
    # everything OK; now produce the problem
    x[254] <- 20000000
    hist(x, breaks = c(seq(-3, 3, 0.1), max(x)), prob = TRUE, xlim = c(-3, 3))
    # here you can see the shift

    hist(x + max(x)/10000000, breaks = c(seq(-3, 3, 0.1), max(x)), prob = TRUE, xlim = c(-3, 3))
    # first solution (but I don't know why it works)

    x[x > 10] <- 10   # cap every value above 10 at 10
    hist(x, breaks = c(seq(-3, 3, 0.1), max(x)), prob = TRUE, xlim = c(-3, 3))
    # second solution (the better one, I think)

Good luck solving this problem, and it would be nice if you could send me an answer.
Thanks, and until then,
Stephan Schlueter
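To see the mechanism without plotting, here is a minimal sketch that assumes the cause identified later in this thread: hist.default() moves its breaks by a small amount `diddle` before counting. The helper `fuzzy_counts` is hypothetical and only mimics that step for right-closed bins with include.lowest = TRUE; it is not R's own code. It relies on findInterval(..., left.open = TRUE), which needs R 3.3.0 or later.

```r
# Minimal sketch, not R's hist.default(): count x in right-closed bins after
# shifting the breaks by a fuzz `diddle`, as described later in the thread.
# `fuzzy_counts` is a hypothetical helper; needs R >= 3.3.0 (left.open).
fuzzy_counts <- function(x, breaks, diddle) {
  # include.lowest = TRUE: the first break moves down, all others move up
  fuzz <- c(-diddle, rep(diddle, length(breaks) - 1))
  fuzzybreaks <- breaks + fuzz
  # bin i is (fuzzybreaks[i], fuzzybreaks[i + 1]]; the first bin is closed
  bin <- findInterval(x, fuzzybreaks, left.open = TRUE, rightmost.closed = TRUE)
  # values outside all bins get index 0 or length(breaks) and are dropped
  tabulate(bin, nbins = length(breaks) - 1)
}

breaks <- seq(-3, 3, 0.1)
mids <- breaks[-1] - 0.05          # midpoint of each bin of width 0.1
x <- c(0.05, 0.55, 0.95)           # a few values "around zero"

# negligible fuzz: every value is counted in its own bin
mids[fuzzy_counts(x, breaks, 1e-7) > 0]
# fuzz of 2 = 20000000 / 10000000: the same values land two units lower
mids[fuzzy_counts(x, breaks, 2) > 0]
# the reporter's first solution: shift the data by the same amount
mids[fuzzy_counts(x + 2, breaks, 2) > 0]
```

With a fuzz of 2, the values 0.05, 0.55 and 0.95 are counted in the bins centred at -1.95, -1.45 and -1.05, which is the shift of -max(x)/10,000,000 described in the report; adding the same amount to x first cancels it.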
ligges@statistik.uni-dortmund.de
2004-Jun-02 15:34 UTC
[Rd] a fault in the "hist" - function (PR#6931)
stephanschlueter@gmx.de wrote:
> Full_Name: Stephan Schlueter
> Version: 1.9.0
> [...]
> If you have a vector x with values around zero and also bigger than 10,000,000 ,
> there will be a shift of -max(x)/10,000,000 in the hist-datas.
> [...]
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel

The problem is in hist.default():

    diddle <- 1e-7 * max(abs(range(breaks)))

and wherever we are diddling, there are some disadvantages.

Do we want a flag that turns off the diddling and the "fuzz" code that follows it? Or do we want a way to adjust the hard-coded heuristic value 1e-7 (to zero, for example)?

Uwe Ligges
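Evaluating the quoted expression for the reporter's breaks shows where the exact size of the shift comes from: the last break is max(x), so it dominates max(abs(range(breaks))). A short sketch, assuming the breaks from the bug report:

```r
# Sketch: the diddle expression quoted above, applied to the breaks from the
# bug report (the last break is max(x) = 20000000 after x[254] = 20000000).
breaks <- c(seq(-3, 3, 0.1), 20000000)

diddle <- 1e-7 * max(abs(range(breaks)))
diddle                        # 2, i.e. max(x) / 10,000,000
diddle / 0.1                  # 20: the fuzz spans twenty bins of width 0.1
```

The reporter's first workaround, x + max(x)/10000000, adds exactly this amount to the data, so data and breaks move together. The second workaround caps max(x) at 10, which makes the fuzz only 1e-6.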
ripley@stats.ox.ac.uk
2004-Jun-09 09:56 UTC
[Rd] a fault in the "hist" - function (PR#6931)
On 2 Jun 2004, Peter Dalgaard wrote:
> ligges@statistik.uni-dortmund.de writes:
>
> > The problem is in hist.default():
> >
> >     diddle <- 1e-7 * max(abs(range(breaks)))
> >
> > and wherever we are diddling - there are some disadvantages.
> >
> > Do we want a flag that turns off diddling and the following "fuzz"
> > stuff? Or do we want something to adjust the hardcoded heuristical value
> > "1e-7" (to zero, for example)?
>
> Neither, I think, since the diddle is there for a reason, and the only
> real problem is the use of breaks that are wildly off-scale. We might
> key diddle to xlim instead, or possibly let "diddle" be an argument

We can't do that, as hist might not be used to plot.

> with a suitable default.
>
> You probably can't get all cases completely right though. A tiny range
> of numbers (compared to the mean) is likely to cause problems whatever
> you do.

I think the fuzz really needs to be relative to the adjacent bin size (and to the bin on the left or right, as appropriate). So I am going to replace

    diddle <- 1e-7 * max(abs(range(breaks)))

by

    diddle <- 1e-7 * median(diff(breaks))

that is, use a typical bin size to set the fuzz factor. (Note: I know this typically makes the fuzz a bit smaller, but 1e-7 was a rather large tolerance.)

[I hadn't realized that we used the largest limit and not the range (in the normal sense) of the data. There is also something of a design error in that we shift the breaks and not the data.]

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
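A small sketch comparing the two fuzz rules, for the reporter's breaks and for the kind of case Peter Dalgaard mentions (a narrow range of values far from zero). The helper names `old_diddle` and `new_diddle` are hypothetical; they simply wrap the two expressions in the message above.

```r
# Sketch comparing the two fuzz rules discussed in this thread.
# `old_diddle` and `new_diddle` are hypothetical helper names.
old_diddle <- function(breaks) 1e-7 * max(abs(range(breaks)))
new_diddle <- function(breaks) 1e-7 * median(diff(breaks))

# The reporter's breaks: 60 bins of width 0.1 plus one huge final bin.
b1 <- c(seq(-3, 3, 0.1), 20000000)
old_diddle(b1)   # 2: larger than the bin width, so counts move by whole bins
new_diddle(b1)   # about 1e-8: a tiny fraction of a typical bin

# A narrow range far from zero: ten unit-width bins starting at 1e8.
b2 <- seq(1e8, 1e8 + 10, 1)
old_diddle(b2)   # about 10.000001: more than the whole span of 10
new_diddle(b2)   # 1e-7
```

Under the old rule the fuzz grows with the magnitude of the break values; under the median-bin-width rule it stays a fixed small fraction of a typical bin, however far the breaks lie from zero.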