thr3ads.net - R devel - [Rd] 0.5 != integrate(dnorm,0,20000) = 0 [Dec 2010]

If this information is useful, please help other people find it:
Share via:

Spencer Graves

2010-Dec-06 23:32 UTC

[Rd] 0.5 != integrate(dnorm,0,20000) = 0

Hello:


       The example "integrate(dnorm,0,20000)" says it "fails on
many
systems".  I just got 0 from it, when I should have gotten either an 
error or something close to 0.5.  I got this with R 2.12.0 under both 
Windows Vista_x64 and Linux (Fedora 13);  see the results from Windows 
below.  I thought you might want to know.


       Thanks for all your work in creating and maintaining R.


       Best Wishes,
       Spencer Graves
###############################

integrate(dnorm,0,20000) ## fails on many systems
0 with absolute error < 0
 > sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Prof Brian Ripley

2010-Dec-07 07:41 UTC

head link

[Rd] 0.5 != integrate(dnorm,0,20000) = 0

On Mon, 6 Dec 2010, Spencer Graves wrote:
> Hello:
>
>
>      The example "integrate(dnorm,0,20000)" says it "fails
on many systems".
> I just got 0 from it, when I should have gotten either an error or
something
> close to 0.5.  I got this with R 2.12.0 under both Windows Vista_x64 and 
> Linux (Fedora 13);  see the results from Windows below.  I thought you
might
> want to know.
Well, isn't that exactly what the help page says happens?  That 
example is part of a section entitled

      ## integrate can fail if misused

and is part of the illustration of

      If the function is
      approximately constant (in particular, zero) over nearly all its
      range it is possible that the result and error estimate may be
      seriously wrong.


>
>
>      Thanks for all your work in creating and maintaining R.
>
>
>      Best Wishes,
>      Spencer Graves
> ###############################
>
> integrate(dnorm,0,20000) ## fails on many systems
> 0 with absolute error < 0
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

baptiste auguie

2010-Dec-07 08:07 UTC

head link

[Rd] 0.5 != integrate(dnorm,0,20000) = 0

Hi,

I was recently given some interesting tips on a similar issue, see
R-help "puzzle with integrate over infinite range"
<http://www.r-help.com/list/85/713882.html>

Maybe "fails" can be a bit misleading here (fails to produce the
actual result vs. returning an error message). As a result of this
previous discussion, I don't think it would be possible to return an
error: as far as the algorithm knows, the value it calculated is 0
because the integrand was 0 everywhere. To know better, the program
would need to sample the integrand at more points (which can be
achieved by changing the interval, or better, by setting the tolerance
to a lower value).

baptiste



On 7 December 2010 00:32, Spencer Graves
<spencer.graves at structuremonitoring.com> wrote:> Hello:
>
>
> ? ? ?The example "integrate(dnorm,0,20000)" says it "fails
on many systems".
> ?I just got 0 from it, when I should have gotten either an error or
> something close to 0.5. ?I got this with R 2.12.0 under both Windows
> Vista_x64 and Linux (Fedora 13); ?see the results from Windows below. ?I
> thought you might want to know.
>
>
> ? ? ?Thanks for all your work in creating and maintaining R.
>
>
> ? ? ?Best Wishes,
> ? ? ?Spencer Graves
> ###############################
>
> integrate(dnorm,0,20000) ## fails on many systems
> 0 with absolute error < 0
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Spencer Graves

2010-Dec-08 22:30 UTC

head link

[Rd] Suggested change to integrate.Rd

That sounds like a great idea to me:  This should give the Core R team 
more time to worry about the code by delegating maintenance of the help 
files to a larger group.


Spencer


On 12/8/2010 2:22 PM, John Nolan wrote:> Well, you can't idiot-proof things, but you can give clear descriptions
and
> warnings.
> To take things to the extreme, one can eliminate all help files.  If a user
> really wants
> to understand things, they can read the source code, right?
>
> This is a general question for r-dev: who are the help files aimed at? If
> the
> answer is experts only, then don't put any more effort into help files.
> But if you
> want more users to be able to do more things, then more explanation will
> help.
>
> Perhaps there should be a "documentation team" (r-doc?) that
intersects
> r-dev, but
> focuses on documentation?
>
> John,  American U
>
>
>
>
> From:	"Ravi Varadhan"<rvaradhan at jhmi.edu>
> To:	"'John Nolan'"<jpnolan at american.edu>,
>              <spencer.graves at structuremonitoring.com>
> Cc:	<r-devel at r-project.org>
> Date:	12/08/2010 10:43 AM
> Subject:	RE: [Rd] Suggested change to integrate.Rd (was: Re: 0.5 !>     
integrate(dnorm, 0, 20000) = 0)
>
>
>
> Hi,
>
> My honest and (not so) humble opinion is that no amount of clear and
> explicit warning can totally prevent the inappropriate use of any tool.
> Users will continue to use the tools, without doing the necessary
> background
> work to figure out whether the that tool is the appropriate one for their
> particular problem.  If things can go so horribly wrong in such a simple
> case, imagine all the snares and traps present in complex, high-dimensional
> integration.  Even the best cubature rules or the MCMC methods can give
> wrong results.  Even worse, how in heaven's name can we be sure that
the
> answer is any good?  The simple and best solution is to understand your
> integrand as best as you can.  I realize that this may be viewed as being
> too pedantic, but unfortunately, it is also the best advice.
>
> Best,
> Ravi.
> -------------------------------------------------------
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology School of Medicine Johns
> Hopkins University
>
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu
>
>
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at
r-project.org]
> On Behalf Of John Nolan
> Sent: Tuesday, December 07, 2010 11:09 PM
> To: spencer.graves at structuremonitoring.com
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Suggested change to integrate.Rd (was: Re: 0.5 !>
integrate(dnorm, 0, 20000) = 0)
>
> R developers understand intimately how things work, and terse
> descriptions are sufficient.  However, most typical R users
> would benefit from clearer documentation.  In multiple places
> I've found the R documentation to be correct and understandable
> AFTER I've figured a function out.
>
> And to be fair, this problem with integrate( ) isn't really R's
> fault: the QUADPACK routines that R uses are very good algorithms,
> but neither they nor any other package can handle all cases.
>
> I would support reasonable changes in the documentation for
> integrate( ).   Just saying it "gives wrong answer without
> warning on many systems" seems misleading (it works fine in
> many cases) and it doesn't help a user understand how to use
> integrate( ) correctly/carefully.  IMO a simple example like
> this one w/ dnorm would catch peoples attention and a couple
> lines of explanation/warning would then make more sense.
>
> John Nolan, American U
>
>
> -----Spencer Graves<spencer.graves at structuremonitoring.com> 
wrote: -----
> To: John Nolan<jpnolan at american.edu>
> From: Spencer Graves<spencer.graves at structuremonitoring.com>
> Date: 12/07/2010 07:58PM
> Cc: pchausse at uwaterloo.ca, r-devel at r-project.org
> Subject: Suggested change to integrate.Rd (was: Re: [Rd] 0.5 !>
integrate(dnorm,0,20000) = 0)
>
>         What do you think about changing the verbiage with that example
> in "integrate.Rd" from "fails on many systems" to
something like
> "gives wrong answer without warning on many systems"?
>
>
>         If I had write access to the core R code, I'd change this
> myself:  I'm probably not the only user who might think that saying
> something "fails" suggest it gives an error message.  Many
contributions
> on this thread make it clear that it will never be possible to write an
> integrate function that won't give a "wrong answer without
warning" in
> some cases.
>
>
>         Thanks,
>         Spencer
>
>
> #############################
> On 12/7/2010 7:02 AM, John Nolan wrote:
>> Putting in Inf for the upper bound does not work in general:
>> all 3 of the following should give 0.5
>>
>>> integrate( dnorm, 0, Inf )
>> 0.5 with absolute error<   4.7e-05
>>
>>> integrate( dnorm, 0, Inf, sd=100000 )
>> Error in integrate(dnorm, 0, Inf, sd = 1e+05) :
>>     the integral is probably divergent
>>
>>> integrate( dnorm, 0, Inf, sd=10000000 )
>> 5.570087e-05 with absolute error<   0.00010
>>
>> Numerical quadrature methods look at a finite number of
>> points, and you can find examples that will confuse any
>> algorithm.  Rather than hope a general method will solve
>> all problems, users should look at their integrand and
>> pick an appropriate region of integration.
>>
>> John Nolan, American U.
>>
>>
>> -----r-devel-bounces at r-project.org wrote: -----
>> To: r-devel at r-project.org
>> From: Pierre Chausse
>> Sent by: r-devel-bounces at r-project.org
>> Date: 12/07/2010 09:46AM
>> Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
>>
>>     The warning about "absolute error == 0" would not be
sufficient
>> because if you do
>>    >   integrate(dnorm, 0, 5000)
>> 2.326323e-06 with absolute error<   4.6e-06
>>
>> We get reasonable absolute error and wrong answer. For very high upper
>> bound, it seems more stable to use "Inf". In that case,
another
>> .External is used which seems to be optimized for high or low bounds:
>>
>>    >   integrate(dnorm, 0,Inf)
>> 0.5 with absolute error<   4.7e-05
>>
>>
>> On 10-12-07 8:38 AM, John Nolan wrote:
>>> I have wrestled with this problem before.  I think correcting
>>> the warning to "absolute error ~<= 0" is a good idea,
and printing
>>> a warning if subdivisions==1 is also helpful.  Also, including
>>> a simple example like the one that started this thread on the
>>> help page for integrate might make the issue more clear to users.
>>>
>>> But min.subdivisions is probably not.  On the example with dnorm(
),
>>> I doubt 3 subdivisions would work.  The problem isn't that
>>> we aren't sudividing enough, the problem is that the integrand
>>> is 0 (in double precision) on most of the region and the
>>> algorithm isn't designed to handle this.  There is no way to
>>> determine how many subdivisions are necessary to get a reasonable
>>> answer without a detailed analysis of the integrand.
>>>
>>> I've gotten useful results with integrands that are monotonic
on
>>> the tail with a "self-triming integration" routine
>>> like the following:
>>>
>>>> right.trimmed.integrate<- function( f, lower, upper,
epsilon=1e-100,
> min.width=1e-10, ... ) {
>>> + # trim the region of integration on the right until f(x)>   
epsilon
>>> +
>>> + a<- lower; b<- upper
>>> + while ( (b-a>min.width)&&    (f(b)<epsilon) ) {
b<- (a+b)/2 }
>>> +
>>> + return( integrate(f,a,b,...) ) }
>>>
>>>> right.trimmed.integrate( dnorm, 0, 20000 )  # test
>>> 0.5 with absolute error<    9.2e-05
>>>
>>> This can be adapted to left trim or (left and right) trim,
> abs(f(x)-c)>epsilon,
>>> etc.  Setting the tolerances epsilon and min.width is an issue,
>>> but an explicit discussion of these values could encourage people
to
>>> think about the problem in their specific case.  And of course,
none
>>> of this guarantees a correct answer, especially if someone tries
this
>>> on non-monotonic integrands with complicated 0 sets.  One could
write
>>> a somewhat more user-friendly version where the user has to specify
>>> some property (or set of properties) of the integrand, e.g.
"right-tail
>>> decreasing to 0", etc. and have the algorithm try to do smart
>>> trimming based on this.  But perhaps this getting too involved.
>>>
>>> In the end, there is no general solution because any solution
>>> depends on the specific nature of the integrand.  Clearer messages,
>>> warnings in suspicious cases like subdivisions==1, and a simple
>>> example explaining what the issue is in the help page would help
>>> some people.
>>>
>>> John
>>>
>>>
> ...........................................................................
>>>     John P. Nolan
>>>     Math/Stat Department
>>>     227 Gray Hall
>>>     American University
>>>     4400 Massachusetts Avenue, NW
>>>     Washington, DC 20016-8050
>>>
>>>     jpnolan at american.edu
>>>     202.885.3140 voice
>>>     202.885.3155 fax
>>>     http://academic2.american.edu/~jpnolan
>>>
> ...........................................................................
>>> -----r-devel-bounces at r-project.org wrote: -----
>>> To: r-devel at r-project.org, Prof Brian Ripley<ripley at
stats.ox.ac.uk>
>>> From: Martin Maechler
>>> Sent by: r-devel-bounces at r-project.org
>>> Date: 12/07/2010 03:29AM
>>> Subject: Re: [Rd] 0.5 != integrate(dnorm,0,20000) = 0
>>>
>>>>>>>> Prof Brian Ripley<ripley at
stats.ox.ac.uk>
>>>>>>>>        on Tue, 7 Dec 2010 07:41:16 +0000 (GMT)
writes:
>>>        >    On Mon, 6 Dec 2010, Spencer Graves wrote:
>>>        >>    Hello:
>>>        >>
>>>        >>
>>>        >>    The example "integrate(dnorm,0,20000)"
says it "fails on many
> systems".
>>>        >>    I just got 0 from it, when I should have gotten
either an
> error or something
>>>        >>    close to 0.5.  I got this with R 2.12.0 under
both Windows
> Vista_x64 and
>>>        >>    Linux (Fedora 13);  see the results from Windows
below.  I
> thought you might
>>>        >>    want to know.
>>>
>>>        >    Well, isn't that exactly what the help page says
happens?
> That
>>>        >    example is part of a section entitled
>>>
>>>        >    ## integrate can fail if misused
>>>
>>>        >    and is part of the illustration of
>>>
>>>        >    If the function is
>>>        >    approximately constant (in particular, zero) over
nearly all
> its
>>>        >    range it is possible that the result and error
estimate may be
>>>        >    seriously wrong.
>>>
>>> yes, of course,
>>> and the issue has been known for ``ages''  ..
>>> ..........
>>> ..........
>>> but it seems that too many useRs are not reading the help
>>> page carefully, but only browse it quickly.
>>> I think we (R developers) have to live with this fact
>>> and should consider adapting to it a bit more, particularly in
>>> this case (see below)
>>>
>>>        >>
>>>        >>    Thanks for all your work in creating and
maintaining R.
>>>        >>
>>>        >>
>>>        >>    Best Wishes,
>>>        >>    Spencer Graves
>>>        >>    ###############################
>>>
>>>        >>
>>>        >>    integrate(dnorm,0,20000) ## fails on many
systems
>>>
>>>        >>    0 with absolute error<    0
>>>
>>> and this is particularly unsatisfactory for another reason:
>>>
>>> "absolute error<    0"
>>> is *always* incorrect, so I think we should change *some*thing.
>>>
>>> We could just use "<=" (and probably should in any
case, or
>>> "<    ~= x" which would convey ``is less than about
x'' which I
>>> think is all we can say),
>>> but could consider giving a different message when the integral
>>> evaluates to 0 or, rather actually,
>>> only when the error bound ('abs.error') is 0 *and*
'subdivisions == 1'
>>> as the latter indicates that the algorithm treated the integrand
>>> f(.) as if f() was a linear function.
>>>
>>> But in my quick experiments, even for linear (incl. constant)
>>> functions, the 'abs.error' returned is never 0.
>>>
>>> If we want to be cautious,
>>> such a warning could be made explicitly suppressable by an argument
>>>          .warn.if.doubtful = TRUE
>>>
>>> An additional possibility I'd like to try, is a new argument
>>>       'min.subdivisions = 3' which specifies the *minimal*
number
>>> of subdivisions to be used in addition to the already present
>>>       'subdivisions = 100' (= the maximum number of
subintervals.)
>>>
>>> Martin Maechler,
>>> ETH Zurich
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San Jos?, CA 95126
ph:  408-655-4567

Reasonably Related Threads

Search for more reasonably related threads

R devel - Dec 2010 - 0.5 != integrate(dnorm,0,20000) = 0

[Rd] 0.5 != integrate(dnorm,0,20000) = 0

[Rd] 0.5 != integrate(dnorm,0,20000) = 0

[Rd] 0.5 != integrate(dnorm,0,20000) = 0

[Rd] Suggested change to integrate.Rd

Reasonably Related Threads