I think that Stevie Pederson has the right idea, but it is not obvious what the
threshold should be. Example:
> n <- 2428716; sum(rep(1/n,n)) - 1
[1] -3.297362e-14
I assume that equally large errors in the other direction are also possible.
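
To make the point about the threshold concrete, here is a rough sketch that measures the same rounding error for a few values of n and expresses it in units of .Machine$double.eps. The helper name sum_error and the choice of n values are mine, purely for illustration; the results depend on the platform, so I have not listed any output.

```r
# Hypothetical helper: rounding error of sum(rep(1/n, n)) relative to 1.
# Results are platform dependent, so no expected output is shown.
sum_error <- function(n) sum(rep(1 / n, n)) - 1

# Class counts mentioned in the original report, plus the large n used above.
n_values <- c(9, 11, 18, 20, 2428716)
errors <- vapply(n_values, sum_error, numeric(1))

# Express each error as a multiple of machine epsilon.
data.frame(n = n_values,
           error = errors,
           error_in_eps = errors / .Machine$double.eps)
```

If the error for large n comes out at many multiples of .Machine$double.eps, as in my example above, then a threshold of exactly one epsilon is too tight in general.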
Regards,
Jorgen Harmse.
----------------------------------------------------------------------
Message: 1
Date: Wed, 23 Oct 2024 15:56:00 +1030
From: Stevie Pederson <stephen.pederson.au at gmail.com>
To: r-help at r-project.org
Subject: [R] OSX-specific Bug in randomForest
Hi,
It appears there is an OSX-specific bug in the function
`randomForest.default()`. Going by the source code at
https://github.com/cran/randomForest/blob/master/R/randomForest.default.R
the bug is on line 103.
If the vector `cutoff` is formed using `cutoff <- rep(1/9, 9)` (line #101)
the test on line 103 will fail on OSX as the sum is greater than 1 due to
machine precision errors.
sum(rep(1 / 9, 9)) - 1
# [1] 2.220446e-16
This will occur whenever the number of factor levels
(nclass) is 9, 11, 18, 20, etc. The problem does not occur on Linux, and I
haven't tested on Windows.
A suggestion may be to change the opening test
if (sum(cutoff) > 1 || ...)
to
if (sum(cutoff) - 1 > .Machine$double.eps || ...)
however, I'm sure there's a more elegant way to do this.
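
One possible alternative, as a minimal sketch only: compare the excess over 1 against a looser tolerance. The helper cutoff_sum_ok() below is hypothetical and not part of randomForest; the default tolerance sqrt(.Machine$double.eps) is an assumption, borrowed from the default used by all.equal() for numeric comparisons.

```r
# Hypothetical helper, not part of randomForest: TRUE when sum(cutoff)
# does not exceed 1 by more than `tol`.
# The default tolerance is an assumption (the all.equal() default).
cutoff_sum_ok <- function(cutoff, tol = sqrt(.Machine$double.eps)) {
  sum(cutoff) - 1 <= tol
}

cutoff_sum_ok(rep(1 / 9, 9))      # rounding error is far below the tolerance
cutoff_sum_ok(c(0.5, 0.5, 0.01))  # a real excess of 0.01 is still rejected
```

The opening test could then read something like `if (!cutoff_sum_ok(cutoff) || ...)`, though the right tolerance is a judgment call for the package maintainers.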
Thanks in advance