Hi there,
Here is a minimum working example:
----------------------------------------------------------------
lower = 0
upper = 1
n_bins = 50
interval = (upper - lower) / n_bins
bins = vector(mode="numeric", length=n_bins)
breaks = seq(from=lower + interval, to=upper, by=interval)
for(idx in breaks)
{
    bins[idx / interval] = idx
}
print(bins)
----------------------------------------------------------------
which outputs:
----------------------------------------------------------------
[1] 0.02 0.04 0.06 0.08 0.10 0.14 0.00 0.16 0.20 0.00 0.22 0.24 0.26 0.28
[15] 0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.48 0.50 0.52 0.54 0.56
[29] 0.58 0.60 0.62 0.64 0.66 0.68 0.70 0.72 0.74 0.76 0.78 0.80 0.82 0.84
[43] 0.86 0.88 0.90 0.92 0.94 0.96 0.98 1.00
----------------------------------------------------------------
It turns out that some elements are incorrect, such as the 6th
element 0.14, which should be 0.12 in fact.
Is this a bug, or am I missing something?
And here is the output of sessionInfo():
----------------------------------------------------------------
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] cubature_1.1-1 tools_2.15.0
----------------------------------------------------------------
Thanks in advance.
Regards,
Guo
On 06-10-2012, at 08:14, ?? <guo.chow at gmail.com> wrote:
> It turns out that some elements are incorrect, such as the 6th
> element 0.14, which should be 0.12 in fact.

And the 7th is also incorrect.

> Is this a bug or I am missing something?

It is not a bug in R. Yes, you are indeed missing something. Read R FAQ 7.31.
The answer is: floating point inaccuracy.

Insert

print(formatC(idx/interval,format="f",digits=17))
print(as.integer(idx/interval))

immediately after the opening { of the for loop.

If you insist on copying breaks to bins in the way you are doing, you could
use round(idx/interval,3), for example.

Berend
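The diagnostic suggested above can also be run on the whole vector at once. The following is only a sketch, not part of the original replies: the names ratio, truncated, rounded and bad are made up for illustration. It compares the index that truncation produces with the index that was intended, and prints the offending ratios with 17 digits. On the poster's setup the affected positions include 7 and 10, which matches the gaps in the printed output.

```r
# Same setup as in the original post
lower <- 0
upper <- 1
n_bins <- 50
interval <- (upper - lower) / n_bins
breaks <- seq(from = lower + interval, to = upper, by = interval)

ratio <- breaks / interval            # prints as 1, 2, ..., 50 at default digits
truncated <- as.integer(ratio)        # a fractional index is truncated towards zero
rounded <- as.integer(round(ratio))   # the index that was actually intended

# Positions where truncation and rounding disagree
bad <- which(truncated != rounded)
print(bad)
print(formatC(ratio[bad], format = "f", digits = 17))
print(truncated[bad])
```

Any position listed in bad is one where the loop writes to the element just before the intended one, overwriting its value and leaving the intended element at 0.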
R. Michael Weylandt
2012-Oct-06 10:29 UTC
[R] vector is not assigned correctly in for loop
Forgot to cc the list.

RMW

On Sat, Oct 6, 2012 at 11:29 AM, R. Michael Weylandt <michael.weylandt at gmail.com> wrote:
> A case study of a good question! Would that all posters did such a good job.
>
> On Sat, Oct 6, 2012 at 7:14 AM, ?? <guo.chow at gmail.com> wrote:
>> for(idx in breaks)
>> {
>>     bins[idx / interval] = idx
>> }
>
> Note that this could slightly more idiomatically be done as
>
> bins[breaks / interval] = breaks
>
>> It turns out that some elements are incorrect, such as the 6th
>> element 0.14, which should be 0.12 in fact.
>> Is this a bug or I am missing something?
>
> Take a look at
>
> as.integer(breaks / interval)
>
> You're running into floating-point issues (see the link in R FAQ 7.31
> for the definitive reference, but it's a large and complicated field
> with many little manifestations like this).
>
> What's basically happening is that the 7 you see in breaks / interval
> is actually 6.999999999999 (or so), which gets printed as a 7 by
> print() but truncated to a 6 for subsetting, as mentioned in ?`[`. If
> you were to turn on more digits for printing, you'd see it's not
> really a 7.
>
> You'd probably rather have
>
> bins[round(breaks / interval)] = breaks
>
> Cheers and thanks again for spending so much time to make a good question,
>
> Michael
>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
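As a minimal, self-contained sketch of the rounding fix proposed in this reply (the variable names follow the original post; using numeric() to allocate the vector is just a shorthand chosen here), the vectorized assignment can be checked like this:

```r
# Same setup as in the original post
lower <- 0
upper <- 1
n_bins <- 50
interval <- (upper - lower) / n_bins
breaks <- seq(from = lower + interval, to = upper, by = interval)

bins <- numeric(n_bins)

# round() turns 6.999999999999999 into 7 before it is used as an index,
# so every break lands in its intended position
bins[round(breaks / interval)] <- breaks

print(identical(bins, breaks))
```

Rounding works here because the ratios are only a tiny floating-point error away from whole numbers. It would not be appropriate if the ratios could legitimately fall near halves.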
Hello,
This seems to be a case for R FAQ 7.31, "Why doesn't R think these numbers
are equal?"
See this example:
3/5 - 1/5 - 2/5 # not zero
3/5 - (1/5 + 2/5) # not zero, different from above
In your case, try
for(idx in breaks){
print(idx / interval, digits = 16) # see problem indices
bins[idx / interval] = idx
}
b2 <- breaks
identical(bins, b2) # FALSE
What happens is that instead of 7, the value of idx/interval is
6.9999999..., with integer part 6. So bins[6] is assigned twice: first 0.12,
then this value is overwritten by 0.14, and bins[7] is never written to.
The same goes for indices 9 and 10.
Avoid this type of indexing. And if possible use the vectorized
instruction b2 <- breaks.
Hope this helps,
Rui Barradas
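One way to follow the advice to avoid this type of indexing, if a loop is still wanted, is to index by integer position instead of by a computed floating-point ratio. This is only a sketch under that assumption; the loop variable i and the use of seq_along() are choices made here and do not appear in the replies above.

```r
# Same setup as in the original post
lower <- 0
upper <- 1
n_bins <- 50
interval <- (upper - lower) / n_bins
breaks <- seq(from = lower + interval, to = upper, by = interval)

bins <- numeric(n_bins)

# seq_along() yields exact integer positions 1, 2, ..., length(breaks),
# so no floating-point division is involved in choosing the index
for (i in seq_along(breaks)) {
    bins[i] <- breaks[i]
}

print(identical(bins, breaks))
```

Because the index is an integer from the start, the result does not depend on how idx / interval happens to round on a given platform.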