Hi there,

Here is a minimal working example:
----------------------------------------------------------------
lower = 0
upper = 1
n_bins = 50
interval = (upper - lower) / n_bins
bins = vector(mode="numeric", length=n_bins)
breaks = seq(from=lower + interval, to=upper, by=interval)

for(idx in breaks)
{
    bins[idx / interval] = idx
}

print(bins)
----------------------------------------------------------------
which outputs:
----------------------------------------------------------------
 [1] 0.02 0.04 0.06 0.08 0.10 0.14 0.00 0.16 0.20 0.00 0.22 0.24 0.26 0.28
[15] 0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.48 0.50 0.52 0.54 0.56
[29] 0.58 0.60 0.62 0.64 0.66 0.68 0.70 0.72 0.74 0.76 0.78 0.80 0.82 0.84
[43] 0.86 0.88 0.90 0.92 0.94 0.96 0.98 1.00
----------------------------------------------------------------
It turns out that some elements are incorrect, such as the 6th
element 0.14, which should be 0.12 in fact.
Is this a bug, or am I missing something?

And here is the output of sessionInfo():
----------------------------------------------------------------
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] cubature_1.1-1 tools_2.15.0
----------------------------------------------------------------
Thanks in advance.

Regards,
Guo
On 06-10-2012, at 08:14, Guo <guo.chow at gmail.com> wrote:

> [...]
> It turns out that some elements are incorrect, such as the 6th
> element 0.14, which should be 0.12 in fact.

And the 7th is also incorrect.

> Is this a bug, or am I missing something?

It is not a bug in R. Yes, you are indeed missing something.
Read R FAQ 7.31. The answer is: floating-point inaccuracy.

Insert

print(formatC(idx/interval,format="f",digits=17))
print(as.integer(idx/interval))

immediately after the opening { of the for loop.

If you insist on copying breaks to bins in the way you are doing, you could use round(idx/interval,3) as the index, for example.

Berend
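The two print() calls suggested above can also be run once over the whole vector instead of inside the loop. What follows is only a minimal sketch and not part of the original reply; the names idx_raw, suspect and tiny are introduced here for illustration. It lists the positions where truncation toward zero (which is what a non-integer index undergoes) disagrees with rounding. In the output posted in the question, elements 7 and 10 were the ones left at 0.

```r
# Rebuild the vectors from the original post.
lower <- 0
upper <- 1
n_bins <- 50
interval <- (upper - lower) / n_bins
breaks <- seq(from = lower + interval, to = upper, by = interval)

# The raw, non-integer values that the loop used as indices.
idx_raw <- breaks / interval

# Show them with 17 decimal digits, as suggested in the reply.
print(formatC(idx_raw, format = "f", digits = 17))

# Positions where truncation and rounding give different integers.
# (illustrative name, not from the thread)
suspect <- which(as.integer(idx_raw) != round(idx_raw))
print(suspect)

# A smaller instance of the same effect: (0.1 + 0.7) * 10 is stored as
# slightly less than 8, so truncation yields 7 while rounding yields 8.
tiny <- (0.1 + 0.7) * 10
print(as.integer(tiny))  # 7
print(round(tiny))       # 8
```

Any position reported in suspect is one where the loop would write to the wrong element.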
R. Michael Weylandt
2012-Oct-06 10:29 UTC
[R] vector is not assigned correctly in for loop
Forgot to cc the list.

RMW

On Sat, Oct 6, 2012 at 11:29 AM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
> A case study of a good question! Would that all posters did such a good job.
>
> On Sat, Oct 6, 2012 at 7:14 AM, Guo <guo.chow at gmail.com> wrote:
>> [...]
>> for(idx in breaks)
>> {
>>     bins[idx / interval] = idx
>> }
>
> Note that this could be done slightly more idiomatically as
>
>     bins[breaks / interval] = breaks
>
>> [...]
>> It turns out that some elements are incorrect, such as the 6th
>> element 0.14, which should be 0.12 in fact.
>> Is this a bug, or am I missing something?
>
> Take a look at
>
>     as.integer(breaks / interval)
>
> You're running into floating-point issues (see the link in R FAQ 7.31
> for the definitive reference, but it's a large and complicated field
> with many little manifestations like this).
>
> What's basically happening is that the 7 you see in breaks / interval
> is actually 6.999999999999 (or so), which gets printed as a 7 by
> print() but truncated to a 6 for subsetting, as mentioned in ?`[`. If
> you were to turn on more digits for printing, you'd see it's not
> really a 7.
>
> You'd probably rather have
>
>     bins[round(breaks / interval)] = breaks
>
> Cheers and thanks again for spending so much time to make a good question,
>
> Michael
>
>> [...]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
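The rounding fix from this reply can be tried end to end as follows. This is a sketch under the assumptions of the original post (same lower, upper and n_bins); the variable idx and the sanity check on it are additions made here, not something the reply proposed.

```r
# Same setup as in the original post.
lower <- 0
upper <- 1
n_bins <- 50
interval <- (upper - lower) / n_bins
breaks <- seq(from = lower + interval, to = upper, by = interval)
bins <- numeric(n_bins)

# Printing with more digits shows which quotients are not exact integers.
print(breaks / interval, digits = 17)

# Round to the nearest whole number before using the values as indices.
idx <- round(breaks / interval)

# Sanity check (added for illustration): every index is in range and
# no index occurs twice, so no element is overwritten or skipped.
stopifnot(anyDuplicated(idx) == 0, all(idx >= 1), all(idx <= n_bins))

# Vectorized assignment; no loop needed.
bins[idx] <- breaks

print(identical(bins, breaks))
```

Rounding is safe here because each quotient is off from a whole number only by floating-point error, far less than 0.5.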
Hello,

This seems to be a case for FAQ 7.31, "Why doesn't R think these numbers are equal?"
See this example:

3/5 - 1/5 - 2/5    # not zero
3/5 - (1/5 + 2/5)  # not zero, different from above

In your case, try

for(idx in breaks){
    print(idx / interval, digits = 16)  # see problem indices
    bins[idx / interval] = idx
}

b2 <- breaks
identical(bins, b2)  # FALSE

What happens is that instead of 7, the value of idx/interval is 6.9999999, with integer part 6. So bins[6] is assigned twice: first 0.12, then this value is overwritten by 0.14, and bins[7] is never written to. The same goes for indices 9 and 10.
Avoid this type of indexing. And if possible, use the vectorized instruction b2 <- breaks.

Hope this helps,

Rui Barradas

On 06-10-2012 07:14, Guo wrote:
> [...]
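One way to follow the advice to avoid this type of indexing is to index by position rather than by a value computed from floating-point numbers. The sketch below is an illustration only, not code from the thread; the loop over seq_along() and the variables a and b are choices made here. It also shows the tolerance-based comparison that FAQ 7.31 recommends in place of ==.

```r
# Same setup as in the original post.
lower <- 0
upper <- 1
n_bins <- 50
interval <- (upper - lower) / n_bins
breaks <- seq(from = lower + interval, to = upper, by = interval)
bins <- numeric(n_bins)

# Loop over integer positions; the index never passes through
# floating-point division, so it cannot be truncated to a wrong value.
for (i in seq_along(breaks)) {
  bins[i] <- breaks[i]
}
print(identical(bins, breaks))

# Comparing computed doubles: exact equality versus a tolerance.
a <- 3/5 - 1/5
b <- 2/5
print(a == b)                   # exact comparison of the stored doubles
print(isTRUE(all.equal(a, b)))  # comparison up to a small tolerance
```

If the loop is not otherwise needed, the plain assignment b2 <- breaks mentioned above does the same job in one step.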