Christopher T. Moore
2011-Jun-29 18:29 UTC
[R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year
Hello,

I have encountered some unexpected behavior in R that seems to occur as a result of having the current year embedded in a number:

> ########################################
>
> #Some large numbers, representing IDs.
> IDs <- c(41255689815201100, 41255699815201100, 41255709815201100)
>
> #In scientific notation
> IDs
[1] 4.125569e+16 4.125570e+16 4.125571e+16
>
> #Change penalty.
> options(scipen = 5)
>
> #Why does R add 4?
> IDs
[1] 41255689815201104 41255699815201104 41255709815201104
>
> #Changing from numeric to character makes no difference.
> as.character(IDs)
[1] "41255689815201104" "41255699815201104" "41255709815201104"
>
> #What happens if I treat the numbers as characters?
> IDs.character <- c("41255689815201100", "41255699815201100", "41255709815201100")
>
> #No change.
> IDs.character
[1] "41255689815201100" "41255699815201100" "41255709815201100"
>
> #R adds 4 upon converting to numeric.
> as.numeric(IDs.character)
[1] 41255689815201104 41255699815201104 41255709815201104
>
> #Is this problem occurring because the current year is embedded in the number?
> IDs <- c(41255689815201100, 41255699815201000, 41255709815201200)
>
> #R is no longer adding 4 to the numbers without "2011".
> IDs
[1] 41255689815201104 41255699815201000 41255709815201200
>
> ########################################

Am I doing something wrong? Any insight on how I can avoid the problem of R changing numbers on its own? Are others able to replicate this example? Is this some kind of bug? Am I right that this problem is occurring because the current year is embedded in the number?

I discovered this when trying to merge two data sets, one with IDs stored as numbers and one with IDs as characters. I have replicated this in Windows XP with R 2.12 and Windows 7 with R 2.13 (both 32- and 64-bit versions).

Thanks,
Chris

--
Christopher T. Moore, M.P.P.
Doctoral Student
Quantitative Methods in Education
University of Minnesota
44.9785°N, 93.2396°W
moor0554 at umn.edu
http://umn.edu/~moor0554
Peter Langfelder
2011-Jun-29 18:41 UTC
[R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year
You seem to be running into the limits of double precision: your IDs have 17 "significant" digits, which is more than a double-precision floating-point number can hold without rounding error.

Since you are using these numbers as IDs, simply keep them as character strings throughout your code, and nothing will ever change. Or shorten the IDs by a few digits and they will be safe again.

HTH,
Peter

On Wed, Jun 29, 2011 at 11:29 AM, Christopher T. Moore <moor0554 at umn.edu> wrote:
> Hello,
>
> I have encountered some unexpected behavior in R that seems to occur as a
> result of having the current year embedded in a number:
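A minimal sketch of the precision limit described in this reply, using the first ID from the original post. The variable names (`exact_limit`, `gap`, `ids_chr`) and the two small data frames are hypothetical and only for illustration. It assumes IEEE 754 doubles, which is what `.Machine$double.digits` reports on the usual platforms.

```r
# Doubles carry a 53-bit significand, so whole numbers are exact only up to 2^53.
exact_limit <- 2^.Machine$double.digits   # 9007199254740992, roughly 9e15

id_chr <- "41255689815201100"             # ID from the original post, kept as text
id_num <- as.numeric(id_chr)              # the same ID forced into a double

id_num > exact_limit                      # TRUE: outside the exactly representable range

# Distance between neighbouring doubles at this magnitude.
gap <- 2^(floor(log2(id_num)) - (.Machine$double.digits - 1))
gap                                       # 8, so only multiples of 8 can be stored here

# The value actually stored, printed without scientific notation.
sprintf("%.0f", id_num)                   # "41255689815201104", as in the post

# Two different IDs stay different as text but collapse to one double.
ids_chr <- c("41255689815201100", "41255689815201101")
ids_chr[1] == ids_chr[2]                          # FALSE
as.numeric(ids_chr[1]) == as.numeric(ids_chr[2])  # TRUE

# Hypothetical data sets keyed on character IDs: the merge matches on the text.
a <- data.frame(id = ids_chr, x = 1:2, stringsAsFactors = FALSE)
b <- data.frame(id = ids_chr, y = c("p", "q"), stringsAsFactors = FALSE)
merged <- merge(a, b, by = "id")
nrow(merged)                              # 2
```

The ID ending in ...100 lies exactly halfway between the two nearest multiples of 8 (...096 and ...104), which is why it comes back as ...104, while the IDs ending in ...000 and ...200 are themselves multiples of 8 and survive unchanged. The year has nothing to do with it.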
David Winsemius
2011-Jun-29 18:42 UTC
[R] Unexpected R Behavior: Adding 4 to Large Numbers/IDs Containing Current Year
On Jun 29, 2011, at 2:29 PM, Christopher T. Moore wrote:

> Hello,
>
> I have encountered some unexpected behavior in R that seems to occur
> as a result of having the current year embedded in a number:

No, that is not the explanation.

>
>> ########################################
>> #Some large numbers, representing IDs.
>> IDs <- c(41255689815201100, 41255699815201100, 41255709815201100)

41255689815201100 > 2*10^9
[1] TRUE

So you may think you are working with integers, but you are in fact working with floating point numbers. See the R-FAQ.

--
David.

>> #In scientific notation
>> IDs
> [1] 4.125569e+16 4.125570e+16 4.125571e+16
>> #Change penalty.
>> options(scipen = 5)
>> #Why does R add 4?
>> IDs
> [1] 41255689815201104 41255699815201104 41255709815201104
>> #Changing from numeric to character makes no difference.
>> as.character(IDs)
> [1] "41255689815201104" "41255699815201104" "41255709815201104"
>> #What happens if I treat the numbers as characters?
>> IDs.character <- c("41255689815201100", "41255699815201100",
>> "41255709815201100")
>> #No change.
>> IDs.character
> [1] "41255689815201100" "41255699815201100" "41255709815201100"
>> #R adds 4 upon converting to numeric.
>> as.numeric(IDs.character)
> [1] 41255689815201104 41255699815201104 41255709815201104
>> #Is this problem occurring because the current year is embedded in
>> #the number?
>> IDs <- c(41255689815201100, 41255699815201000, 41255709815201200)
>> #R is no longer adding 4 to the numbers without "2011".
>> IDs
> [1] 41255689815201104 41255699815201000 41255709815201200
>> ########################################
>
> Am I doing something wrong? Any insight on how I can avoid the
> problem of R changing numbers on its own? Are others able to
> replicate this example? Is this some kind of bug? Am I right that
> this problem is occurring because the current year is embedded in
> the number? I discovered this when trying to merge two data sets,
> one with IDs stored as numbers and one with IDs as characters. I have
> replicated this in Windows XP with R 2.12 and Windows 7 with R 2.13
> (both 32- and 64-bit versions).
>
> Thanks,
> Chris
>
> --
> Christopher T. Moore, M.P.P.
> Doctoral Student
> Quantitative Methods in Education
> University of Minnesota
> 44.9785°N, 93.2396°W
> moor0554 at umn.edu
> http://umn.edu/~moor0554
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
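The integer-versus-floating-point distinction drawn in this reply can be checked directly. The following is a rough sketch; the names `id` and `id_int` are illustrative and not from the thread.

```r
# Largest value R's integer type can hold (2^31 - 1, a little over 2*10^9).
.Machine$integer.max                   # 2147483647

id <- 41255689815201100                # a numeric literal without an L suffix is a double
typeof(id)                             # "double"
is.integer(id)                         # FALSE
id > .Machine$integer.max              # TRUE: far too large for the integer type

# Coercing to integer cannot succeed for a value this large; R returns NA
# (with a warning, suppressed here to keep the output clean).
id_int <- suppressWarnings(as.integer(id))
is.na(id_int)                          # TRUE
```

So a 17-digit ID is held as a double whether or not it looks like a whole number, and it is therefore subject to floating-point rounding.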