Suharto Anggono Suharto Anggono
2014-Sep-20 10:52 UTC
[R] factor(300000, levels=1:300000) gives NA
In R:

> factor(300000, levels=1:300000)
[1] <NA>
300000 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ... 300000

The NA above is undesirable in my view, because 300000 is in 1:300000.

I have just got bitten by it.

I have figured out why it happens. The results of 'as.character' are different.

> as.character(300000)
[1] "3e+05"
> as.character((1:300000)[300000])
[1] "300000"

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
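The mismatch described above can be looked at in isolation. The following is a minimal sketch, not taken from the thread; the variable names (x_dbl, x_int, chr_dbl, chr_int) are illustrative only. It shows that the two values are numerically equal while their character forms can differ, which matters because factor() compares values by their character representation. The exact string produced for the double may depend on the R version, so no output is asserted for it here.

```r
# Illustrative names; not part of the original post.
x_dbl <- 300000    # a double ('numeric')
x_int <- 300000L   # an integer

# Character forms, as used when values are matched against levels.
chr_dbl <- as.character(x_dbl)   # reported as "3e+05" in the post (R 3.1.1)
chr_int <- as.character(x_int)   # integers are never printed in scientific notation
print(chr_dbl)
print(chr_int)

# The numbers compare equal, but the strings need not be identical.
print(x_dbl == x_int)
print(identical(chr_dbl, chr_int))

# format() with scientific = FALSE forces fixed notation for the double.
print(format(x_dbl, scientific = FALSE))
```

If the two strings differ on your installation, a double x will not match an integer level of the same value.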
I would say having 300000 levels is a bad idea... You should be re-thinking your analysis. If you are still convinced that this is necessary, then do it right:

factor(300000L, levels=1:300000)

Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
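For readers unfamiliar with the L suffix, here is a small sketch of why the suggestion above works. It assumes nothing beyond base R; the name f is illustrative. A bare 300000 is a double, while 300000L and the elements of 1:300000 are integers, so with the suffix both arguments have the same type and convert to the same character form.

```r
# Types of the values involved.
print(typeof(300000))      # a bare numeric literal is a double
print(typeof(300000L))     # the L suffix makes it an integer
print(typeof(1:300000))    # the colon operator yields integers here

# With matching integer types, the value is found among the levels.
f <- factor(300000L, levels = 1:300000)
print(is.na(f))
print(as.character(f))
```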
You can work around this issue by matching the types of the 'x' and 'levels' arguments to factor():

> factor(300000, as.numeric(299999:300001)) # both are floating point ('numeric')
[1] 3e+05
Levels: 299999 3e+05 300001
> factor(as.integer(300000), 299999:300001) # both are integer
[1] 300000
Levels: 299999 300000 300001

If the types do not match, you get undesirable results:

> factor(300000, 299999:300001) # x is numeric, levels is integer
[1] <NA>
Levels: 299999 300000 300001
> factor(300000L, as.numeric(299999:300001)) # x is integer, levels is numeric
[1] <NA>
Levels: 299999 3e+05 300001

Bill Dunlap
TIBCO Software
wdunlap tibco.com
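The type-matching workaround could be wrapped in a small helper. This is only a sketch: factor_same_type is a hypothetical name, not a base R function, and it assumes both arguments are numeric. It coerces x to the storage mode of levels before calling factor().

```r
# Hypothetical helper (not part of base R): coerce 'x' to the storage mode
# of 'levels' so both convert to the same character form inside factor().
factor_same_type <- function(x, levels) {
  stopifnot(is.numeric(x), is.numeric(levels))
  # Caution: coercing a non-whole double to integer truncates it silently.
  storage.mode(x) <- storage.mode(levels)
  factor(x, levels = levels)
}

# x is double, levels is integer: x is converted to integer first.
f1 <- factor_same_type(300000, 299999:300001)
# x is integer, levels is double: x is converted to double first.
f2 <- factor_same_type(300000L, as.numeric(299999:300001))

print(f1)
print(f2)
```

Both calls should yield a non-NA value at the second level. The level labels still follow the type of levels, so the double case may print a label in scientific notation.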