Suharto Anggono Suharto Anggono
2014-Sep-20 10:52 UTC
[R] factor(300000, levels=1:300000) gives NA
In R:

> factor(300000, levels=1:300000)
[1] <NA>
300000 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ... 300000

The NA above is undesirable in my view, because 300000 is in 1:300000.

I have just got bitten by it.

I have figured out why it happens. The results of 'as.character' are different.

> as.character(300000)
[1] "3e+05"
> as.character((1:300000)[300000])
[1] "300000"

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
I would say having 300000 levels is a bad idea... You should be re-thinking your
analysis.
If you are still convinced that this is necessary, then do it right:
factor(300000L, levels=1:300000)
---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
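
For anyone following along, here is a minimal sketch of how one might check the types involved before calling factor(). It assumes, as the original post suggests, that the mismatch comes from the character forms of 'x' and 'levels'. The variable names (x, lev, f_int) are only for illustration.

```r
x   <- 300000      # a double: numeric literals without the L suffix are doubles
lev <- 1:300000    # an integer vector

typeof(x)          # "double"
typeof(lev)        # "integer"

# The integer value converts to the plain digit string.
as.character(300000L)        # "300000"
as.character(lev[300000])    # "300000"

# The double value was reported as "3e+05" in the R 3.1.1 session quoted
# in this thread; print it here to see what your own version gives.
as.character(x)

# With an integer 'x' and integer 'levels' the value is found.
f_int <- factor(300000L, levels = lev)
is.na(f_int)       # FALSE
```

If typeof() differs between 'x' and 'levels', that is the first thing to fix.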
You can work around this issue by matching the types of the 'x'
and 'levels' arguments to factor():
> factor(300000, as.numeric(299999:300001)) # both are floating point ('numeric')
[1] 3e+05
Levels: 299999 3e+05 300001
> factor(as.integer(300000), 299999:300001) # both are integer
[1] 300000
Levels: 299999 300000 300001
If the types do not match, you get undesirable results:
> factor(300000, 299999:300001) # x is numeric, levels is integer
[1] <NA>
Levels: 299999 300000 300001
> factor(300000L, as.numeric(299999:300001)) # x is integer, levels is numeric
[1] <NA>
Levels: 299999 3e+05 300001
Bill Dunlap
TIBCO Software
wdunlap tibco.com
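
The type-matching advice above could be wrapped in a small helper. This is only a sketch: factor_matched() is a hypothetical name, not part of base R, and it assumes 'x' and 'levels' are both plain integer or double vectors holding whole numbers.

```r
# Hypothetical helper: coerce 'x' to the storage type of 'levels' so that
# both are converted to character in the same way inside factor().
# Caution: if 'levels' is integer, any fractional part of 'x' is truncated.
factor_matched <- function(x, levels) {
  storage.mode(x) <- storage.mode(levels)
  factor(x, levels = levels)
}

# Double 'x', integer 'levels': 'x' becomes 300000L before matching.
f1 <- factor_matched(300000, 1:300000)
is.na(f1)    # FALSE

# Integer 'x', double 'levels': 'x' becomes a double before matching.
f2 <- factor_matched(300000L, as.numeric(299999:300001))
is.na(f2)    # FALSE
```

Whether silent coercion is acceptable depends on the data; an explicit as.integer() or as.numeric() at the call site, as in the examples above, makes the intent more visible.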