Mathieu Basille
2013-Jul-30 16:01 UTC
[R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
Dear list,

Here is a simple example in which the behaviour of 'format' does not make sense to me. I have read the documentation and searched the archives, but nothing pointed me in the right direction to understand this behaviour. Let's start with a simple data frame:

    df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)

Let's now create a new variable 'id2' which is the character representation of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not formatted using their scientific representation (in this case 1e+05):

    df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))

Let's have a look at part of the result:

    df1$id2[99990:100010]
     [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
     [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
    [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

So far, so good. Let's now play with the 'digits' option:

    options(digits = 4)
    df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
    df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
    df2$id2[99990:100010]
     [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
     [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
    [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Notice the extra leading space from 99995 to 99999? To make sure it only happened there:

    df2$id2[which(df1$id2 != df2$id2)]
    [1] " 99995" " 99996" " 99997" " 99998" " 99999"

And just to make sure it only occurs in an 'apply' call, here is the same directly on a numeric vector:

    id2 <- format(1:110000, scientific = FALSE)
    id2[99990:100010]
     [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
     [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
    [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Here the leading spaces are there for every number, which makes sense to me.
Is there anything I'm misinterpreting in the behaviour of 'format'?

Thanks in advance for any hint,
Mathieu.

PS: Some background for this question. It all comes from an Rmd document that knitr consistently failed to process, while the R code was fine using batch or interactive R. knitr uses 'options(digits = 4)', as opposed to the R default of 'options(digits = 7)', which made one of my functions throw an error with knitr, but not with batch or interactive R. I managed to solve the problem using 'trim = TRUE' in 'format', but I still do not understand what's going on...

If you're interested, see here for more details on the original problem:
http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176

--

~$ whoami
Mathieu Basille, PhD

~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314
http://ase-research.org/basille

~$ fortune
« Le tout est de tout dire, et je manque de mots
Et je manque de temps, et je manque d'audace. »
  -- Paul Éluard
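A minimal sketch of one possible reading of the behaviour described above, not a confirmed diagnosis. It relies on two assumptions. First, 'apply' converts the data frame with 'as.matrix' before calling the function, so each 'id' reaches 'format' as a double rather than an integer. Second, for doubles 'format' derives the field width from the value rounded to 'digits' significant digits, which would be consistent with 99995 (100000 at four significant digits) being padded to six characters. The data frame 'df' and the result names are hypothetical, and the sketch also shows the 'trim = TRUE' workaround mentioned in the PS.

```r
# Smaller stand-in for the data frame in the post (hypothetical name 'df').
df <- data.frame(x = rnorm(5), y = rnorm(5), id = 99993:99997)

# In the data frame, 'id' is stored as integer ...
class(df$id)

# ... but apply() works on as.matrix(df), and with double and integer
# columns mixed, the resulting matrix is of type double.
typeof(as.matrix(df))

# Temporarily use the knitr setting from the post; 'old' keeps the
# previous value so that it can be restored afterwards.
old <- options(digits = 4)

# One double at a time, as inside the apply() call in the post.
as_double <- vapply(as.numeric(df$id), format, character(1),
                    scientific = FALSE)

# The same values kept as integers, again one at a time.
as_integer <- vapply(df$id, format, character(1), scientific = FALSE)

# trim = TRUE suppresses padding to a common width.
trimmed <- vapply(as.numeric(df$id), format, character(1),
                  scientific = FALSE, trim = TRUE)

options(old)

print(as_double)   # compare with the padded values reported in the post
print(as_integer)
print(trimmed)
```

If the assumptions hold, only 'as_double' should depend on the 'digits' option, while the integer and trimmed versions should stay unpadded.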
David Winsemius
2013-Jul-30 17:58 UTC
[R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:

> Dear list,
>
> Here is a simple example in which the behaviour of 'format' does not make sense to me. I have read the documentation and searched the archives, but nothing pointed me in the right direction to understand this behaviour. Let's start with a simple data frame:
>
> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>
> Let's now create a new variable 'id2' which is the character representation of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not formatted using their scientific representation (in this case 1e+05):
>
> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>
> Let's have a look at part of the result:
>
> df1$id2[99990:100010]
>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
>  [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Some formatting is carried out by system functions. In this case I am unable to reproduce the problem with the same code on Mac OS 10.7.5 / R 3.0.1 Patched:

    > df1$id2[99990:100010]
     [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
     [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
    [17] "100006" "100007" "100008" "100009" "100010"

(I did notice that generation of the id2 variable seemed to take an inordinately long time.)

-- 
David.

> So far, so good. Let's now play with the 'digits' option:
>
> options(digits = 4)
> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> df2$id2[99990:100010]
>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
>  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>
> Notice the extra leading space from 99995 to 99999?

David Winsemius
Alameda, CA, USA
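On the remark that generating 'id2' took a long time: the 'apply' call in the original post runs 'format' once per row. A minimal sketch of vectorised alternatives follows, assuming that the goal is simply an unpadded character version of the integer 'id' column. The data frame 'df' and the result names are hypothetical, and the row-wise version is kept only for comparison.

```r
# Hypothetical smaller data frame with the same columns as in the post.
df <- data.frame(x = rnorm(1000), y = rnorm(1000), id = 99501:100500)

# Row-wise version from the post: one format() call per row, each on a
# single value taken from the matrix that apply() builds.
row_wise <- apply(df, 1, function(dfi) format(dfi["id"], scientific = FALSE))

# Vectorised alternatives that work on the integer column directly.
id_trim <- format(df$id, scientific = FALSE, trim = TRUE)  # no padding
id_chr  <- as.character(df$id)                             # plain conversion

# Both alternatives give the same strings.
identical(id_trim, id_chr)

# The row-wise result matches once any padding is stripped.
all(trimws(row_wise) == id_chr)
```

Each alternative makes a single call on the whole column instead of one call per row. Wrapping each version in 'system.time' is one way to compare them on the full 110000 rows.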