Gerrit Eichner
2013-Oct-15 11:53 UTC
[R] bug(?) in str() with strict.width = "cut" when applied to dataframe with numeric component AND factor or character component with longer levels/strings
Dear list subscribers,

here is a small artificial example to demonstrate the problem that I
encountered when looking at the structure of a (larger) data frame that
comprised (among other components)

   a numeric component of elements of the order of > 10000, and

   a factor or character component with longer levels/strings:


   k <- 43     # length of levels or character strings
   n <- 11     # number of rows of data frame
   M <- 10000  # order of magnitude of numerical values

   set.seed( 47)  # to reproduce the following artificial character string
   longer.char.string <- paste( sample( letters, k, replace = TRUE),
                                collapse = "")

   X <- data.frame( A = 1:n * M,
                    B = rep( longer.char.string, n))


The following call to str() gives apparently a wrong result

   str( X, strict.width = "cut")

   'data.frame': 11 obs. of 2 variables:
   $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
   $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..


whereas the correct result appears for str( X) or if you decrease k to 42
(isn't that "the answer"? ;-) ) or n to 10 or M to 1000 (or smaller,
respectively).


I tried to dig into the entrails of str.default(), where the cause may
lie, but got lost pretty soon. So, I am hoping that someone may already
have a work-around or patch (or dares to dig further)? Thank you for any
feedback!
Best regards -- Gerrit

PS:

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets
[7] methods   base

other attached packages:
[1] nparcomp_2.0     multcomp_1.2-21  mvtnorm_0.9-9996
[4] car_2.0-19       Hmisc_3.12-2     Formula_1.1-1
[7] survival_2.37-4  fortunes_1.5-0

loaded via a namespace (and not attached):
[1] cluster_1.14.4  grid_3.0.2      lattice_0.20-23 MASS_7.3-29
[5] nnet_7.3-7      rpart_4.1-3     stats4_3.0.2    tools_3.0.2

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
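The report above says that the output changes when k, n, or M is lowered. A minimal sketch of how one might check all eight combinations in one go, rather than by eye, is given below. The helper name str_component_names and the object grid are hypothetical and not part of the original post; the sketch only captures what str() prints and extracts the component names after "$". On an unaffected R installation every row should list both A and B; on an affected one, some rows would presumably show A twice.

```r
# Hypothetical helper (not from the original post): build the example data
# frame for given k, n, M and return the component names that str() prints.
str_component_names <- function(k, n, M, strict.width = "cut") {
  set.seed(47)  # same seed as in the post, so the string is reproducible
  longer.char.string <- paste(sample(letters, k, replace = TRUE),
                              collapse = "")
  X <- data.frame(A = 1:n * M,
                  B = rep(longer.char.string, n))

  # Capture the printed structure instead of reading it off the console.
  out <- capture.output(str(X, strict.width = strict.width))

  # Keep only the component lines (those starting with "$") and pull out
  # the name that precedes the first colon.
  comp <- grep("^ *\\$ ", out, value = TRUE)
  sub("^ *\\$ *([^: ]+) *:.*$", "\\1", comp)
}

# Try the failing setting from the post and the three reductions that were
# reported to give the correct result.
grid <- expand.grid(k = c(42, 43), n = c(10, 11), M = c(1000, 10000))
grid$names <- mapply(
  function(k, n, M) paste(str_component_names(k, n, M), collapse = ","),
  grid$k, grid$n, grid$M
)
print(grid)
```

A row whose names column is not "A,B" marks a parameter combination where the printed structure is wrong, which may help narrow down which line width triggers the problem.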
Duncan Murdoch
2013-Oct-15 20:11 UTC
[R] bug(?) in str() with strict.width = "cut" when applied to dataframe with numeric component AND factor or character component with longer levels/strings
On 15/10/2013 7:53 AM, Gerrit Eichner wrote:
> Dear list subscribers,
>
> here is a small artificial example to demonstrate the problem that I
> encountered when looking at the structure of a (larger) data frame that
> comprised (among other components)
>
>    a numeric component of elements of the order of > 10000, and
>
>    a factor or character component with longer levels/strings:
>
>
>    k <- 43     # length of levels or character strings
>    n <- 11     # number of rows of data frame
>    M <- 10000  # order of magnitude of numerical values
>
>    set.seed( 47)  # to reproduce the following artificial character string
>    longer.char.string <- paste( sample( letters, k, replace = TRUE),
>                                 collapse = "")
>
>    X <- data.frame( A = 1:n * M,
>                     B = rep( longer.char.string, n))
>
>
> The following call to str() gives apparently a wrong result
>
>    str( X, strict.width = "cut")
>
>    'data.frame': 11 obs. of 2 variables:
>    $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>    $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>
>
> whereas the correct result appears for str( X) or if you decrease k to 42
> (isn't that "the answer"? ;-) ) or n to 10 or M to 1000 (or smaller,
> respectively).
>
>
> I tried to dig into the entrails of str.default(), where the cause may
> lie, but got lost pretty soon. So, I am hoping that someone may already
> have a work-around or patch (or dares to dig further)? Thank you for any
> feedback!

I can't reproduce this. I don't have a 64 bit copy of 3.0.2 handy, but I
don't see it in 64 bit 3.0.1, or 64 bit 3.0.2-patched, or various 32 bit
versions. Is it reproducible for you?

It looks to me as though (if it isn't just something weird on your
system, e.g. an old copy of str() in your workspace), it might be a
memory protection problem: something needed to be duplicated but wasn't.
But unless I can see it happen, I can't start to fix it.
Duncan Murdoch
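One of the possibilities raised in this reply, an old copy of str() sitting in the workspace, can be ruled out with a few base R calls. The following is only a sketch under that assumption. The variable names masked, where, and same are illustrative, and the data frame is rebuilt exactly as in the original post so that the block runs on its own.

```r
# Rebuild the example data frame from the original post.
k <- 43; n <- 11; M <- 10000
set.seed(47)
longer.char.string <- paste(sample(letters, k, replace = TRUE), collapse = "")
X <- data.frame(A = 1:n * M, B = rep(longer.char.string, n))

# Is there an object called "str" in the global workspace itself?
# TRUE would mean a user-level copy is masking the one from utils.
masked <- exists("str", envir = globalenv(), inherits = FALSE)

# Every place on the search path that provides an object called "str",
# in the order in which R would find them.
where <- find("str")

# Is the str() that a plain call resolves to the same object as utils::str?
same <- identical(str, utils::str)

print(list(masked = masked, where = where, same = same))

# Call the namespace version explicitly, bypassing anything that masks it.
utils::str(X, strict.width = "cut")
```

If masked is TRUE or same is FALSE, removing the stray object with rm(str), or repeating the test in a fresh session started with R --vanilla, would show whether the problem lies with the installed utils package or with the local workspace.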