I use the following code to create two data.frames d1 and d2 from a list:

types  <- c("integer", "character", "double")
nlines <- 10
d1     <- as.data.frame(lapply(types, do.call, list(nlines)),
            stringsAsFactor=FALSE)
l2     <- lapply(types, do.call, list(nlines))
d2     <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same. However, in d1 the second column is a factor, while in d2 it is a character (which is what I would expect):

> str(d1)
'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
 $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
> str(d2)
'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c........................................: chr  "" "" "" "" ...
 $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0

As a different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my C++ routines. Is there a simpler or more elegant way of creating this data.frame?

Regards,

Jan

PS: I am running R on 64-bit Ubuntu 11.04:

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Forget I asked. There was a typo in my example (stringsAsFactor instead of stringsAsFactors), which explains the difference. My apologies.

My second question, however, still stands: how does one create a data.frame with given column types and given dimensions?

Thanks.

Regards,

Jan
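On the question of building a data.frame with given column types and dimensions, one possible approach is sketched below. It relies on base R's vector(mode, length), which returns a zero-filled (or ""-filled) vector of the requested type. The helper name make_typed_df and its arguments n and col_names are hypothetical and chosen only for illustration; naming the list elements also avoids the long auto-generated column names seen in the str() output.

```r
# Hypothetical helper (not from the thread): build a data.frame with `n` rows
# whose columns have the atomic types named in `types`.
make_typed_df <- function(types, n,
                          col_names = paste0("V", seq_along(types))) {
  # vector("integer", n) gives n zeros, vector("character", n) gives n "" etc.
  cols <- lapply(types, vector, length = n)
  names(cols) <- col_names
  # Spell the argument in full so character columns are not turned into factors.
  as.data.frame(cols, stringsAsFactors = FALSE)
}

types <- c("integer", "character", "double")
d <- make_typed_df(types, 10)
str(d)
```

Passing n = 0 should give a data.frame with the same column types and no rows, which may be closer to what 'empty' means in some contexts.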
In your post, you're missing the final "s" on the stringsAsFactors argument in the d1 assignment. When I typed it correctly, it worked as expected.

-- Bert

On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
> d1     <- as.data.frame(lapply(types, do.call, list(nlines)),
> stringsAsFactor=FALSE)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
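The effect of the argument can be illustrated with a small sketch. The misspelled stringsAsFactor was apparently not matched, so the call presumably fell back to the default of that R version (2.12.1), which converted character columns to factors; since R 4.0.0 the default is FALSE. To make the example behave the same on any version, both values are passed explicitly here. The column names i, s and d are hypothetical and added only for readability.

```r
# Same list construction as in the original post.
types  <- c("integer", "character", "double")
nlines <- 10
l2 <- lapply(types, do.call, list(nlines))
names(l2) <- c("i", "s", "d")  # hypothetical column names

# Correctly spelled argument, set both ways to show the difference.
with_factors <- as.data.frame(l2, stringsAsFactors = TRUE)
with_chars   <- as.data.frame(l2, stringsAsFactors = FALSE)

class(with_factors$s)  # "factor"
class(with_chars$s)    # "character"
```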