Dear list!

I have a question about 'correct function formation'. Which function (fun1 or fun2; see below) is written more correctly? Using 'structure' as output, or creating an empty 'data.frame' and then filling it in as output? (fun1 and fun2 are just for illustration.)

Thanks a lot, OV

code:

input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

fun1 <- function(x) {
    ID <- NULL; minimum <- NULL; maximum <- NULL
    for(i in seq_along(names(x))) {
        ID[i] <- names(x)[i]
        minimum[i] <- min(x[, names(x)[i]])
        maximum[i] <- max(x[, names(x)[i]])
    }
    output <- structure(list(ID, minimum, maximum),
                        row.names = seq_along(names(x)),
                        .Names = c("ID", "minimum", "maximum"),
                        class = "data.frame")
    return(output)
}

fun2 <- function(x) {
    output <- data.frame(ID = character(), minimum = numeric(),
                         maximum = numeric(), stringsAsFactors = FALSE)
    for(i in seq_along(names(x))) {
        output[i, "ID"] <- names(x)[i]
        output[i, "minimum"] <- min(x[, names(x)[i]])
        output[i, "maximum"] <- max(x[, names(x)[i]])
    }
    return(output)
}

fun1(input)
fun2(input)
Hello,

I believe it's a matter of personal taste. I find fun2 more readable; others may not agree.

Rui Barradas

On 20-11-2012 17:39, Omphalodes Verna wrote:
> Which function (fun1 or fun2; see below) is written more correctly?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On 20/11/2012 12:39 PM, Omphalodes Verna wrote:
> fun1 <- function(x) {
>     ID <- NULL; minimum <- NULL; maximum <- NULL
>     for(i in seq_along(names(x))) {
>         ID[i] <- names(x)[i]
>         minimum[i] <- min(x[, names(x)[i]])
>         maximum[i] <- max(x[, names(x)[i]])
>     }
>     output <- structure(list(ID, minimum, maximum),
>                         row.names = seq_along(names(x)),
>                         .Names = c("ID", "minimum", "maximum"),
>                         class = "data.frame")
>     return(output)
> }

fun1 above relies on the internal implementation of the data.frame class. That's really unlikely to change, but you still shouldn't rely on it.

> fun2 <- function(x) {
>     output <- data.frame(ID = character(), minimum = numeric(),
>                          maximum = numeric(), stringsAsFactors = FALSE)
>     for(i in seq_along(names(x))) {
>         output[i, "ID"] <- names(x)[i]
>         output[i, "minimum"] <- min(x[, names(x)[i]])
>         output[i, "maximum"] <- max(x[, names(x)[i]])
>     }
>     return(output)
> }

This one is going to be really slow, because it does so much indexing of the output dataframe.

I would combine the approaches: assign to local variables in the loop the way fun1 does, then construct a dataframe at the end. That is,

    output <- data.frame(ID, minimum, maximum)
    return(output)

One other change: don't initialize the local variables to NULL, initialize them to their final size, e.g.

    ID <- character(ncol(x))
    minimum <- numeric(ncol(x))
    maximum <- numeric(ncol(x))

(And if the contents are as simple as in the example, you don't need the loop, but I assume the real case is more complicated.)
Duncan Murdoch
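For anyone reading the archive later, here is a minimal sketch of the combined approach suggested in the reply above, plus a loop-free variant for the simple case. The names fun3 and fun4 are hypothetical (they do not appear in the thread), set.seed() is there only to make the example reproducible, and stringsAsFactors = FALSE is an assumption added so that ID stays a character column on R versions before 4.0.

```r
# Hypothetical example data, same shape as in the original post.
set.seed(1)  # assumption: only for reproducibility
input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

# fun3 (hypothetical name): fill preallocated vectors in the loop,
# then build the data frame once at the end.
fun3 <- function(x) {
  n <- ncol(x)
  ID <- character(n)      # allocated at final size, not NULL
  minimum <- numeric(n)
  maximum <- numeric(n)
  for (i in seq_len(n)) {
    ID[i] <- names(x)[i]
    minimum[i] <- min(x[[i]])
    maximum[i] <- max(x[[i]])
  }
  # Public constructor instead of structure(..., class = "data.frame")
  data.frame(ID, minimum, maximum, stringsAsFactors = FALSE)
}

# fun4 (hypothetical name): no explicit loop; vapply() also checks that
# each column yields exactly one numeric value.
fun4 <- function(x) {
  data.frame(
    ID = names(x),
    minimum = vapply(x, min, numeric(1), USE.NAMES = FALSE),
    maximum = vapply(x, max, numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

fun3(input)
fun4(input)
```

Both versions go through data.frame() rather than setting the class attribute by hand, and neither indexes into a data frame inside the loop. Whether fun4 fits depends on how complicated the real per-column computation is.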