Thaler, Thorn, LAUSANNE, Applied Mathematics
2011-Nov-03 13:48 UTC
[R] Select columns of a data.frame by name OR index in a function
Dear all,

Sometimes I have the situation where a function takes a data.frame and an additional argument describing some columns. For greater flexibility I want to allow for either column names or column indices. What I usually do then is something like the following:

-------------8<-------------
f <- function(datf, cols) {
  nc <- seq_along(datf)
  cn <- colnames(datf)
  colOK <- (cols %in% nc) | (cols %in% cn)
  if (!all(colOK)) {
    badc <- paste(sQuote(cols[!colOK]), collapse = ", ")
    msg <- sprintf(ngettext(sum(!colOK),
                            "%s is not a valid column selector",
                            "%s are not valid column selectors"),
                   badc)
    stop(msg)
  }
  # with this set of indices I would work in the rest of the code
  which((nc %in% cols) | (cn %in% cols))
}

dd <- data.frame(a=1, b=1, c=1)
f(dd, 2:3)  # [1] 2 3
f(dd, 1:4)  # Error in f(dd, 1:4) : '4' is not a valid column selector
f(dd, "a")  # [1] 1
f(dd, c("a", "d", "e"))
# Error in f(dd, c("a", "d", "e")) : 'd', 'e' are not valid column selectors
------------->8-------------

So my question is whether there are smarter/better/easier/more R-like ways of doing that.

Any input appreciated.

KR,

-Thorn
Jean V Adams
2011-Nov-03 21:42 UTC
[R] Select columns of a data.frame by name OR index in a function
Thaler, Thorn, LAUSANNE, Applied Mathematics wrote on 11/03/2011 08:48:26 AM:

> Dear all,
>
> Sometimes I have the situation where a function takes a data.frame and
> an additional argument describing some columns. For greater flexibility
> I want to allow for either column names or column indices. What I
> usually do then is something like the following:
>
> -------------8<-------------
> f <- function(datf, cols) {
>   nc <- seq_along(datf)
>   cn <- colnames(datf)
>   colOK <- (cols %in% nc) | (cols %in% cn)
>   if (!all(colOK)) {
>     badc <- paste(sQuote(cols[!colOK]), collapse = ", ")
>     msg <- sprintf(ngettext(sum(!colOK),
>                             "%s is not a valid column selector",
>                             "%s are not valid column selectors"),
>                    badc)
>     stop(msg)
>   }
>   # with this set of indices I would work in the rest of the code
>   which((nc %in% cols) | (cn %in% cols))
> }
>
> dd <- data.frame(a=1, b=1, c=1)
> f(dd, 2:3)  # [1] 2 3
> f(dd, 1:4)  # Error in f(dd, 1:4) : '4' is not a valid column selector
> f(dd, "a")  # [1] 1
> f(dd, c("a", "d", "e"))
> # Error in f(dd, c("a", "d", "e")) : 'd', 'e' are not valid column selectors
> ------------->8-------------
>
> So my question is, whether there are smarter/better/easier/more R-like
> ways of doing that?
>
> Any input appreciated.
>
> KR,
>
> -Thorn

Since the extract function, [, already works with either column names or indices, you might as well take advantage of this. If you really want the indices of the selected columns, the code below will work. If you just want the subsetted data frame, you can return x directly rather than matching the names.

f <- function(df, cols) {
  x <- df[, cols, drop=FALSE]
  match(names(x), names(df))
}

Jean
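For readers comparing the two approaches, here is a minimal sketch of a middle ground: it resolves the selectors with match() so that the requested order is preserved (as with [), and it keeps the custom error message from the original post. The name col_index is hypothetical and does not come from either poster, and the sketch assumes the column names are unique.

```r
# Hypothetical helper (not from the thread): resolve column selectors,
# given either as names or as positions, to integer column indices.
col_index <- function(datf, cols) {
  if (is.character(cols)) {
    idx <- match(cols, names(datf))        # names -> positions, NA if unknown
  } else if (is.numeric(cols)) {
    idx <- match(cols, seq_along(datf))    # positions -> themselves, NA if out of range
  } else {
    stop("'cols' must be a character or numeric vector")
  }
  bad <- is.na(idx)
  if (any(bad)) {
    # Same wording as the error message in the original post
    stop(sprintf(ngettext(sum(bad),
                          "%s is not a valid column selector",
                          "%s are not valid column selectors"),
                 paste(sQuote(cols[bad]), collapse = ", ")),
         call. = FALSE)
  }
  idx
}

dd <- data.frame(a = 1, b = 1, c = 1)
col_index(dd, 2:3)          # [1] 2 3
col_index(dd, c("c", "a"))  # [1] 3 1  (order of the request is kept)
```

Unlike the which()-based version, this returns the indices in the order they were asked for. Negative or logical selectors, which [ also accepts, are deliberately not handled in this sketch.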