Clark Kogan
2016-Feb-05 04:13 UTC
[R] Accessing specific data.frame columns within function
Hello,

I am trying to write a function that adds a few columns to a data.frame. The function uses the columns in a specific way. For instance, it might take a^2 + c to produce a column d. Or it might do more complex manipulations that I don't think I need to discuss here. I want to keep x as a data.frame when I pass it into the function, as I want to use some data.frame functionality on x.

Furthermore, I don't want the names in x to have to be specific. I want to be able to specify which columns the function should treat as "a" and "c".

The way I am currently doing it is that I pass the names of the columns that I want to treat as a and c.

f <- function(data,oldnames) {
  newnames <- c("a","c")
  ix <- match(oldnames,names(y))
  names(y)[ix] <- newnames
  y <- subset(y,c==4)
  y$d <- y$a^2 + y$c
  ix <- match(newnames,names(y))
  names(y)[ix] <- oldnames
  y
}

y <- data.frame(k=c(1,1,1),l=c(2,2,5),m=c(4,2,4))
f(y,c("k","m"))

The way that I am doing it does not seem all that elegant, nor does it seem like standard practice. My question is: are there potential problems with programming with data.frames in this way, and are there standard-practice methods of referencing data.frame names that deal with these problems?

Thanks!

Clark
Ulrik Stervbo
2016-Feb-05 04:59 UTC
[R] Accessing specific data.frame columns within function
Hi Clark,

In your function you are using the variable 'y' and not 'data'. If this is indeed your intention, there is no need to pass 'data' to your function; otherwise all 'y's in your function should be 'data'.

Does this work for you:

f <- function(data, oldnames, subset.val = 4){
  data <- data[ data[[ oldnames[2] ]] == subset.val, ]
  data$d <- data[[ oldnames[1] ]]^2 + data[[ oldnames[2] ]]
  return(data)
}

y <- data.frame(k=c(1,1,1),l=c(2,2,5),m=c(4,2,4))
f(data = y, oldnames = c("k","m"))

It's probably safer to pass the 'oldnames' as two arguments. Also, if you want to subset your data.frame in the function, you should either pass the value to the function or subset before you call it, along the lines of

f <- function(data, oldname.1, oldname.2){
  data$d <- data[[ oldname.1 ]]^2 + data[[ oldname.2 ]]
  return(data)
}

y <- data.frame(k=c(1,1,1),l=c(2,2,5),m=c(4,2,4))
y <- subset(y, m == 4)
f(data = y, oldname.1 = "k", oldname.2 = "m")

Hope this helps,
Ulrik
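To illustrate the two-argument form suggested above, here is a minimal sketch that also checks that the requested columns exist before using them. The names add_d, a_col and c_col are hypothetical and not from the thread, and the up-front check is an added assumption about what "safer" might look like rather than something the original posts prescribe.

```r
# Hypothetical helper: adds column d = a^2 + c, where the columns playing
# the roles of "a" and "c" are given by name as two separate arguments.
add_d <- function(data, a_col, c_col) {
  # Fail early with a clear error if the input is not a data.frame
  # or if either requested column is missing.
  stopifnot(is.data.frame(data),
            all(c(a_col, c_col) %in% names(data)))
  # [[ ]] looks a column up by the string held in the variable,
  # so no renaming of columns is needed.
  data$d <- data[[a_col]]^2 + data[[c_col]]
  data
}

y <- data.frame(k = c(1, 1, 1), l = c(2, 2, 5), m = c(4, 2, 4))

# Subset outside the function, then add the new column.
res <- add_d(subset(y, m == 4), a_col = "k", c_col = "m")
```

Because the column names are ordinary character arguments, the same function can be applied to any data.frame without its columns having to be called "a" and "c".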
You are trying to use shortcuts where shortcuts are not appropriate, and as a result you have to go a much longer way around than if you had not used the shortcut; see fortune(312).

You should really reread the help page help("[[") and section 6.1 of An Introduction to R. Basically you should be able to do something like:

f <- function(data, oldnames) {
  data <- data[ data[[oldnames[2] ]] == 4, ]
  data[['d']] <- data[[ oldnames[1] ]]^2 + data[[ oldnames[2] ]]
  data
}

Or maybe a little more readable (but not as good a golf score):

f <- function(data, oldnames) {
  aa <- oldnames[1]
  cc <- oldnames[2]
  data <- data[ data[[ cc ]] == 4, ]
  data[['d']] <- data[[ aa ]]^2 + data[[ cc ]]
  data
}

I could have used a and c instead of aa and cc, but the doubled letters mean less confusion with the `c` function in R.

Also, you should read (and heed) the Warning section on the help page for subset (?subset).

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
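One difference to keep in mind when replacing subset() with logical indexing via `[`: subset() treats missing values in the condition as FALSE, whereas an NA in a logical row index produces a row of NAs. The sketch below shows one way to handle this; the helper name keep_rows and the example data containing an NA are hypothetical and not from the thread.

```r
# Hypothetical helper: keep the rows where the named column equals value,
# dropping rows where the comparison is NA (as subset() would).
keep_rows <- function(data, col, value) {
  hit <- data[[col]] == value            # may contain NA
  data[!is.na(hit) & hit, , drop = FALSE]
}

# Same shape as the data in the thread, but with an NA in column m.
y <- data.frame(k = c(1, 1, 1), l = c(2, 2, 5), m = c(4, NA, 4))

res   <- keep_rows(y, "m", 4)   # NA comparison dropped
naive <- y[y[["m"]] == 4, ]     # NA comparison kept as an all-NA row
```

Wrapping the condition in which() is another common way to get the same row selection.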