Kelly Thompson
2022-Apr-09 16:36 UTC
[R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?
Thanks. I have a clarification and a follow-up question. I should have asked this in the original post, and I should have provided a better example for the FUN argument. I apologize.

For use in an example, here is a "silly" example of a function that requires arguments such as x and y to be "separately assigned":

udf_x_plus_y <- function (x, y) { return ( x + y) }

Q. Is there a way to use by() when the argument of FUN is a function that requires arguments such as "x" and "y" to be separately assigned (ex. udf_x_plus_y(x = my_x, y = my_y)), rather than assigned as a range of columns using brackets (ex. cor(x)[1,2])?

Something like this perhaps? (This produces an error message.)

by( data = my_df[-1], INDICES = my_df$my_category, FUN = function(x, y) { udf_x_plus_y (x = data$my_x, y = data$my_y) } )

Thanks again.

On Sat, Apr 9, 2022 at 5:32 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
> Hello,
>
> Another option is ?by.
>
> by(my_df[-1], my_df$my_category, cor)
> by(my_df[-1], my_df$my_category, \(x) cor(x)[1,2])
>
> Hope this helps,
>
> Rui Barradas
>
> At 02:26 on 09/04/2022, Kelly Thompson wrote:
> > #Q. How can I "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?
> > #I'm using cor() as an example, I'd like to find a way to do this for any function that takes 2 or more vectors as arguments.
> >
> > #create example data
> >
> > my_category <- rep ( c("a","b","c"), 4)
> >
> > set.seed(12345)
> > my_x <- rnorm(12)
> >
> > set.seed(54321)
> > my_y <- rnorm(12)
> >
> > my_df <- data.frame(my_category, my_x, my_y)
> >
> > #review data
> > my_df
> >
> > #If I wanted to get the correlation of x and y grouped by category, I could use this code and loop:
> >
> > my_category_unique <- unique(my_category)
> >
> > my_results <- vector("list", length(my_category_unique) )
> > names(my_results) <- my_category_unique
> >
> > #start i loop
> > for (i in 1:length(my_category_unique) ) {
> >   my_criteria_i <- my_category == my_category_unique[i]
> >   my_x_i <- my_x[which(my_criteria_i)]
> >   my_y_i <- my_y[which(my_criteria_i)]
> >   my_correl_i <- cor(x = my_x_i, y = my_y_i)
> >   my_results[i] <- list(my_correl_i)
> > } # end i loop
> >
> > #review results
> > my_results
> >
> > #Q. Is there a better or more "elegant" way to do this, using by(), aggregate(), apply(), or some other function?
> >
> > #This does not work and results in this error message: "Error in FUN(dd[x, ], ...) : incompatible dimensions"
> > by (data = my_x, INDICES = my_category, FUN = cor, y = my_y)
> >
> > #This does not work and results in this error message: "Error in cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a matrix-like 'x' "
> > by (data = my_df, INDICES = my_category, FUN = function(x, y) { cor (my_df$x, my_df$y) } )
> >
> > #if I wanted the mean of x by category, I could use by() or aggregate():
> > by (data = my_x, INDICES = my_category, FUN = mean)
> >
> > aggregate(x = my_x, by = list(my_category), FUN = mean)
> >
> > #Thanks!
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
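A note on why the by() attempt in this message fails, offered as a sketch rather than a definitive diagnosis. by() calls FUN with a single argument, the data-frame subset for one group, so the second parameter y is never supplied. In addition, `data` inside the anonymous function is not the `data` argument of by(); assuming no object named `data` exists in the workspace, R most likely finds the function utils::data instead, and `$` cannot be applied to a function. The names `attempt` and `msg` below are only illustrative.

```r
# Recreate the example data from the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

udf_x_plus_y <- function(x, y) { return(x + y) }

# The attempt from the message, wrapped in tryCatch() so the error
# is captured as an object instead of stopping the script
attempt <- tryCatch(
  by(data = my_df[-1], INDICES = my_df$my_category,
     FUN = function(x, y) { udf_x_plus_y(x = data$my_x, y = data$my_y) }),
  error = function(e) e
)

# TRUE when the call failed
inherits(attempt, "error")

# Show the message R reported, if there was one
msg <- if (inherits(attempt, "error")) conditionMessage(attempt) else NA_character_
msg
```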
Rui Barradas
2022-Apr-09 16:50 UTC
[R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?
Hello,

Yes, that's possible. But by() will still pass only one object to the function. Then, in the function, process this object's columns.

by(my_df[-1], my_df$my_category, \(x) udf_x_plus_y(x[[1]], x[[2]]))

Hope this helps,

Rui Barradas
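The approach in this reply can be sketched a little more generally. by() hands FUN one data frame per group, so a small wrapper can pull out the columns by name and pass them on as separate arguments. The split()/Map() variant at the end is an alternative that was not mentioned in the thread, and the names `d`, `res_by`, `res_cor`, and `res_map` are illustrative only.

```r
# Example data from the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

udf_x_plus_y <- function(x, y) { return(x + y) }

# by() passes each group's rows as one data frame (called d here);
# the wrapper extracts the columns by name and passes them separately
res_by <- by(my_df[-1], my_df$my_category,
             function(d) udf_x_plus_y(x = d$my_x, y = d$my_y))

# Same pattern with cor(): one correlation per category
res_cor <- by(my_df[-1], my_df$my_category,
              function(d) cor(x = d$my_x, y = d$my_y))

# Alternative without by(): split each vector by category, then call
# the function on the matching pieces with Map()
res_map <- Map(udf_x_plus_y,
               split(my_x, my_category),
               split(my_y, my_category))

res_by
res_cor
res_map
```

Referring to columns by name (d$my_x) rather than by position (x[[1]]) keeps the wrapper working if the column order of the data frame changes.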