thr3ads.net - R help - [R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"? [Apr 2022]


Kelly Thompson

2022-Apr-09 16:36 UTC

[R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?

Thanks. I have a clarification and a follow-up question. I should have
asked this in the original post, and I should have provided a better
example for the FUN argument; I apologize.

For use in an example, here is a "silly" example of a function that
requires arguments such as x and y to be "separately assigned" :

udf_x_plus_y <- function (x, y) { return ( x + y) }

Q. Is there a way to use by() when the argument of FUN is a function
that requires arguments such as "x" and "y" to be separately assigned
(ex. udf_x_plus_y (x = my_x , y = my_y )), rather than assigned as a
range of columns using brackets (ex. cor(x)[1,2])?

Something like this perhaps? (This produces an error message.)
by( data = my_df[-1], INDICES = my_df$my_category,  FUN = function(x,
y) { udf_x_plus_y (x = data$my_x, y = data$my_y) } )

Thanks again.

On Sat, Apr 9, 2022 at 5:32 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
> Hello,
>
> Another option is ?by.
>
>
> by(my_df[-1], my_df$my_category, cor)
> by(my_df[-1], my_df$my_category, \(x) cor(x)[1,2])
>
>
> Hope this helps,
>
> Rui Barradas
>
> At 02:26 on 09/04/2022, Kelly Thompson wrote:
> > #Q. How can I "apply" a function that takes two or more vectors as
> > #arguments, such as cor(x, y), over a "category" or "grouping variable"
> > #or "index"?
> > #I'm using cor() as an example; I'd like to find a way to do this for
> > #any function that takes 2 or more vectors as arguments.
> >
> >
> > #create example data
> >
> > my_category <- rep(c("a", "b", "c"), 4)
> >
> > set.seed(12345)
> > my_x <- rnorm(12)
> >
> > set.seed(54321)
> > my_y <- rnorm(12)
> >
> > my_df <- data.frame(my_category, my_x, my_y)
> >
> > #review data
> > my_df
> >
> > #If I wanted to get the correlation of x and y grouped by category, I
> > #could use this code and loop:
> >
> > my_category_unique <- unique(my_category)
> >
> > my_results <- vector("list", length(my_category_unique) )
> > names(my_results) <- my_category_unique
> >
> > #start i loop
> > for (i in seq_along(my_category_unique)) {
> >   my_criteria_i <- my_category == my_category_unique[i]
> >   my_x_i <- my_x[which(my_criteria_i)]
> >   my_y_i <- my_y[which(my_criteria_i)]
> >   my_correl_i <- cor(x = my_x_i, y = my_y_i)
> >   my_results[i] <- list(my_correl_i)
> > } # end i loop
> >
> > #review results
> > my_results
> >
> > #Q. Is there a better or more "elegant" way to do this, using by(),
> > #aggregate(), apply(), or some other function?
> >
> > #This does not work and results in this error message: "Error in
> > #FUN(dd[x, ], ...) : incompatible dimensions"
> > by (data = my_x, INDICES = my_category, FUN = cor, y = my_y)
> >
> > #This does not work and results in this error message: "Error in
> > #cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a matrix-like
> > #'x' "
> > by (data = my_df, INDICES = my_category, FUN = function(x, y) { cor
> > (my_df$x, my_df$y) } )
> >
> >
> > #If I wanted the mean of x by category, I could use by() or aggregate():
> > by (data = my_x, INDICES = my_category, FUN = mean)
> >
> > aggregate(x = my_x, by = list(my_category), FUN = mean)
> >
> > #Thanks!
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

Rui Barradas

2022-Apr-09 16:50 UTC


Hello,

Yes, that's possible. But by() will still pass only one object to the
function. Then, in the function, process this object's columns.


by(my_df[-1], my_df$my_category, \(x) udf_x_plus_y(x[[1]], x[[2]]))
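
The same idea can be written with the columns picked out by name rather than by position. This is only a sketch built on the example data from the original post; the argument name `d` and the result names `res_sum` and `res_cor` are made up for illustration. It also suggests why the earlier attempt failed: inside FUN, the name `data` does not refer to the by() argument, so the per-group data frame has to be reached through FUN's own first argument.

```r
# Rebuild the example data from the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

udf_x_plus_y <- function(x, y) x + y

# by() calls FUN once per group with that group's rows as a data frame.
# `d` is a hypothetical name for that data frame; its columns are then
# passed to the two-argument function as separate x and y arguments.
res_sum <- by(my_df[-1], my_df$my_category,
              function(d) udf_x_plus_y(x = d$my_x, y = d$my_y))

# The same pattern with cor(x, y)
res_cor <- by(my_df[-1], my_df$my_category,
              function(d) cor(x = d$my_x, y = d$my_y))

res_sum
res_cor
```

Referring to columns by name keeps the call correct if the column order in my_df changes.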


Hope this helps,

Rui Barradas
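
For functions that take two or more vectors, another route that may be worth considering (it is not from the replies above, just a sketch with base R) is to split each vector by the grouping variable and walk the pieces in parallel with mapply(), or to split the data frame once and use sapply(). The names `groups`, `cor_split` and `cor_mapply` are hypothetical.

```r
# Rebuild the example data from the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# Option 1: split the data frame into one piece per category,
# then apply a function that pulls out the two columns it needs.
groups <- split(my_df[-1], my_df$my_category)
cor_split <- sapply(groups, function(d) cor(d$my_x, d$my_y))

# Option 2: split each vector separately and let mapply() pass the
# matching pieces to cor() as its first and second arguments.
cor_mapply <- mapply(cor,
                     split(my_x, my_category),
                     split(my_y, my_category))

cor_split
cor_mapply
```

Both options return a named numeric vector with one correlation per category; mapply() extends to functions with more than two vector arguments by adding further split() calls.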

