Jim Bouldin
2011-Aug-04 21:17 UTC
[R] functions on rows or columns of two (or more) arrays
I realize this should be simple, but even after reading over the several help pages several times, I still cannot decide between the myriad "apply" functions to address it. I simply want to apply a function to all the rows (or columns) of the same index from two (or more) identically sized arrays (or data frames).

For example:

> a=matrix(1:50,nrow=10)
> a2=floor(jitter(a,amount=50))
> a
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1   11   21   31   41
 [2,]    2   12   22   32   42
 [3,]    3   13   23   33   43
 [4,]    4   14   24   34   44
 [5,]    5   15   25   35   45
 [6,]    6   16   26   36   46
 [7,]    7   17   27   37   47
 [8,]    8   18   28   38   48
 [9,]    9   19   29   39   49
[10,]   10   20   30   40   50
> a2
      [,1] [,2] [,3] [,4] [,5]
 [1,]   31   56  -29  -13   10
 [2,]   38   61   71   55    9
 [3,]  -29   38   47   12   38
 [4,]   12    2   43   39   93
 [5,]  -43   23  -23   62    1
 [6,]  -13   61   55   11    2
 [7,]  -42    1   38   12    8
 [8,]  -13   -6  -18   16   95
 [9,]  -19   -2   78   33    1
[10,]   20  -16  -11   19   17

If I try the following, for example:

> apply(a,1,function(x) lm(a~a2))

I get 10 identical repeats (except for the list indexer) of the following:

[[1]]

Call:
lm(formula = a ~ a2)

Coefficients:
             [,1]       [,2]       [,3]       [,4]       [,5]
(Intercept)   8.372135  18.372135  28.372135  38.372135  48.372135
a21          -0.006163  -0.006163  -0.006163  -0.006163  -0.006163
a22          -0.093390  -0.093390  -0.093390  -0.093390  -0.093390
a23           0.009315   0.009315   0.009315   0.009315   0.009315
a24          -0.015143  -0.015143  -0.015143  -0.015143  -0.015143
a25          -0.026761  -0.026761  -0.026761  -0.026761  -0.026761

...Which is clearly very wrong, in a number of ways. If I try by columns:

> apply(a,2,function(x) lm(a~a2))

...I get exactly the same result.

So, which is the appropriate apply-type function when two arrays (or d.f.'s?) are involved like this? Or none of them and some other approach (other than looping which I can do but which I assume is not optimal)? Thanks for any help.

--
Jim Bouldin, PhD
Research Ecologist
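A note on the symptom described above: the anonymous function receives each row as x but never uses it, so every call fits lm(a ~ a2) on the two full matrices, which would explain why all ten results are identical. The sketch below illustrates that reading. The set.seed() call, the variable names, and the index-based fix are additions for illustration only, not part of the original post.

```r
# Reproducible copy of the example data; set.seed() is an addition.
set.seed(1)
a  <- matrix(1:50, nrow = 10)
a2 <- floor(jitter(a, amount = 50))

# The original call pattern: x (the current row) is never used inside the
# function, so every iteration fits the same whole-matrix model.
ignored <- apply(a, 1, function(x) coef(lm(a ~ a2)))
all_identical <- all(ignored == ignored[, 1])

# One possible fix: iterate over the row index and subset both matrices.
by_row <- lapply(seq_len(nrow(a)), function(i) lm(a[i, ] ~ a2[i, ]))
slopes <- vapply(by_row, function(m) unname(coef(m)[2]), numeric(1))
```

Here all_identical should be TRUE, and by_row holds one fitted model per row pair.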
R. Michael Weylandt
2011-Aug-04 21:29 UTC
[R] functions on rows or columns of two (or more) arrays
I hope someone experienced with the plyr package comes along and helps, because this sounds like what it does well, but for your specific example something like this works:

    A = rbind(a,a2)
    q = apply(A,2,function(x) {lm(x[1:nrow(a)] ~ x[-(1:nrow(a))])})

But yeah, that's pretty rough, so I hope someone can come up with something more elegant. If nothing else, I think that idea can be made to work in most circumstances: put it together, then break it apart inside the function passed to apply.

Michael Weylandt
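The "put it together, then break it apart" idea can be sketched a little more explicitly. This is only an illustration of the same technique: the names n, y, z, fits and coefs, and the set.seed() call, are assumptions added here.

```r
# Reproducible copy of the example data; set.seed() is an addition.
set.seed(1)
a  <- matrix(1:50, nrow = 10)
a2 <- floor(jitter(a, amount = 50))

n <- nrow(a)
A <- rbind(a, a2)   # column j of A is a[, j] followed by a2[, j]

# Split each stacked column back into its two halves inside the function.
fits <- apply(A, 2, function(x) {
  y <- x[seq_len(n)]    # top half: a column of a
  z <- x[-seq_len(n)]   # bottom half: the matching column of a2
  lm(y ~ z)
})

# One row per column pair: intercept and slope.
coefs <- t(sapply(fits, coef))
```

Because each call returns an lm object, apply() hands back a plain list here, so fits can be indexed like any other list of models.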
The apply function also works with multi-dimensional arrays. I think this is what you want to achieve, using a 3D array:

    aaa <- array(NA, dim = c(2, dim(a)))
    aaa[1,,] <- a
    aaa[2,,] <- a2
    apply(aaa, 3, function(x) lm(x[1,] ~ x[2,]))
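With the 3D layout above, the choice of margin decides whether columns or rows are paired: margin 3 walks the columns of a and a2, margin 2 walks the rows. A small sketch, where the slope() helper and set.seed() are hypothetical additions used only to make the results easy to compare:

```r
# Reproducible copy of the example data; set.seed() is an addition.
set.seed(1)
a  <- matrix(1:50, nrow = 10)
a2 <- floor(jitter(a, amount = 50))

# First dimension indexes the source matrix (1 = a, 2 = a2).
aaa <- array(NA_real_, dim = c(2, dim(a)))
aaa[1, , ] <- a
aaa[2, , ] <- a2

# Hypothetical helper: slope of the regression of the a-slice on the a2-slice.
slope <- function(x) unname(coef(lm(x[1, ] ~ x[2, ]))[2])

col_slopes <- apply(aaa, 3, slope)   # one value per column pair
row_slopes <- apply(aaa, 2, slope)   # one value per row pair
```

Returning a single number per slice lets apply() simplify the result to a plain numeric vector instead of a list of models.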
Dennis Murphy
2011-Aug-05 05:19 UTC
[R] functions on rows or columns of two (or more) arrays
Hi:

Here's one approach:

    a=matrix(1:50,nrow=10)
    a2=floor(jitter(a,amount=50))

    # Write a function to combine the rows of interest
    # into a data frame and fit a linear model
    regfn <- function(k) {
        rdf <- data.frame(x = a[k, ], y = a2[k, ])
        lm(y ~ x, data = rdf)
    }

    # Use lapply() to run regfn() on each row index
    # of a and a2 in turn:
    modlist <- lapply(seq_len(nrow(a)), regfn)

    # I prefer plyr for extraction of output from a list of models.
    # Here are a few examples:
    library('plyr')

    # Extract the R^2 values
    ldply(modlist, function(m) summary(m)$r.squared)

    # Extract the residuals
    laply(modlist, function(m) resid(m))

    # Extract the estimated model coefficients
    ldply(modlist, function(m) coef(m))

    # Extract the coefficient summary tables as a list
    llply(modlist, function(m) summary(m)$coefficients)

In the anonymous functions, the argument m refers to an arbitrary lm object, so you can do to it what you would with any given lm object; all you're doing is abstracting the process.

HTH,
Dennis
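For readers without plyr, the same pairing and extraction can be approximated in base R. This is a sketch under stated assumptions, not Dennis's method: it swaps in Map() over row lists built with split(), and vapply() for the extraction. The names rows_a, rows_a2, mods, r2 and coefs, and the set.seed() call, are additions.

```r
# Reproducible copy of the example data; set.seed() is an addition.
set.seed(1)
a  <- matrix(1:50, nrow = 10)
a2 <- floor(jitter(a, amount = 50))

# Turn each matrix into a list of its rows, then fit one model per pair.
rows_a  <- split(a,  row(a))
rows_a2 <- split(a2, row(a2))
mods <- Map(function(x, y) lm(y ~ x), rows_a, rows_a2)

# Base R counterparts of the ldply() extractions.
r2    <- vapply(mods, function(m) summary(m)$r.squared, numeric(1))
coefs <- t(vapply(mods, coef, numeric(2)))   # one row per model
```

Map() walks several lists in parallel, which matches the "same index from two arrays" requirement directly, and vapply() checks that every model returns a result of the declared shape.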