Murali Menon
2007-Feb-13 17:27 UTC
[R] Computing stats on common parts of multiple dataframes
Folks, I have three dataframes storing some information about two currency pairs, as follows: R> a EUR-USD NOK-SEK 1.23 1.33 1.22 1.43 1.26 1.42 1.24 1.50 1.21 1.36 1.26 1.60 1.29 1.44 1.25 1.36 1.27 1.39 1.23 1.48 1.22 1.26 1.24 1.29 1.27 1.57 1.21 1.55 1.23 1.35 1.25 1.41 1.25 1.30 1.23 1.11 1.28 1.37 1.27 1.23 R> b EUR-USD NOK-SEK 1.23 1.22 1.21 1.36 1.28 1.61 1.23 1.34 1.21 1.22 R> d EUR-USD NOK-SEK 1.27 1.39 1.23 1.48 1.22 1.26 1.24 1.29 1.27 1.57 1.21 1.55 1.23 1.35 1.25 1.41 1.25 1.33 1.23 1.11 1.28 1.37 1.27 1.23 The twist is that these entries correspond to dates where the *last* rows in each frame are today's entries, and so on backwards in time. I would like to create a matrix of medians (a median for each row and for each currency pair), but only for those rows where all dataframes have entries. My answer in this case should look like: EUR-USD NOK-SEK 1.25 1.41 1.25 1.33 1.23 1.11 1.28 1.37 1.27 1.23 where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc. Notice that the output is of the same dimensions as the smallest dataframe (in this case 'b'). I can do it in a clumsy fashion by first obtaining the number of rows in the smallest matrix, chopping off the top rows of the other matrices to reduce them this size, then doing a for-loop across each currency pair, row-wise, to create a 3-vector which I then apply median() on. Surely there's a better way to do this? Please advise. Thanks, Murali Menon _________________________________________________________________ Valentine?s Day -- Shop for gifts that spell L-O-V-E at MSN Shopping
Erik Iverson
2007-Feb-13 20:42 UTC
[R] Computing stats on common parts of multiple dataframes
Murali - I've come up with something that might with work, with gratutious use of the *apply functions. See ?apply, ?lappy, and ?mapply for how this would work. Basically, just set my.list equal to a list of data.frames you would like included. I made this to work with matrices first, so it does use as.matrix() in my function. Also, this could be turned into a general function so that you could specify a different function other than "median". #Make my.list equal to a list of dataframes you want my.list <- list(df1,df2) #What's the shortest? minrow <- min(sapply(my.list,nrow)) #Chop all to the shortest tmp <- lapply(my.list, function(x) x[(nrow(x)-(minrow-1)):nrow(x),]) #Do the computation, could change median to mean, or a user defined #function matrix(apply(mapply("[",lapply(tmp,as.matrix), MoreArgs=list(1:(minrow*2))), 1, median), ncol=2) HTH, whether or not this is any "better" than your for loop solution is left up to you. Erik Murali Menon wrote:> Folks, > > I have three dataframes storing some information about > two currency pairs, as follows: > > R> a > > EUR-USD NOK-SEK > 1.23 1.33 > 1.22 1.43 > 1.26 1.42 > 1.24 1.50 > 1.21 1.36 > 1.26 1.60 > 1.29 1.44 > 1.25 1.36 > 1.27 1.39 > 1.23 1.48 > 1.22 1.26 > 1.24 1.29 > 1.27 1.57 > 1.21 1.55 > 1.23 1.35 > 1.25 1.41 > 1.25 1.30 > 1.23 1.11 > 1.28 1.37 > 1.27 1.23 > > > > R> b > EUR-USD NOK-SEK > 1.23 1.22 > 1.21 1.36 > 1.28 1.61 > 1.23 1.34 > 1.21 1.22 > > > > R> d > > EUR-USD NOK-SEK > 1.27 1.39 > 1.23 1.48 > 1.22 1.26 > 1.24 1.29 > 1.27 1.57 > 1.21 1.55 > 1.23 1.35 > 1.25 1.41 > 1.25 1.33 > 1.23 1.11 > 1.28 1.37 > 1.27 1.23 > > The twist is that these entries correspond to dates where the > *last* rows in each frame are today's entries, and so on > backwards in time. > > I would like to create a matrix of medians (a median for each row > and for each currency pair), but only for those rows where all > dataframes have entries. > > My answer in this case should look like: > > EUR-USD NOK-SEK > > 1.25 1.41 > 1.25 1.33 > 1.23 1.11 > 1.28 1.37 > 1.27 1.23 > > where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc. > > Notice that the output is of the same dimensions as the smallest dataframe > (in this case 'b'). > > I can do it in a clumsy fashion by first obtaining the number > of rows in the smallest matrix, chopping off the top rows > of the other matrices to reduce them this size, then doing a > for-loop across each currency pair, row-wise, to create a > 3-vector which I then apply median() on. > > Surely there's a better way to do this? > > Please advise. > > Thanks, > > Murali Menon > > _________________________________________________________________ > Valentine?s Day -- Shop for gifts that spell L-O-V-E at MSN Shopping > > > ------------------------------------------------------------------------ > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Gabor Grothendieck
2007-Feb-13 21:23 UTC
[R] Computing stats on common parts of multiple dataframes
Suppose our data frames are called DF1, DF2 and DF3. Then find the least number of rows, n, among them. Create a list, DFs, of the last n rows of the data frames and another list, mats, which is the same but in which each component is a matrix. Create a parallel median function, pmedian, analogous to pmax and mapply it to the matrices. Finally replace that back into a data frame. n <- min(sapply(L, nrow)) DFs <- lapply(list(DF1, DF2, DF3), tail, n) mats <- lapply(DFs, as.matrix) pmedian <- function(...) median(c(...)) medians <- do.call(mapply, c(pmedian, mats)) replace(DFs[[1]], TRUE, medians) On 2/13/07, Murali Menon <feanor0 at hotmail.com> wrote:> Folks, > > I have three dataframes storing some information about > two currency pairs, as follows: > > R> a > > EUR-USD NOK-SEK > 1.23 1.33 > 1.22 1.43 > 1.26 1.42 > 1.24 1.50 > 1.21 1.36 > 1.26 1.60 > 1.29 1.44 > 1.25 1.36 > 1.27 1.39 > 1.23 1.48 > 1.22 1.26 > 1.24 1.29 > 1.27 1.57 > 1.21 1.55 > 1.23 1.35 > 1.25 1.41 > 1.25 1.30 > 1.23 1.11 > 1.28 1.37 > 1.27 1.23 > > > > R> b > EUR-USD NOK-SEK > 1.23 1.22 > 1.21 1.36 > 1.28 1.61 > 1.23 1.34 > 1.21 1.22 > > > > R> d > > EUR-USD NOK-SEK > 1.27 1.39 > 1.23 1.48 > 1.22 1.26 > 1.24 1.29 > 1.27 1.57 > 1.21 1.55 > 1.23 1.35 > 1.25 1.41 > 1.25 1.33 > 1.23 1.11 > 1.28 1.37 > 1.27 1.23 > > The twist is that these entries correspond to dates where the > *last* rows in each frame are today's entries, and so on > backwards in time. > > I would like to create a matrix of medians (a median for each row > and for each currency pair), but only for those rows where all > dataframes have entries. > > My answer in this case should look like: > > EUR-USD NOK-SEK > > 1.25 1.41 > 1.25 1.33 > 1.23 1.11 > 1.28 1.37 > 1.27 1.23 > > where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc. > > Notice that the output is of the same dimensions as the smallest dataframe > (in this case 'b'). > > I can do it in a clumsy fashion by first obtaining the number > of rows in the smallest matrix, chopping off the top rows > of the other matrices to reduce them this size, then doing a > for-loop across each currency pair, row-wise, to create a > 3-vector which I then apply median() on. > > Surely there's a better way to do this? > > Please advise. > > Thanks, > > Murali Menon > > _________________________________________________________________ > Valentine's Day -- Shop for gifts that spell L-O-V-E at MSN Shopping > > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >