thr3ads.net - R help - [R] Computing stats on common parts of multiple dataframes [Feb 2007]

If this information is useful, please help other people find it:
Share via:

Murali Menon

2007-Feb-13 17:27 UTC

[R] Computing stats on common parts of multiple dataframes

Folks,

I have three dataframes storing some information about
two currency pairs, as follows:

R> a

EUR-USD	NOK-SEK
1.23	1.33
1.22	1.43
1.26	1.42
1.24	1.50
1.21	1.36
1.26	1.60
1.29	1.44
1.25	1.36
1.27	1.39
1.23	1.48
1.22	1.26
1.24	1.29
1.27	1.57
1.21	1.55
1.23	1.35
1.25	1.41
1.25	1.30
1.23	1.11
1.28	1.37
1.27	1.23



R> b
EUR-USD	NOK-SEK
1.23	1.22
1.21	1.36
1.28	1.61
1.23	1.34
1.21	1.22



R> d

EUR-USD	NOK-SEK
1.27	1.39
1.23	1.48
1.22	1.26
1.24	1.29
1.27	1.57
1.21	1.55
1.23	1.35
1.25	1.41
1.25	1.33
1.23	1.11
1.28	1.37
1.27	1.23

The twist is that these entries correspond to dates where the
*last* rows in each frame are today's entries, and so on
backwards in time.

I would like to create a matrix of medians (a median for each row
and for each currency pair), but only for those rows where all
dataframes have entries.

My answer in this case should look like:

EUR-USD	NOK-SEK

1.25	1.41
1.25	1.33
1.23	1.11
1.28	1.37
1.27	1.23

where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc.

Notice that the output is of the same dimensions as the smallest dataframe
(in this case 'b').

I can do it in a clumsy fashion by first obtaining the number
of rows in the smallest matrix, chopping off the top rows
of the other matrices to reduce them this size, then doing a
for-loop across each currency pair, row-wise, to create a
3-vector which I then apply median() on.

Surely there's a better way to do this?

Please advise.

Thanks,

Murali Menon

_________________________________________________________________
Valentine?s Day -- Shop for gifts that spell L-O-V-E at MSN Shopping

Erik Iverson

2007-Feb-13 20:42 UTC

head link

[R] Computing stats on common parts of multiple dataframes

Murali -

I've come up with something that might with work, with gratutious use of 
the *apply functions.  See ?apply, ?lappy, and ?mapply for how this 
would work.  Basically, just set my.list equal to a list of data.frames 
  you would like included.  I made this to work with matrices first, so 
it does use as.matrix() in my function.  Also, this could be turned into 
a general function so that you could specify a different function other 
than "median".

#Make my.list equal to a list of dataframes you want
my.list <- list(df1,df2)

#What's the shortest?
minrow <- min(sapply(my.list,nrow))
#Chop all to the shortest
tmp <- lapply(my.list, function(x) x[(nrow(x)-(minrow-1)):nrow(x),])

#Do the computation, could change median to mean, or a user defined
#function
matrix(apply(mapply("[",lapply(tmp,as.matrix), 
MoreArgs=list(1:(minrow*2))), 1, median),
        ncol=2)

HTH, whether or not this is any "better" than your for loop solution
is
left up to you.

Erik


Murali Menon wrote:> Folks,
> 
> I have three dataframes storing some information about
> two currency pairs, as follows:
> 
> R> a
> 
> EUR-USD    NOK-SEK
> 1.23    1.33
> 1.22    1.43
> 1.26    1.42
> 1.24    1.50
> 1.21    1.36
> 1.26    1.60
> 1.29    1.44
> 1.25    1.36
> 1.27    1.39
> 1.23    1.48
> 1.22    1.26
> 1.24    1.29
> 1.27    1.57
> 1.21    1.55
> 1.23    1.35
> 1.25    1.41
> 1.25    1.30
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
> 
> 
> 
> R> b
> EUR-USD    NOK-SEK
> 1.23    1.22
> 1.21    1.36
> 1.28    1.61
> 1.23    1.34
> 1.21    1.22
> 
> 
> 
> R> d
> 
> EUR-USD    NOK-SEK
> 1.27    1.39
> 1.23    1.48
> 1.22    1.26
> 1.24    1.29
> 1.27    1.57
> 1.21    1.55
> 1.23    1.35
> 1.25    1.41
> 1.25    1.33
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
> 
> The twist is that these entries correspond to dates where the
> *last* rows in each frame are today's entries, and so on
> backwards in time.
> 
> I would like to create a matrix of medians (a median for each row
> and for each currency pair), but only for those rows where all
> dataframes have entries.
> 
> My answer in this case should look like:
> 
> EUR-USD    NOK-SEK
> 
> 1.25    1.41
> 1.25    1.33
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
> 
> where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc.
> 
> Notice that the output is of the same dimensions as the smallest dataframe
> (in this case 'b').
> 
> I can do it in a clumsy fashion by first obtaining the number
> of rows in the smallest matrix, chopping off the top rows
> of the other matrices to reduce them this size, then doing a
> for-loop across each currency pair, row-wise, to create a
> 3-vector which I then apply median() on.
> 
> Surely there's a better way to do this?
> 
> Please advise.
> 
> Thanks,
> 
> Murali Menon
> 
> _________________________________________________________________
> Valentine?s Day -- Shop for gifts that spell L-O-V-E at MSN Shopping
> 
> 
> ------------------------------------------------------------------------
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Gabor Grothendieck

2007-Feb-13 21:23 UTC

head link

[R] Computing stats on common parts of multiple dataframes

Suppose our data frames are called DF1, DF2 and DF3.  Then
find the least number of rows, n, among them.  Create a
list, DFs, of the last n rows of the data frames and another
list, mats, which is the same but in which each component is a
matrix.  Create a parallel median function, pmedian, analogous
to pmax and mapply it to the matrices.  Finally replace that
back into a data frame.

n <- min(sapply(L, nrow))
DFs <- lapply(list(DF1, DF2, DF3), tail, n)
mats <- lapply(DFs, as.matrix)
pmedian <- function(...) median(c(...))
medians <- do.call(mapply, c(pmedian, mats))
replace(DFs[[1]], TRUE, medians)


On 2/13/07, Murali Menon <feanor0 at hotmail.com>
wrote:> Folks,
>
> I have three dataframes storing some information about
> two currency pairs, as follows:
>
> R> a
>
> EUR-USD NOK-SEK
> 1.23    1.33
> 1.22    1.43
> 1.26    1.42
> 1.24    1.50
> 1.21    1.36
> 1.26    1.60
> 1.29    1.44
> 1.25    1.36
> 1.27    1.39
> 1.23    1.48
> 1.22    1.26
> 1.24    1.29
> 1.27    1.57
> 1.21    1.55
> 1.23    1.35
> 1.25    1.41
> 1.25    1.30
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
>
>
>
> R> b
> EUR-USD NOK-SEK
> 1.23    1.22
> 1.21    1.36
> 1.28    1.61
> 1.23    1.34
> 1.21    1.22
>
>
>
> R> d
>
> EUR-USD NOK-SEK
> 1.27    1.39
> 1.23    1.48
> 1.22    1.26
> 1.24    1.29
> 1.27    1.57
> 1.21    1.55
> 1.23    1.35
> 1.25    1.41
> 1.25    1.33
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
>
> The twist is that these entries correspond to dates where the
> *last* rows in each frame are today's entries, and so on
> backwards in time.
>
> I would like to create a matrix of medians (a median for each row
> and for each currency pair), but only for those rows where all
> dataframes have entries.
>
> My answer in this case should look like:
>
> EUR-USD NOK-SEK
>
> 1.25    1.41
> 1.25    1.33
> 1.23    1.11
> 1.28    1.37
> 1.27    1.23
>
> where the last EUR-USD entry = median(1.27, 1.21, 1.27), etc.
>
> Notice that the output is of the same dimensions as the smallest dataframe
> (in this case 'b').
>
> I can do it in a clumsy fashion by first obtaining the number
> of rows in the smallest matrix, chopping off the top rows
> of the other matrices to reduce them this size, then doing a
> for-loop across each currency pair, row-wise, to create a
> 3-vector which I then apply median() on.
>
> Surely there's a better way to do this?
>
> Please advise.
>
> Thanks,
>
> Murali Menon
>
> _________________________________________________________________
> Valentine's Day -- Shop for gifts that spell L-O-V-E at MSN Shopping
>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

Possibly Parallel Threads

Search for more reasonably related threads

R help - Feb 2007 - Computing stats on common parts of multiple dataframes

[R] Computing stats on common parts of multiple dataframes

[R] Computing stats on common parts of multiple dataframes

[R] Computing stats on common parts of multiple dataframes

Possibly Parallel Threads