thr3ads.net - R help - [R] Securities earning covariance [Jun 2008]

If this information is useful, please help other people find it:
Share via:

ANGELO.LINARDI at bancaditalia.it

2008-Jun-05 15:41 UTC

[R] Securities earning covariance

Good morning,

I am a new R user and I am trying to learn how to use it.
I am trying to solve this problem.
I have a dataframe df of daily securities (for a year) earnings as
follows:

SEC_ID		DAY		EARNING
IT0000001	20070101	5.467
IT0000001	20070102	5.456
IT0000001	20070103	4.954
IT0000001	20070104	3.456
	            ..........................
IT0000002	20070101	1.456
IT0000002	20070102	1.345
IT0000002	20070103	1.233
		..........................
IT0000003	20070101	0.345
IT0000003	20070102	0.367
IT0000003	20070103	0.319	
		..........................

And so on: about 800 different SEC_ID and about 180000 rows.
I have to calculate the "covariance" for each couple of securities x
and
y according to the formula:

Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)  

being x' and y' the mean of securities earning in the year, N the number
of observations, sx and sy the standard deviation of x and y.
To do this I could build a df2 data frame like this:

DAY		SEC_ID.x	SEC_ID.y	EARNING.x
EARNING.y	x'	y'	sx	sy
20070101	IT0000001	IT0000002	5.467		1.456
a	b	aa	bb
20070101	IT0000001	IT0000003	5.467		0.345
a	c	aa	cc
20070101	IT0000002	IT0000003	1.456		0.345
b	c	bb	cc
20070102	IT0000001	IT0000002	5.456		1.345
a	b	aa	bb
20070102	IT0000001	IT0000003	5.456		0.367
a	c	aa	cc
20070102	IT0000002	IT0000003	1.345		0.367
b	c	bb	cc
........................................................................
.......................................................

(merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
easily calculate the formula; but the dimensions are too big (the
process stops whit an out-of-memory message). 
Besides partitioning the input and using a loop, are there any smarter
solutions (eventually using split and other ways of "subgroup merging"
to solve the problem ?
Are there any "shortcuts" using statistical built-in functions (e.g.
cov, vcov) ?
Thank you in advance

Angelo Linardi



** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona fede e
non
comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo che
cio' non
sia espressamente previsto da un accordo scritto.
Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per errore, La
preghiamo di
comunicarne via e-mail la ricezione al mittente e di distruggerne il contenuto.
La
informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei suoi
allegati
potrebbe costituire reato. Grazie per la collaborazione.
-- E-mails from the Bank of Italy are sent in good faith but they are neither
binding on
the Bank nor to be understood as creating any obligation on its part except
where
provided for in a written agreement. This e-mail is confidential. If you have
received it
by mistake, please inform the sender by reply e-mail and delete it from your
system.
Please also note that the unauthorized disclosure or use of the message or any 
attachments could be an offence. Thank you for your cooperation. **

Gabor Grothendieck

2008-Jun-05 15:54 UTC

head link

[R] Securities earning covariance

Check out the three vignettes (i.e. pdf documents in the zoo package). e.g.


Lines <- "SEC_ID          DAY             EARNING
IT0000001       20070101        5.467
IT0000001       20070102        5.456
IT0000001       20070103        4.954
IT0000001       20070104        3.456
IT0000002       20070101        1.456
IT0000002       20070102        1.345
IT0000002       20070103        1.233
IT0000003       20070101        0.345
IT0000003       20070102        0.367
IT0000003       20070103        0.319
"
DF <- read.table(textConnection(Lines), header = TRUE)
DFs <- split(DF, DF$SEC_ID)

library(zoo)
f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY),
"%Y%m%d"))
z <- do.call(merge, lapply(DFs, f))
cov(z) # uses n-1


On Thu, Jun 5, 2008 at 11:41 AM,  <ANGELO.LINARDI at bancaditalia.it>
wrote:> Good morning,
>
> I am a new R user and I am trying to learn how to use it.
> I am trying to solve this problem.
> I have a dataframe df of daily securities (for a year) earnings as
> follows:
>
> SEC_ID          DAY             EARNING
> IT0000001       20070101        5.467
> IT0000001       20070102        5.456
> IT0000001       20070103        4.954
> IT0000001       20070104        3.456
>                    ..........................
> IT0000002       20070101        1.456
> IT0000002       20070102        1.345
> IT0000002       20070103        1.233
>                ..........................
> IT0000003       20070101        0.345
> IT0000003       20070102        0.367
> IT0000003       20070103        0.319
>                ..........................
>
> And so on: about 800 different SEC_ID and about 180000 rows.
> I have to calculate the "covariance" for each couple of
securities x and
> y according to the formula:
>
> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
>
> being x' and y' the mean of securities earning in the year, N the
number
> of observations, sx and sy the standard deviation of x and y.
> To do this I could build a df2 data frame like this:
>
> DAY             SEC_ID.x        SEC_ID.y        EARNING.x
> EARNING.y       x'      y'      sx      sy
> 20070101        IT0000001       IT0000002       5.467           1.456
> a       b       aa      bb
> 20070101        IT0000001       IT0000003       5.467           0.345
> a       c       aa      cc
> 20070101        IT0000002       IT0000003       1.456           0.345
> b       c       bb      cc
> 20070102        IT0000001       IT0000002       5.456           1.345
> a       b       aa      bb
> 20070102        IT0000001       IT0000003       5.456           0.367
> a       c       aa      cc
> 20070102        IT0000002       IT0000003       1.345           0.367
> b       c       bb      cc
> ........................................................................
> .......................................................
>
> (merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
> easily calculate the formula; but the dimensions are too big (the
> process stops whit an out-of-memory message).
> Besides partitioning the input and using a loop, are there any smarter
> solutions (eventually using split and other ways of "subgroup
merging"
> to solve the problem ?
> Are there any "shortcuts" using statistical built-in functions
(e.g.
> cov, vcov) ?
> Thank you in advance
>
> Angelo Linardi
>
>
>
> ** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona
fede e non
> comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo
che cio' non
> sia espressamente previsto da un accordo scritto.
> Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per
errore, La preghiamo di
> comunicarne via e-mail la ricezione al mittente e di distruggerne il
contenuto. La
> informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei
suoi allegati
> potrebbe costituire reato. Grazie per la collaborazione.
> -- E-mails from the Bank of Italy are sent in good faith but they are
neither binding on
> the Bank nor to be understood as creating any obligation on its part except
where
> provided for in a written agreement. This e-mail is confidential. If you
have received it
> by mistake, please inform the sender by reply e-mail and delete it from
your system.
> Please also note that the unauthorized disclosure or use of the message or
any
> attachments could be an offence. Thank you for your cooperation. **
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Patrick Burns

2008-Jun-05 16:11 UTC

head link

[R] Securities earning covariance

I would start by creating a matrix that held the
returns with rows being the dates and columns
being the securities.  You can do this by something
along the lines of:

days <- as.character(df[, 'DAY'])
sec <- as.character(df[, 'SEC_ID']
earningmat <- array(NA, c(length(unique(days)),
   length(unique(sec))), list(sort(unique(days)),
   unique(sec)))
submat <- cbind(match(days, rownames(earningmat)),
    match(sec, colnames(earningmat)))
earningmat[submat] <- as.numeric(as.character(df[, 'EARNING']))

Notice that while the 'as.numeric-as.character' in the last
line may not be needed -- if it is needed, it is needed in a big
way.  If the 'EARNING' column is a factor (because there
was at least one item that didn't appear to be numeric when
it was read in), then skipping the 'as.numeric-as.character'
call will put the codes for the factor into the matrix.  It will be
numeric as you expect, but complete garbage.

The trick with 'submat' is explained in any complete description
of subscripting -- the subscripting section of Chapter 1 of
S Poetry, for instance.

Once you have a suitable matrix, then you can use 'var' or
some other function to get the variance matrix.  Depending on
where you are going, a factor model variance may be better.
You can get 'factor.model.stat' from the public domain area
of the Burns Statistics website.  This is especially useful if there
are missing values in your matrix.


Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")



ANGELO.LINARDI at bancaditalia.it wrote:> Good morning,
>
> I am a new R user and I am trying to learn how to use it.
> I am trying to solve this problem.
> I have a dataframe df of daily securities (for a year) earnings as
> follows:
>
> SEC_ID		DAY		EARNING
> IT0000001	20070101	5.467
> IT0000001	20070102	5.456
> IT0000001	20070103	4.954
> IT0000001	20070104	3.456
> 	            ..........................
> IT0000002	20070101	1.456
> IT0000002	20070102	1.345
> IT0000002	20070103	1.233
> 		..........................
> IT0000003	20070101	0.345
> IT0000003	20070102	0.367
> IT0000003	20070103	0.319	
> 		..........................
>
> And so on: about 800 different SEC_ID and about 180000 rows.
> I have to calculate the "covariance" for each couple of
securities x and
> y according to the formula:
>
> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)  
>
> being x' and y' the mean of securities earning in the year, N the
number
> of observations, sx and sy the standard deviation of x and y.
> To do this I could build a df2 data frame like this:
>
> DAY		SEC_ID.x	SEC_ID.y	EARNING.x
> EARNING.y	x'	y'	sx	sy
> 20070101	IT0000001	IT0000002	5.467		1.456
> a	b	aa	bb
> 20070101	IT0000001	IT0000003	5.467		0.345
> a	c	aa	cc
> 20070101	IT0000002	IT0000003	1.456		0.345
> b	c	bb	cc
> 20070102	IT0000001	IT0000002	5.456		1.345
> a	b	aa	bb
> 20070102	IT0000001	IT0000003	5.456		0.367
> a	c	aa	cc
> 20070102	IT0000002	IT0000003	1.345		0.367
> b	c	bb	cc
> ........................................................................
> .......................................................
>
> (merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
> easily calculate the formula; but the dimensions are too big (the
> process stops whit an out-of-memory message). 
> Besides partitioning the input and using a loop, are there any smarter
> solutions (eventually using split and other ways of "subgroup
merging"
> to solve the problem ?
> Are there any "shortcuts" using statistical built-in functions
(e.g.
> cov, vcov) ?
> Thank you in advance
>
> Angelo Linardi
>
>
>
> ** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona
fede e non
> comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo
che cio' non
> sia espressamente previsto da un accordo scritto.
> Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per
errore, La preghiamo di
> comunicarne via e-mail la ricezione al mittente e di distruggerne il
contenuto. La
> informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei
suoi allegati
> potrebbe costituire reato. Grazie per la collaborazione.
> -- E-mails from the Bank of Italy are sent in good faith but they are
neither binding on
> the Bank nor to be understood as creating any obligation on its part except
where
> provided for in a written agreement. This e-mail is confidential. If you
have received it
> by mistake, please inform the sender by reply e-mail and delete it from
your system.
> Please also note that the unauthorized disclosure or use of the message or
any
> attachments could be an offence. Thank you for your cooperation. **
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>

Maybe Matching Threads

Search for more apparently analagous threads

R help - Jun 2008 - Securities earning covariance

[R] Securities earning covariance

[R] Securities earning covariance

[R] Securities earning covariance

Maybe Matching Threads