thr3ads.net - R help - [R] Conditional Sums for Index creation [May 2007]

If this information is useful, please help other people find it:
Share via:

Patnaik, Tirthankar

2007-May-14 06:23 UTC

[R] Conditional Sums for Index creation

Hi,
	Apologies for the long mail. I have a data.frame with columns of
price/mcap data for a portfolio of stocks, and the date. To get the
total value of the portfolio on a daily basis, I calculate rowSums of
the data.frame. 
> set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac   V1 V2 V3 V4 V5
1  27 93 82 NA NA
2  37 21 65 NA NA
3  57 65 78 NA NA
4  91 13 55 NA NA
5  20 27 53 NA NA
6  90 39 79 26 NA
7  94 NA NA 48 NA
8  66 38 48 77 NA
9  63 87 73  8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60> 
Here the rows 1:20 are dates (also in my data.frame). 

Since some of the prices have NA, the rowSums is made to ignore these
entries. 
> rowSums(ac,na.rm=TRUE)  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
19  20 
202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315 242
335 316 > 
Stocks are being added to the portfolio too. So from date=6 (or row=6)
we have the 4th stock V4, and from date=11, we have the 5th stock V5. My
problem is that I need to calculate the rowSums for row=6 (When a new
stock was added), _with_ and _without_ the new stock. So my answer for
row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
original set of stocks (without V4). Similarly, my answer for row=11
would be 175 for the plain sum, and 175 - 24 = 151 for the original sum
(without V5). 

Basically I'm interested in finding out the value of the portfolio with
and without the new stock for the purposes of creating an index. It's
possible that some stocks my get dropped later, in which case there
would be an NA series starting for say V1 at row=18 and so on. In that
case, the aim would be to find the sum at row=18 with and without the
value of V1. 

Is there any way I can get the sum over columns, deleting specific
colums? To get the columns that are NA in any row, I tried (shown for
the first 12 rows):
> apply(ac[1:12,],1,function(y)which(is.na(y)))
Which correctly gives 

$`1`
V4 V5 
 4  5 

$`2`
V4 V5 
 4  5 

$`3`
V4 V5 
 4  5 

$`4`
V4 V5 
 4  5 

$`5`
V4 V5 
 4  5 

$`6`
V5 
 5 

$`7`
V2 V3 V5 
 2  3  5 

$`8`
V5 
 5 

$`9`
V5 
 5 

$`10`
V1 V5 
 1  5 

$`11`
integer(0)

$`12`
V5 
 5 
> 
But now I'm stuck. I don't how to use this list of indices at each row
to exclude my columns. 

Any pointers please? Would such an exercise be easier if I use a
time-series based object, like a zoo.


TIA and best,
-Tir

Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887

gyadav at ccilindia.co.in

2007-May-14 08:23 UTC

head link

[R] Conditional Sums for Index creation

Hello

My dear friend.... the problem is not very heavy its very light, but you 
have to assume somethings.
as you say that any new stock can come anytime and go anytime, further, 
any time NA can come .....

both statements are so close that it would be nearly impossible to know 
which one is NA (of before the start of series and which one is end of 
series ) and 
which one is NA of series of genuine data which is missing for one day or 
two.

There can be n number of workarounds depending on the propersties of 
dataset.
for e.g.


1.
Given that genuine data is not having not more than two NA consequtively
Work around : Check for past two days that NAs are there or not
Flaw: Will be a lagged kind of computation

2. 
Given that series will have continuously NAs 
Work around : you can go for checking for whole of the past data whether 
there was any number or all NAs
Flaw : Based on backward looking hence will never know that new stock has 
come or not.
e,g, stock may have come but the observation was not recorded hence NA

etc etc

clarify more of your stand, do you get information before hand that today 
stock will get added or deleted.
meaning any exogenous information which will help to optimize it




Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: mailtogauravyadav@gmail.com
Bhagavad Gita:  Man is made by his belief, as He believes, so He is





"Patnaik, Tirthankar " <tirthankar.patnaik@citi.com> 
Sent by: r-help-bounces@stat.math.ethz.ch
05/14/2007 11:53 AM

To
<r-help@stat.math.ethz.ch>
cc

Subject
[R] Conditional Sums for Index creation






Hi,
                 Apologies for the long mail. I have a data.frame with 
columns of
price/mcap data for a portfolio of stocks, and the date. To get the
total value of the portfolio on a daily basis, I calculate rowSums of
the data.frame. 
> set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac   V1 V2 V3 V4 V5
1  27 93 82 NA NA
2  37 21 65 NA NA
3  57 65 78 NA NA
4  91 13 55 NA NA
5  20 27 53 NA NA
6  90 39 79 26 NA
7  94 NA NA 48 NA
8  66 38 48 77 NA
9  63 87 73  8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60> 
Here the rows 1:20 are dates (also in my data.frame). 

Since some of the prices have NA, the rowSums is made to ignore these
entries. 
> rowSums(ac,na.rm=TRUE)  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
19  20 
202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315 242
335 316 > 
Stocks are being added to the portfolio too. So from date=6 (or row=6)
we have the 4th stock V4, and from date=11, we have the 5th stock V5. My
problem is that I need to calculate the rowSums for row=6 (When a new
stock was added), _with_ and _without_ the new stock. So my answer for
row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
original set of stocks (without V4). Similarly, my answer for row=11
would be 175 for the plain sum, and 175 - 24 = 151 for the original sum
(without V5). 

Basically I'm interested in finding out the value of the portfolio with
and without the new stock for the purposes of creating an index. It's
possible that some stocks my get dropped later, in which case there
would be an NA series starting for say V1 at row=18 and so on. In that
case, the aim would be to find the sum at row=18 with and without the
value of V1. 

Is there any way I can get the sum over columns, deleting specific
colums? To get the columns that are NA in any row, I tried (shown for
the first 12 rows):
> apply(ac[1:12,],1,function(y)which(is.na(y)))
Which correctly gives 

$`1`
V4 V5 
 4  5 

$`2`
V4 V5 
 4  5 

$`3`
V4 V5 
 4  5 

$`4`
V4 V5 
 4  5 

$`5`
V4 V5 
 4  5 

$`6`
V5 
 5 

$`7`
V2 V3 V5 
 2  3  5 

$`8`
V5 
 5 

$`9`
V5 
 5 

$`10`
V1 V5 
 1  5 

$`11`
integer(0)

$`12`
V5 
 5 
> 
But now I'm stuck. I don't how to use this list of indices at each row
to exclude my columns. 

Any pointers please? Would such an exercise be easier if I use a
time-series based object, like a zoo.


TIA and best,
-Tir

Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



===========================================================================================DISCLAIMER
AND CONFIDENTIALITY CAUTION:\ \ This message and ...{{dropped}}

Petr PIKAL

2007-May-14 08:50 UTC

head link

[R] Odp: Conditional Sums for Index creation

Hi

there probably is not an easy out of box solution. Some use can be in rle 
function e.g.
> apply(apply(ac,2,function(x) (is.na(x))),2, function(x) rle(x)$values[1])
   V1    V2    V3    V4    V5 
FALSE FALSE FALSE  TRUE  TRUE 

gives you columns which start with NA and
> apply(apply(ac,2,function(x) (is.na(x))),2, function(x) rle(x)$lengths[1])
V1 V2 V3 V4 V5 
 9  6  6  5 10 >
gives you length of NA consecutive sequence in each column. Then you can 
elaborate a condition which will compute rowSums differently for a 
particular row. E.g. when there is a TRUE value take a number for that 
given column and increase it by one. You will get row number for which you 
need rowSum with and without this particular column.

Regards

Petr Pikal
petr.pikal at precheza.cz

r-help-bounces at stat.math.ethz.ch napsal dne 14.05.2007 08:23:01:
> Hi,
>    Apologies for the long mail. I have a data.frame with columns of
> price/mcap data for a portfolio of stocks, and the date. To get the
> total value of the portfolio on a daily basis, I calculate rowSums of
> the data.frame. 
> 
> > set.seed(1)
> > ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> > ab[1:5,4:5] <- NA
> > ab[6:10,5] <- NA
> > ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> > ac
>    V1 V2 V3 V4 V5
> 1  27 93 82 NA NA
> 2  37 21 65 NA NA
> 3  57 65 78 NA NA
> 4  91 13 55 NA NA
> 5  20 27 53 NA NA
> 6  90 39 79 26 NA
> 7  94 NA NA 48 NA
> 8  66 38 48 77 NA
> 9  63 87 73  8 NA
> 10 NA 34 69 88 NA
> 11 21 48 48 34 24
> 12 18 60 86 84 NA
> 13 69 49 44 35 64
> 14 38 19 24 33 88
> 15 77 83 NA 48 78
> 16 50 67 10 89 80
> 17 72 79 32 86 46
> 18 99 11 52 39 41
> 19 38 72 66 78 81
> 20 78 41 41 96 60
> > 
> 
> Here the rows 1:20 are dates (also in my data.frame). 
> 
> Since some of the prices have NA, the rowSums is made to ignore these
> entries. 
> 
> > rowSums(ac,na.rm=TRUE)
>   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
> 19  20 
> 202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315 242
> 335 316 
> > 
> 
> Stocks are being added to the portfolio too. So from date=6 (or row=6)
> we have the 4th stock V4, and from date=11, we have the 5th stock V5. My
> problem is that I need to calculate the rowSums for row=6 (When a new
> stock was added), _with_ and _without_ the new stock. So my answer for
> row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
> original set of stocks (without V4). Similarly, my answer for row=11
> would be 175 for the plain sum, and 175 - 24 = 151 for the original sum
> (without V5). 
> 
> Basically I'm interested in finding out the value of the portfolio with
> and without the new stock for the purposes of creating an index. It's
> possible that some stocks my get dropped later, in which case there
> would be an NA series starting for say V1 at row=18 and so on. In that
> case, the aim would be to find the sum at row=18 with and without the
> value of V1. 
> 
> Is there any way I can get the sum over columns, deleting specific
> colums? To get the columns that are NA in any row, I tried (shown for
> the first 12 rows):
> 
> > apply(ac[1:12,],1,function(y)which(is.na(y)))
> 
> Which correctly gives 
> 
> $`1`
> V4 V5 
>  4  5 
> 
> $`2`
> V4 V5 
>  4  5 
> 
> $`3`
> V4 V5 
>  4  5 
> 
> $`4`
> V4 V5 
>  4  5 
> 
> $`5`
> V4 V5 
>  4  5 
> 
> $`6`
> V5 
>  5 
> 
> $`7`
> V2 V3 V5 
>  2  3  5 
> 
> $`8`
> V5 
>  5 
> 
> $`9`
> V5 
>  5 
> 
> $`10`
> V1 V5 
>  1  5 
> 
> $`11`
> integer(0)
> 
> $`12`
> V5 
>  5 
> 
> > 
> 
> But now I'm stuck. I don't how to use this list of indices at each
row
> to exclude my columns. 
> 
> Any pointers please? Would such an exercise be easier if I use a
> time-series based object, like a zoo.
> 
> 
> TIA and best,
> -Tir
> 
> Tirthankar Patnaik
> India Strategy
> Citigroup Investment Research
> +91-22-6631 9887
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.

Patnaik, Tirthankar

2007-May-14 09:10 UTC

head link

[R] Conditional Sums for Index creation

All,
	I guess the idea is identify the number of NAs na(r) in
particular row r, then calculate the 'original' row-sum ors(r) based on
the info on the set of non-missings in the previous row, i.e., na(r-1),
right? 

Gaurav, I'd assume that the data is always for a stock, and the decision
to include a stock in the portfolio would be mine.

# Indexes of all the non-missing in all the rows
ad <- apply(ac,1,function(y)which(!is.na(y)))

Now here, if I try a row-by-row evaluation: 
> sum(ac[6,unlist(ad[5])],na.rm=TRUE)
[1] 208> 
> sum(ac[11,unlist(ad[10])],na.rm=TRUE)
[1] 130> 
> 
I get the answer I need! Works exactly well (Please see my mail below
for the expected answer. 151 in row=11 was incorrect, the right answer
is indeed 130. 

However, if I try and use this for all the rows in the data.frame:

ae <- ac

for (r in 2:nrow(ac)){
	print(r)
	ae$sumCorr <- unlist(sum(ac[r,unlist(ad[r-1])],na.rm=TRUE))
}
> head(ae)  V1 V2 V3 V4 V5 sumCorr
1 27 93 82 NA NA     316
2 37 21 65 NA NA     316
3 57 65 78 NA NA     316
4 91 13 55 NA NA     316
5 20 27 53 NA NA     316
6 90 39 79 26 NA     316> 
Which is obviously not right. The question is why does ae$sumCorr not
update with the change in row.  IMHO if I have this, then the solution
is nigh! :)

TIA and best,
-Tir

-----Original Message-----
From: Petr PIKAL [mailto:petr.pikal at precheza.cz] 
Sent: Monday, May 14, 2007 2:21 PM
To: Patnaik, Tirthankar [GWM-CIR]
Cc: r-help at stat.math.ethz.ch
Subject: Odp: [R] Conditional Sums for Index creation

Hi

there probably is not an easy out of box solution. Some use can be in
rle function e.g.
> apply(apply(ac,2,function(x) (is.na(x))),2, function(x)rle(x)$values[1])
   V1    V2    V3    V4    V5 
FALSE FALSE FALSE  TRUE  TRUE 

gives you columns which start with NA and
> apply(apply(ac,2,function(x) (is.na(x))),2, function(x)rle(x)$lengths[1])
V1 V2 V3 V4 V5
 9  6  6  5 10 >
gives you length of NA consecutive sequence in each column. Then you can

elaborate a condition which will compute rowSums differently for a 
particular row. E.g. when there is a TRUE value take a number for that 
given column and increase it by one. You will get row number for which
you 
need rowSum with and without this particular column.

Regards

Petr Pikal
petr.pikal at precheza.cz

r-help-bounces at stat.math.ethz.ch napsal dne 14.05.2007 08:23:01:
> Hi,
>    Apologies for the long mail. I have a data.frame with columns of
> price/mcap data for a portfolio of stocks, and the date. To get the
> total value of the portfolio on a daily basis, I calculate rowSums of
> the data.frame. 
> 
> > set.seed(1)
> > ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> > ab[1:5,4:5] <- NA
> > ab[6:10,5] <- NA
> > ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> > ac
>    V1 V2 V3 V4 V5
> 1  27 93 82 NA NA
> 2  37 21 65 NA NA
> 3  57 65 78 NA NA
> 4  91 13 55 NA NA
> 5  20 27 53 NA NA
> 6  90 39 79 26 NA
> 7  94 NA NA 48 NA
> 8  66 38 48 77 NA
> 9  63 87 73  8 NA
> 10 NA 34 69 88 NA
> 11 21 48 48 34 24
> 12 18 60 86 84 NA
> 13 69 49 44 35 64
> 14 38 19 24 33 88
> 15 77 83 NA 48 78
> 16 50 67 10 89 80
> 17 72 79 32 86 46
> 18 99 11 52 39 41
> 19 38 72 66 78 81
> 20 78 41 41 96 60
> > 
> 
> Here the rows 1:20 are dates (also in my data.frame). 
> 
> Since some of the prices have NA, the rowSums is made to ignore these
> entries. 
> 
> > rowSums(ac,na.rm=TRUE)
>   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
18> 19  20 
> 202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315
242> 335 316 
> > 
> 
> Stocks are being added to the portfolio too. So from date=6 (or row=6)
> we have the 4th stock V4, and from date=11, we have the 5th stock V5.
My> problem is that I need to calculate the rowSums for row=6 (When a new
> stock was added), _with_ and _without_ the new stock. So my answer for
> row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
> original set of stocks (without V4). Similarly, my answer for row=11
> would be 175 for the plain sum, and 175 - 24 = 151 for the original
sum> (without V5). 
> 
> Basically I'm interested in finding out the value of the portfolio
with> and without the new stock for the purposes of creating an index. It's
> possible that some stocks my get dropped later, in which case there
> would be an NA series starting for say V1 at row=18 and so on. In that
> case, the aim would be to find the sum at row=18 with and without the
> value of V1. 
> 
> Is there any way I can get the sum over columns, deleting specific
> colums? To get the columns that are NA in any row, I tried (shown for
> the first 12 rows):
> 
> > apply(ac[1:12,],1,function(y)which(is.na(y)))
> 
> Which correctly gives 
> 
> $`1`
> V4 V5 
>  4  5 
> 
> $`2`
> V4 V5 
>  4  5 
> 
> $`3`
> V4 V5 
>  4  5 
> 
> $`4`
> V4 V5 
>  4  5 
> 
> $`5`
> V4 V5 
>  4  5 
> 
> $`6`
> V5 
>  5 
> 
> $`7`
> V2 V3 V5 
>  2  3  5 
> 
> $`8`
> V5 
>  5 
> 
> $`9`
> V5 
>  5 
> 
> $`10`
> V1 V5 
>  1  5 
> 
> $`11`
> integer(0)
> 
> $`12`
> V5 
>  5 
> 
> > 
> 
> But now I'm stuck. I don't how to use this list of indices at each
row
> to exclude my columns. 
> 
> Any pointers please? Would such an exercise be easier if I use a
> time-series based object, like a zoo.
> 
> 
> TIA and best,
> -Tir
> 
> Tirthankar Patnaik
> India Strategy
> Citigroup Investment Research
> +91-22-6631 9887
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.

gyadav at ccilindia.co.in

2007-May-14 12:59 UTC

head link

[R] Conditional Sums for Index creation

Hi Tirthankar

this will help you

ind is a matrix which indicates the start of any new stock.
ind[i,j] means that in j + 1 column all the values from 1st row to i - 1 
row are all NAs.


> x      V2 V3 V4 V5 V6
 [1,] 27 93 82 NA NA
 [2,] 37 21 65 NA NA
 [3,] 57 65 78 NA NA
 [4,] 91 13 55 NA NA
 [5,] 20 27 53 NA NA
 [6,] 90 39 79 26 NA
 [7,] 94 NA NA 48 NA
 [8,] 66 38 48 77 NA
 [9,] 63 87 73  8 NA
[10,] NA 34 69 88 NA
[11,] 21 48 48 34 24
[12,] 18 60 86 84 NA
[13,] 69 49 44 35 64
[14,] 38 19 24 33 88
[15,] 77 83 NA 48 78
[16,] 50 67 10 89 80
[17,] 72 79 32 86 46
[18,] 99 11 52 39 41
[19,] 38 72 66 78 81
[20,] 78 41 41 96 60> 
> for ( j in 1:length(x[1,]) - 1) {+ for ( i in 2:length(x[,1])) {
+ indicator<-TRUE
+ for (k in 1: i - 1){
+ indicator <- indicator && is.na(x[k,j+1])
+ }
+ ind[i,j]<-indicator
+ 
+ }
+ }> ind      V2 V3 V4 V5 V6
 [1,] NA NA NA NA NA
 [2,]  0  0 NA NA  0
 [3,]  0  0 NA NA  0
 [4,]  0  0 NA NA  0
 [5,]  0  0 NA NA  0
 [6,]  0  0 NA NA  0
 [7,]  0  0  0 NA  0
 [8,]  0  0  0 NA  0
 [9,]  0  0  0 NA  0
[10,]  0  0  0 NA  0
[11,]  0  0  0 NA  0
[12,]  0  0  0  0  0
[13,]  0  0  0  0  0
[14,]  0  0  0  0  0
[15,]  0  0  0  0  0
[16,]  0  0  0  0  0
[17,]  0  0  0  0  0
[18,]  0  0  0  0  0
[19,]  0  0  0  0  0
[20,]  0  0  0  0  0> 

Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: mailtogauravyadav@gmail.com
Bhagavad Gita:  Man is made by his Belief, as He believes, so He is



"Patnaik, Tirthankar " <tirthankar.patnaik@citi.com> 
Sent by: r-help-bounces@stat.math.ethz.ch
05/14/2007 11:53 AM

To
<r-help@stat.math.ethz.ch>
cc

Subject
[R] Conditional Sums for Index creation






Hi,
                 Apologies for the long mail. I have a data.frame with 
columns of
price/mcap data for a portfolio of stocks, and the date. To get the
total value of the portfolio on a daily basis, I calculate rowSums of
the data.frame. 
> set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac   V1 V2 V3 V4 V5
1  27 93 82 NA NA
2  37 21 65 NA NA
3  57 65 78 NA NA
4  91 13 55 NA NA
5  20 27 53 NA NA
6  90 39 79 26 NA
7  94 NA NA 48 NA
8  66 38 48 77 NA
9  63 87 73  8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60> 
Here the rows 1:20 are dates (also in my data.frame). 

Since some of the prices have NA, the rowSums is made to ignore these
entries. 
> rowSums(ac,na.rm=TRUE)  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
19  20 
202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315 242
335 316 > 
Stocks are being added to the portfolio too. So from date=6 (or row=6)
we have the 4th stock V4, and from date=11, we have the 5th stock V5. My
problem is that I need to calculate the rowSums for row=6 (When a new
stock was added), _with_ and _without_ the new stock. So my answer for
row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
original set of stocks (without V4). Similarly, my answer for row=11
would be 175 for the plain sum, and 175 - 24 = 151 for the original sum
(without V5). 

Basically I'm interested in finding out the value of the portfolio with
and without the new stock for the purposes of creating an index. It's
possible that some stocks my get dropped later, in which case there
would be an NA series starting for say V1 at row=18 and so on. In that
case, the aim would be to find the sum at row=18 with and without the
value of V1. 

Is there any way I can get the sum over columns, deleting specific
colums? To get the columns that are NA in any row, I tried (shown for
the first 12 rows):
> apply(ac[1:12,],1,function(y)which(is.na(y)))
Which correctly gives 

$`1`
V4 V5 
 4  5 

$`2`
V4 V5 
 4  5 

$`3`
V4 V5 
 4  5 

$`4`
V4 V5 
 4  5 

$`5`
V4 V5 
 4  5 

$`6`
V5 
 5 

$`7`
V2 V3 V5 
 2  3  5 

$`8`
V5 
 5 

$`9`
V5 
 5 

$`10`
V1 V5 
 1  5 

$`11`
integer(0)

$`12`
V5 
 5 
> 
But now I'm stuck. I don't how to use this list of indices at each row
to exclude my columns. 

Any pointers please? Would such an exercise be easier if I use a
time-series based object, like a zoo.


TIA and best,
-Tir

Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



===========================================================================================DISCLAIMER
AND CONFIDENTIALITY CAUTION:\ \ This message and ...{{dropped}}

Patnaik, Tirthankar

2007-May-15 09:24 UTC

head link

[R] Conditional Sums for Index creation

All,
 Happy to say that the problem could be solved.  The key idea was from
Patrick Burns (Convert the data-frame to a matrix!).  As written
earlier, the steps were to first get a object (call it ad)  containing
the non-missing entries at each row. Then run a sum over each row,
selecting only those columns  that are pointed to in the object ad.
 
Another quick step was to use the fact that if there's an NA in any
column (after a stock has entered the portfolio), then we could use the
previous value just as well (e.g., the market cap of a stock). So
na.locf was a big help! 
 
Just providing the code here for illustration purposes:
 > set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac   V1 V2 V3 V4 V5
1  27 93 82 NA NA
2  37 21 65 NA NA
3  57 65 78 NA NA
4  91 13 55 NA NA
5  20 27 53 NA NA
6  90 39 79 26 NA
7  94 NA NA 48 NA
8  66 38 48 77 NA
9  63 87 73  8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60> 
> # -----------------------------------------------------
> # Indexes of all the non-missing in all the rows
> ad <- apply(ac,1,function(y)which(!is.na(y)))
> 
> af <- data.matrix(na.locf(ac,na.rm=FALSE))
> # Include another column as placeholder
> ag <- cbind(af,rep(1,nrow(af)))
> # Call it, "sumCorr".
> colnames(ag)[6] <- "sumCorr"
> ag[1,6] <- sum(ag[1,],na.rm=TRUE)
> for (r in 2:nrow(ag)){+ sumCorr <- unlist(sum(ag[r,unlist(ad[r-1])],na.rm=TRUE))
+ ag[r,6] <- sumCorr
+ }> ag   V1 V2 V3 V4 V5 sumCorr
1  27 93 82 NA NA     203
2  37 21 65 NA NA     123
3  57 65 78 NA NA     200
4  91 13 55 NA NA     159
5  20 27 53 NA NA     100
6  90 39 79 26 NA     208
7  94 39 79 48 NA     260
8  66 38 48 77 NA     143
9  63 87 73  8 NA     231
10 63 34 69 88 NA     254
11 21 48 48 34 24     130
12 18 60 86 84 24     272
13 69 49 44 35 64     197
14 38 19 24 33 88     202
15 77 83 24 48 78     310
16 50 67 10 89 80     286
17 72 79 32 86 46     315
18 99 11 52 39 41     242
19 38 72 66 78 81     335
20 78 41 41 96 60     316> 
Gaurav,
Anything to not implement a double for-loop! :)
With the implementation, I was able to generate my index at last.
Perhaps it's because I'm quite new with R, but I find it quite arcane
sometimes! :)
 
 
best,
-Tir
 
Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887

  _____  

From: gyadav@ccilindia.co.in [mailto:gyadav@ccilindia.co.in] 
Sent: Monday, May 14, 2007 6:29 PM
To: Patnaik, Tirthankar [GWM-CIR]
Cc: r-help@stat.math.ethz.ch; r-help-bounces@stat.math.ethz.ch
Subject: Re: [R] Conditional Sums for Index creation



Hi Tirthankar 

this will help you 

ind is a matrix which indicates the start of any new stock. 
ind[i,j] means that in j + 1 column all the values from 1st row to i - 1
row are all NAs. 


> x       V2 V3 V4 V5 V6 
 [1,] 27 93 82 NA NA 
 [2,] 37 21 65 NA NA 
 [3,] 57 65 78 NA NA 
 [4,] 91 13 55 NA NA 
 [5,] 20 27 53 NA NA 
 [6,] 90 39 79 26 NA 
 [7,] 94 NA NA 48 NA 
 [8,] 66 38 48 77 NA 
 [9,] 63 87 73  8 NA 
[10,] NA 34 69 88 NA 
[11,] 21 48 48 34 24 
[12,] 18 60 86 84 NA 
[13,] 69 49 44 35 64 
[14,] 38 19 24 33 88 
[15,] 77 83 NA 48 78 
[16,] 50 67 10 89 80 
[17,] 72 79 32 86 46 
[18,] 99 11 52 39 41 
[19,] 38 72 66 78 81 
[20,] 78 41 41 96 60 > 
> for ( j in 1:length(x[1,]) - 1) { + for ( i in 2:length(x[,1])) { 
+ indicator<-TRUE 
+ for (k in 1: i - 1){ 
+ indicator <- indicator && is.na(x[k,j+1]) 
+ } 
+ ind[i,j]<-indicator 
+ 
+ } 
+ } > ind       V2 V3 V4 V5 V6 
 [1,] NA NA NA NA NA 
 [2,]  0  0 NA NA  0 
 [3,]  0  0 NA NA  0 
 [4,]  0  0 NA NA  0 
 [5,]  0  0 NA NA  0 
 [6,]  0  0 NA NA  0 
 [7,]  0  0  0 NA  0 
 [8,]  0  0  0 NA  0 
 [9,]  0  0  0 NA  0 
[10,]  0  0  0 NA  0 
[11,]  0  0  0 NA  0 
[12,]  0  0  0  0  0 
[13,]  0  0  0  0  0 
[14,]  0  0  0  0  0 
[15,]  0  0  0  0  0 
[16,]  0  0  0  0  0 
[17,]  0  0  0  0  0 
[18,]  0  0  0  0  0 
[19,]  0  0  0  0  0 
[20,]  0  0  0  0  0 > 

Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: mailtogauravyadav@gmail.com
Bhagavad Gita:  Man is made by his Belief, as He believes, so He is 



"Patnaik, Tirthankar " <tirthankar.patnaik@citi.com> 
Sent by: r-help-bounces@stat.math.ethz.ch 

05/14/2007 11:53 AM 

To
<r-help@stat.math.ethz.ch> 
cc
Subject
[R] Conditional Sums for Index creation

	




Hi,
                Apologies for the long mail. I have a data.frame with
columns of
price/mcap data for a portfolio of stocks, and the date. To get the
total value of the portfolio on a daily basis, I calculate rowSums of
the data.frame. 
> set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac  V1 V2 V3 V4 V5
1  27 93 82 NA NA
2  37 21 65 NA NA
3  57 65 78 NA NA
4  91 13 55 NA NA
5  20 27 53 NA NA
6  90 39 79 26 NA
7  94 NA NA 48 NA
8  66 38 48 77 NA
9  63 87 73  8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60> 
Here the rows 1:20 are dates (also in my data.frame). 

Since some of the prices have NA, the rowSums is made to ignore these
entries. 
> rowSums(ac,na.rm=TRUE) 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
19  20 
202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315 242
335 316 > 
Stocks are being added to the portfolio too. So from date=6 (or row=6)
we have the 4th stock V4, and from date=11, we have the 5th stock V5. My
problem is that I need to calculate the rowSums for row=6 (When a new
stock was added), _with_ and _without_ the new stock. So my answer for
row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
original set of stocks (without V4). Similarly, my answer for row=11
would be 175 for the plain sum, and 175 - 24 = 151 for the original sum
(without V5). 

Basically I'm interested in finding out the value of the portfolio with
and without the new stock for the purposes of creating an index. It's
possible that some stocks my get dropped later, in which case there
would be an NA series starting for say V1 at row=18 and so on. In that
case, the aim would be to find the sum at row=18 with and without the
value of V1. 

Is there any way I can get the sum over columns, deleting specific
colums? To get the columns that are NA in any row, I tried (shown for
the first 12 rows):
> apply(ac[1:12,],1,function(y)which(is.na(y)))
Which correctly gives 

$`1`
V4 V5 
4  5 

$`2`
V4 V5 
4  5 

$`3`
V4 V5 
4  5 

$`4`
V4 V5 
4  5 

$`5`
V4 V5 
4  5 

$`6`
V5 
5 

$`7`
V2 V3 V5 
2  3  5 

$`8`
V5 
5 

$`9`
V5 
5 

$`10`
V1 V5 
1  5 

$`11`
integer(0)

$`12`
V5 
5 
> 
But now I'm stuck. I don't how to use this list of indices at each row
to exclude my columns. 

Any pointers please? Would such an exercise be easier if I use a
time-series based object, like a zoo.


TIA and best,
-Tir

Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


==========================================================================================DISCLAIMER
AND CONFIDENTIALITY CAUTION:\ \ This message and ...{{dropped}}

Patnaik, Tirthankar

2007-May-15 13:22 UTC

head link

[R] Conditional Sums for Index creation

Sending in plain text, as the html version doesn't seem to go through.. 


Best,
-Tir
________________________________

From: Patnaik, Tirthankar [GWM-CIR] 
Sent: Tuesday, May 15, 2007 2:55 PM
To: 'gyadav at ccilindia.co.in'
Cc: r-help at stat.math.ethz.ch; r-help-bounces at stat.math.ethz.ch
Subject: RE: [R] Conditional Sums for Index creation


All,
 Happy to say that the problem could be solved.  The key idea was from
Patrick Burns (Convert the data-frame to a matrix!).  As written
earlier, the steps were to first get a object (call it ad)  containing
the non-missing entries at each row. Then run a sum over each row,
selecting only those columns  that are pointed to in the object ad.
 
Another quick step was to use the fact that if there's an NA in any
column (after a stock has entered the portfolio), then we could use the
previous value just as well (e.g., the market cap of a stock). So
na.locf was a big help! 
 
Just providing the code here for illustration purposes:
 > set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac   V1 V2 V3 V4 V5
1  27 93 82 NA NA
2  37 21 65 NA NA
3  57 65 78 NA NA
4  91 13 55 NA NA
5  20 27 53 NA NA
6  90 39 79 26 NA
7  94 NA NA 48 NA
8  66 38 48 77 NA
9  63 87 73  8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60> 
> # -----------------------------------------------------
> # Indexes of all the non-missing in all the rows
> ad <- apply(ac,1,function(y)which(!is.na(y)))
> 
> af <- data.matrix(na.locf(ac,na.rm=FALSE))
> # Include another column as placeholder
> ag <- cbind(af,rep(1,nrow(af)))
> # Call it, "sumCorr".
> colnames(ag)[6] <- "sumCorr"
> ag[1,6] <- sum(ag[1,],na.rm=TRUE)
> for (r in 2:nrow(ag)){+ sumCorr <- unlist(sum(ag[r,unlist(ad[r-1])],na.rm=TRUE))
+ ag[r,6] <- sumCorr
+ }> ag   V1 V2 V3 V4 V5 sumCorr
1  27 93 82 NA NA     203
2  37 21 65 NA NA     123
3  57 65 78 NA NA     200
4  91 13 55 NA NA     159
5  20 27 53 NA NA     100
6  90 39 79 26 NA     208
7  94 39 79 48 NA     260
8  66 38 48 77 NA     143
9  63 87 73  8 NA     231
10 63 34 69 88 NA     254
11 21 48 48 34 24     130
12 18 60 86 84 24     272
13 69 49 44 35 64     197
14 38 19 24 33 88     202
15 77 83 24 48 78     310
16 50 67 10 89 80     286
17 72 79 32 86 46     315
18 99 11 52 39 41     242
19 38 72 66 78 81     335
20 78 41 41 96 60     316> 
Gaurav,
Anything to not implement a double for-loop! :)
With the implementation, I was able to generate my index at last.
Perhaps it's because I'm quite new with R, but I find it quite arcane
sometimes! :)
 
 
best,
-Tir
 
Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887

________________________________

From: gyadav at ccilindia.co.in [mailto:gyadav at ccilindia.co.in] 
Sent: Monday, May 14, 2007 6:29 PM
To: Patnaik, Tirthankar [GWM-CIR]
Cc: r-help at stat.math.ethz.ch; r-help-bounces at stat.math.ethz.ch
Subject: Re: [R] Conditional Sums for Index creation



Hi Tirthankar 

this will help you 

ind is a matrix which indicates the start of any new stock. 
ind[i,j] means that in j + 1 column all the values from 1st row to i - 1
row are all NAs. 


> x       V2 V3 V4 V5 V6 
 [1,] 27 93 82 NA NA 
 [2,] 37 21 65 NA NA 
 [3,] 57 65 78 NA NA 
 [4,] 91 13 55 NA NA 
 [5,] 20 27 53 NA NA 
 [6,] 90 39 79 26 NA 
 [7,] 94 NA NA 48 NA 
 [8,] 66 38 48 77 NA 
 [9,] 63 87 73  8 NA 
[10,] NA 34 69 88 NA 
[11,] 21 48 48 34 24 
[12,] 18 60 86 84 NA 
[13,] 69 49 44 35 64 
[14,] 38 19 24 33 88 
[15,] 77 83 NA 48 78 
[16,] 50 67 10 89 80 
[17,] 72 79 32 86 46 
[18,] 99 11 52 39 41 
[19,] 38 72 66 78 81 
[20,] 78 41 41 96 60 > 
> for ( j in 1:length(x[1,]) - 1) { + for ( i in 2:length(x[,1])) { 
+ indicator<-TRUE 
+ for (k in 1: i - 1){ 
+ indicator <- indicator && is.na(x[k,j+1]) 
+ } 
+ ind[i,j]<-indicator 
+ 
+ } 
+ } > ind       V2 V3 V4 V5 V6 
 [1,] NA NA NA NA NA 
 [2,]  0  0 NA NA  0 
 [3,]  0  0 NA NA  0 
 [4,]  0  0 NA NA  0 
 [5,]  0  0 NA NA  0 
 [6,]  0  0 NA NA  0 
 [7,]  0  0  0 NA  0 
 [8,]  0  0  0 NA  0 
 [9,]  0  0  0 NA  0 
[10,]  0  0  0 NA  0 
[11,]  0  0  0 NA  0 
[12,]  0  0  0  0  0 
[13,]  0  0  0  0  0 
[14,]  0  0  0  0  0 
[15,]  0  0  0  0  0 
[16,]  0  0  0  0  0 
[17,]  0  0  0  0  0 
[18,]  0  0  0  0  0 
[19,]  0  0  0  0  0 
[20,]  0  0  0  0  0 > 

Regards,

Gaurav Yadav
+++++++++++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: mailtogauravyadav at gmail.com
Bhagavad Gita:  Man is made by his Belief, as He believes, so He is 



"Patnaik, Tirthankar " <tirthankar.patnaik at citi.com> 
Sent by: r-help-bounces at stat.math.ethz.ch 

05/14/2007 11:53 AM 

	
To
	<r-help at stat.math.ethz.ch> 
cc
	
Subject
	[R] Conditional Sums for Index creation	

		




Hi,
                Apologies for the long mail. I have a data.frame with
columns of
price/mcap data for a portfolio of stocks, and the date. To get the
total value of the portfolio on a daily basis, I calculate rowSums of
the data.frame. 
> set.seed(1)
> ab <- matrix(round(runif(100)*100),nrow=20,ncol=5)
> ab[1:5,4:5] <- NA
> ab[6:10,5] <- NA
> ac <- as.data.frame(ifelse(ab <= 7,NA,ab))
> ac  V1 V2 V3 V4 V5
1  27 93 82 NA NA
2  37 21 65 NA NA
3  57 65 78 NA NA
4  91 13 55 NA NA
5  20 27 53 NA NA
6  90 39 79 26 NA
7  94 NA NA 48 NA
8  66 38 48 77 NA
9  63 87 73  8 NA
10 NA 34 69 88 NA
11 21 48 48 34 24
12 18 60 86 84 NA
13 69 49 44 35 64
14 38 19 24 33 88
15 77 83 NA 48 78
16 50 67 10 89 80
17 72 79 32 86 46
18 99 11 52 39 41
19 38 72 66 78 81
20 78 41 41 96 60> 
Here the rows 1:20 are dates (also in my data.frame). 

Since some of the prices have NA, the rowSums is made to ignore these
entries. 
> rowSums(ac,na.rm=TRUE) 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
19  20 
202 123 200 159 100 234 142 229 231 191 175 248 261 202 286 296 315 242
335 316 > 
Stocks are being added to the portfolio too. So from date=6 (or row=6)
we have the 4th stock V4, and from date=11, we have the 5th stock V5. My
problem is that I need to calculate the rowSums for row=6 (When a new
stock was added), _with_ and _without_ the new stock. So my answer for
row=6 would be 234 for the plain row-sum, and 234 - 26 = 208 for the
original set of stocks (without V4). Similarly, my answer for row=11
would be 175 for the plain sum, and 175 - 24 = 151 for the original sum
(without V5). 

Basically I'm interested in finding out the value of the portfolio with
and without the new stock for the purposes of creating an index. It's
possible that some stocks my get dropped later, in which case there
would be an NA series starting for say V1 at row=18 and so on. In that
case, the aim would be to find the sum at row=18 with and without the
value of V1. 

Is there any way I can get the sum over columns, deleting specific
colums? To get the columns that are NA in any row, I tried (shown for
the first 12 rows):
> apply(ac[1:12,],1,function(y)which(is.na(y)))
Which correctly gives 

$`1`
V4 V5 
4  5 

$`2`
V4 V5 
4  5 

$`3`
V4 V5 
4  5 

$`4`
V4 V5 
4  5 

$`5`
V4 V5 
4  5 

$`6`
V5 
5 

$`7`
V2 V3 V5 
2  3  5 

$`8`
V5 
5 

$`9`
V5 
5 

$`10`
V1 V5 
1  5 

$`11`
integer(0)

$`12`
V5 
5 
> 
But now I'm stuck. I don't how to use this list of indices at each row
to exclude my columns. 

Any pointers please? Would such an exercise be easier if I use a
time-series based object, like a zoo.


TIA and best,
-Tir

Tirthankar Patnaik
India Strategy
Citigroup Investment Research
+91-22-6631 9887

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


==========================================================================================DISCLAIMER
AND CONFIDENTIALITY CAUTION:\ \ This message and ...{{dropped}}

Possibly Parallel Threads

Search for more reasonably related threads

R help - May 2007 - Conditional Sums for Index creation

[R] Conditional Sums for Index creation

[R] Conditional Sums for Index creation

[R] Odp: Conditional Sums for Index creation

[R] Conditional Sums for Index creation

[R] Conditional Sums for Index creation

[R] Conditional Sums for Index creation

[R] Conditional Sums for Index creation

Possibly Parallel Threads