Dear R helpers, Thanks a lot for your earlier guidance esp. Mr Davind Winsemius Sir. However, there seems to be mis-communication from my end corresponding to my requirement. As I had mentioned in my earlier mail, I am dealing with a very large database of borrowers and I had given a part of it in my earlier mail as given below. For a given rating say "A", I needed to have the bad-wise sums of ead's (where bands are constructed using the ead size itself.) and not the number of borrowers falling in a particular band. I am reproducing the data and solution as provided by Winsemius Sir (which generates the number of band-wise borrowers for a given rating. rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB") ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67) df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000 , 10000000, 100000000) ) df df_sorted <- df[order(df$rating),] # the output is as given below.> df_sortedrating ead ead.cat 1 A 169229.93 (1e+05,5e+05] 3 A 5877794.25 (5e+06,1e+07] 6 A 21000.00 (0,1e+05] 12 A 150000.00 (1e+05,5e+05] 13 A 167490.00 (1e+05,5e+05] 18 A 9100.00 (0,1e+05] 23 A 3000.00 (0,1e+05] 25 A 453671.72 (1e+05,5e+05] 28 A 940711.67 (5e+05,1e+06] 31 A 39000.00 (0,1e+05] 5 AA 75040962.06 (1e+07,1e+08] 9 AA 17715000.00 (1e+07,1e+08] 10 AA 14430325.24 (1e+07,1e+08] 11 AA 1180946.57 (1e+06,2e+06] 14 AA 81255.16 (0,1e+05] 17 AA 1275702.94 (1e+06,2e+06] 26 AA 7590.00 (0,1e+05] 29 AA 2443000.00 (2e+06,5e+06] 2 AAA 100.00 (0,1e+05] 19 AAA 1763142.30 (1e+06,2e+06] 27 B 106065.24 (1e+05,5e+05] 7 BB 1028360.00 (1e+06,2e+06] 15 BB 54812.50 (0,1e+05] 22 BB 11800.00 (0,1e+05] 24 BB 96894.02 (0,1e+05] 4 BBB 9530148.63 (5e+06,1e+07] 8 BBB 6000000.00 (5e+06,1e+07] 16 BBB 3000.00 (0,1e+05] 20 BBB 3283048.61 (2e+06,5e+06] 21 BBB 1200000.00 (1e+06,2e+06] 30 BBB 9500000.00 (5e+06,1e+07] 32 BBB 1501939.67 (1e+06,2e+06] ## The following command fetches rating-wise and ead size no of borrowers. Thus, for rating A, there are 4 borrowers in the ead range (0, 1e+05], 4 borrowers in the range (1e+05 to 5e+05] and so on......> with(df, tapply(ead.cat, rating, table))$A (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 4 4 1 0 0 1 0 $AA (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 2 0 0 2 1 0 3 $AAA (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 1 0 0 1 0 0 0 $B (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 0 1 0 0 0 0 0 $BB (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 3 0 0 1 0 0 0 $BBB (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 1 0 0 2 1 3 0 #### My ACTUAL REQUIREMENT Actually for a given rating, I don't want the number of borrowers falling in each of the ead_range. What I want is sum of eads falling in each range. Thus, say for rating "A", I need following. rating ead.cat ead_total 1 A (0,1e+05] 72100.00 # (21000+9100+3000+39000) 2 A (1e+05, 5e+05] 940391.65 #(169229.93+150000.00+167490.00+453671.72) and so on. I am extremely sorry for any mis-communication in my earlier mail. I could test the reply sent to me earlier by Winsemius Sir only today as I was traveling over weekends. Also, I have tried to go through earlier emails dealing with such conditional sums. Unfortunately, I couldn't understand as I have recently started my venture with R. Thanking you in advance and sincerely apologize for any mis-communication if it had occurred in my earlier mail. Regards Vincy --- On Fri, 8/27/10, David Winsemius <dwinsemius@comcast.net> wrote: From: David Winsemius <dwinsemius@comcast.net> Subject: Re: [R] Band-wise Sum To: "Vincy Pyne" <vincy_pyne@yahoo.ca> Cc: r-help@r-project.org Received: Friday, August 27, 2010, 2:36 PM On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:> Hi > > I have a large credit portfolio (exceeding 50000 borrowers). For particular process I need to add up the exposures based on the bands. I am giving a small test data below.I would think that cut() would be the accepted method for defining a factor variable based on specified cutpoints. If you then wanted to see what the cumsum() was across the range of possible levels, that to would be a fairly simple task. df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000 , 10000000, 100000000) ) df with(df, tapply(ead.cat, rating, length)) # A AA AAA B BB BBB # 10 8 2 1 4 7 with(df, tapply(ead.cat, rating, table)) # returns a list of table objects by bond rating lapply( with(df, tapply(ead.cat, rating, table)) , cumsum) #returns the cumsum of those tables # sapply gives a more compact output of that result: sapply( with(df, tapply(ead.cat, rating, table)) , cumsum) A AA AAA B BB BBB (0,1e+05] 4 2 1 0 3 1 (1e+05,5e+05] 8 2 1 1 3 1 (5e+05,1e+06] 9 2 1 1 3 1 (1e+06,2e+06] 9 4 2 1 4 3 (2e+06,5e+06] 9 5 2 1 4 4 (5e+06,1e+07] 10 5 2 1 4 7 (1e+07,1e+08] 10 8 2 1 4 7 Loops, you say we need loops? We don't need no stinkin' loops. --David.> > rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB") > > ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000,167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)> > ## First I have sorted the data rating-wise as > > df <- data.frame(rating, ead) > > df_sorted <- > df[order(df$rating),] > > df_sorted_AAA <- subset(df_sorted, rating=="AAA") > df_sorted_AA <- subset(df_sorted, rating=="AA") > df_sorted_A <- subset(df_sorted, rating=="A") > df_sorted_BBB <- subset(df_sorted, rating=="BBB") > df_sorted_BB <- subset(df_sorted, rating=="BB") > df_sorted_B <- subset(df_sorted, rating=="B") > df_sorted_CCC <- subset(df_sorted, rating=="CCC") > > ## we begin with BBB rating. The R output for df_sorted_BBB is as follows > >> df_sorted_BBB >rating ead> 4 BBB 9530149 > 8 BBB 6000000 > 16 BBB 3000 > 20 BBB 3283049 > 21 BBB 1200000 > 30 BBB 9500000 > 32 BBB 1501940 > > My problem is I need to totals of eads falling in the respective bands > > I > am defining bands in millions as > > seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000) > > # The output is > [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06 > > So for the sub data pertaining to Rating "BBB", I want corresponding ead totals i.e. I want ead totals where ead < 1e+06, then I want eadtotals where 1+e06 < ead < 2e+06, 2e+06 < ead < 3e+06 ...and so on.> > I have tried the following code > > s_BBB <- NULL > > for (i in 1:length(s_BBB)) > { > s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < s_BBB[i])) > } > > I was trying to find totals ofads < 1e+06, ead < 2e+06, ead<3e+06and so on. > > but the result is > >> s_BBB > [1] 0 > > > I apologize if I am not able to express my problem properly. My only objective is first to sort the whole portfolio rating-wise and then within each of these rating-wise sorted data, I wish to find out total of eads based > on various bands starting <1000000, 1000000 - 200000, 2000000 - 3000000, 3000000 - 4000000 and so on. Since the database contains more than 50000 records, various ead amounts ranging from few 000's to billion areavailable.> > Please guide > > Thanking you all in advance > > Vincy > > > > > > > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]]
On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:> Dear R helpers, > > Thanks a lot for your earlier guidance esp. Mr Davind Winsemius Sir. > However, there seems to be mis-communication from my end > corresponding to my requirement. As I had mentioned in my earlier > mail, I am dealing with a very large database of borrowers and I had > given a part of it in my earlier mail as given below. For a given > rating say "A", I needed to have the bad-wise sums of ead's (where > bands are constructed using the ead size itself.) and not the number > of borrowers falling in a particular band. > > I am reproducing the data and solution as provided by Winsemius Sir > (which generates the number of band-wise borrowers for a given rating. > > rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", > "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", > "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB") > > ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, > 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000, > 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, > 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, > 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67) > > df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, > 2000000, 5000000 , 10000000, 100000000) ) > > df > > df_sorted <- df[order(df$rating),] # the output is as given > below. > > > df_sorted > rating ead ead.cat > 1 A 169229.93 (1e+05,5e+05] > 3 A 5877794.25 (5e+06,1e+07] > 6 A 21000.00 (0,1e+05] > 12 A 150000.00 (1e+05,5e+05] > 13 A 167490.00 (1e+05,5e+05] > 18 A 9100.00 (0,1e+05] > 23 A 3000.00 (0,1e+05] > 25 A 453671.72 (1e+05,5e+05] > 28 A 940711.67 (5e+05,1e+06] > 31 A 39000.00 (0,1e+05] > 5 AA 75040962.06 (1e+07,1e+08] > 9 AA 17715000.00 (1e+07,1e+08] > 10 AA 14430325.24 (1e+07,1e+08] > 11 AA 1180946.57 (1e+06,2e+06] > 14 AA 81255.16 (0,1e+05] > 17 AA 1275702.94 (1e+06,2e+06] > 26 AA 7590.00 (0,1e+05] > 29 AA 2443000.00 (2e+06,5e+06] > 2 AAA 100.00 (0,1e+05] > 19 AAA 1763142.30 (1e+06,2e+06] > 27 B 106065.24 (1e+05,5e+05] > 7 BB 1028360.00 (1e+06,2e+06] > 15 BB 54812.50 (0,1e+05] > 22 BB 11800.00 (0,1e+05] > 24 BB 96894.02 (0,1e+05] > 4 BBB 9530148.63 (5e+06,1e+07] > 8 BBB 6000000.00 (5e+06,1e+07] > 16 BBB 3000.00 (0,1e+05] > 20 BBB 3283048.61 (2e+06,5e+06] > 21 BBB 1200000.00 (1e+06,2e+06] > 30 BBB 9500000.00 (5e+06,1e+07] > 32 BBB 1501939.67 (1e+06,2e+06] > > > ## The following command fetches rating-wise and ead size no of > borrowers. Thus, for rating A, there are 4 borrowers in the ead > range (0, 1e+05], 4 borrowers in the range (1e+05 to 5e+05] and so > on...... > > > with(df, tapply(ead.cat, rating, table)) > $A > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e > +06] (5e+06,1e+07] (1e+07,1e+08] > 4 4 1 0 > 0 1 0 > > $AA > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e > +06] (5e+06,1e+07] (1e+07,1e+08] > 2 0 0 2 > 1 0 3 > > $AAA > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e > +06] (5e+06,1e+07] (1e+07,1e+08] > 1 0 0 1 > 0 0 0 > > $B > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e > +06] (5e+06,1e+07] (1e+07,1e+08] > 0 1 0 0 > 0 0 0 > > $BB > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e > +06] (5e+06,1e+07] (1e+07,1e+08] > 3 0 0 1 > 0 0 0 > > $BBB > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e > +06] (5e+06,1e+07] (1e+07,1e+08] > 1 0 0 2 > 1 3 0 > > > #### My ACTUAL REQUIREMENT > > Actually for a given rating, I don't want the number of borrowers > falling in each of the ead_range. What I want is sum of eads falling > in each range. Thus, say for rating "A", I need following. > > > rating ead.cat ead_total > 1 A (0,1e+05] 72100.00 # > (21000+9100+3000+39000) > 2 A (1e+05, 5e+05] 940391.65 > > #(169229.93+150000.00+167490.00+453671.72)So you just wanted simple sums within rating and ead.cat: with(df_sorted, tapply(ead, list(rating,ead.cat), sum, na.rm=TRUE)) (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] A 72100.00 940391.6 940711.7 NA NA 5877794 AA 88845.16 NA NA 2456650 2443000 NA AAA 100.00 NA NA 1763142 NA NA B NA 106065.2 NA NA NA NA BB 163506.52 NA NA 1028360 NA NA BBB 3000.00 NA NA 2701940 3283049 25030149 (1e+07,1e+08] A NA AA 107186287 AAA NA B NA BB NA BBB NA -- David.> and so on. > > I am extremely sorry for any mis-communication in my earlier mail. I > could test the reply sent to me earlier by Winsemius Sir only today > as I was traveling over weekends. Also, I have tried to go through > earlier emails dealing with such conditional sums. Unfortunately, I > couldn't understand as I have recently started my venture with R. > > > Thanking you in advance and sincerely apologize for any mis- > communication if it had occurred in my earlier mail. > > Regards > > Vincy > > > --- On Fri, 8/27/10, David Winsemius <dwinsemius at comcast.net> wrote: > > From: David Winsemius <dwinsemius at comcast.net> > Subject: Re: [R] Band-wise Sum > To: "Vincy Pyne" <vincy_pyne at yahoo.ca> > Cc: r-help at r-project.org > Received: Friday, August 27, 2010, 2:36 PM > > > On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote: > > > Hi > > > > I have a large credit portfolio (exceeding 50000 borrowers). For > particular process I need to add up the exposures based on the > bands. I am giving a small test data below. > > I would think that cut() would be the accepted method for defining a > factor variable based on specified cutpoints. If you then wanted to > see what the cumsum() was across the range of possible levels, that > to would be a fairly simple task. > > df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, > 2000000, 5000000 , 10000000, 100000000) ) > df > with(df, tapply(ead.cat, rating, length)) > # A AA AAA B BB BBB > # 10 8 2 1 4 7 > with(df, tapply(ead.cat, rating, table)) > # returns a list of table objects by bond rating > > lapply( with(df, tapply(ead.cat, rating, table)) , cumsum) > #returns the cumsum of those tables > > # sapply gives a more compact output of that result: > sapply( with(df, tapply(ead.cat, rating, table)) , cumsum) > A AA AAA B BB BBB > (0,1e+05] 4 2 1 0 3 1 > (1e+05,5e+05] 8 2 1 1 3 1 > (5e+05,1e+06] 9 2 1 1 3 1 > (1e+06,2e+06] 9 4 2 1 4 3 > (2e+06,5e+06] 9 5 2 1 4 4 > (5e+06,1e+07] 10 5 2 1 4 7 > (1e+07,1e+08] 10 8 2 1 4 7 > > Loops, you say we need loops? We don't need no stinkin' loops. > > --David. > > > > > rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", > "AA", "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", > "BB", "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB") > > > > ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, > 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, > 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, > 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, > 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67) > > > > ## First I have sorted the data rating-wise as > > > > df <- data.frame(rating, ead) > > > > df_sorted <- > > df[order(df$rating),] > > > > df_sorted_AAA <- subset(df_sorted, rating=="AAA") > > df_sorted_AA <- subset(df_sorted, rating=="AA") > > df_sorted_A <- subset(df_sorted, rating=="A") > > df_sorted_BBB <- subset(df_sorted, rating=="BBB") > > df_sorted_BB <- subset(df_sorted, rating=="BB") > > df_sorted_B <- subset(df_sorted, rating=="B") > > df_sorted_CCC <- subset(df_sorted, rating=="CCC") > > > > ## we begin with BBB rating. The R output for df_sorted_BBB is as > follows > > > >> df_sorted_BBB > > rating ead > > 4 BBB 9530149 > > 8 BBB 6000000 > > 16 BBB 3000 > > 20 BBB 3283049 > > 21 BBB 1200000 > > 30 BBB 9500000 > > 32 BBB 1501940 > > > > My problem is I need to totals of eads falling in the respective > bands > > > > I > > am defining bands in millions as > > > > seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000) > > > > # The output is > > [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06 > > > > So for the sub data pertaining to Rating "BBB", I want > corresponding ead totals i.e. I want ead totals where ead < 1e+06, > then I want ead totals where 1+e06 < ead < 2e+06, 2e+06 < ead < 3e > +06 ...and so on. > > > > I have tried the following code > > > > s_BBB <- NULL > > > > for (i in 1:length(s_BBB)) > > { > > s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < > s_BBB[i])) > > } > > > > I was trying to find totals ofads < 1e+06, ead < 2e+06, ead<3e > +06and so on. > > > > but the result is > > > >> s_BBB > > [1] 0 > > > > > > I apologize if I am not able to express my problem properly. My > only objective is first to sort the whole portfolio rating-wise and > then within each of these rating-wise sorted data, I wish to find > out total of eads based > > on various bands starting <1000000, 1000000 - 200000, 2000000 - > 3000000, 3000000 - 4000000 and so on. Since the database contains > more than 50000 records, various ead amounts ranging from few 000's > to billion are available. > > > > Please guide > > > > Thanking you all in advance>David Winsemius, MD West Hartford, CT
Dear David and Dennis Sir, Thanks a lot for your guidance. As guided by Mr Dennis Murphy Sir in his reply .... Replace table in the tapply call with sum. While you're at it, typing ?tapply to find out what the function does wouldn't hurt... I had really tried earlier to understand the apply, tapply, mapply and sapply commands before writing back to the R forum. But I was not able to figure out where was the problem. But Mr Dennis Sir really inspired me and when I revisited 'tapply', I realized that instead of using 'ead' for getting sum, I was using 'ead.cat', and that solved my problem. Then I had a new problem of 'How to get rid of NA's' , Again instead of posting to the group, I had accessed the earlier R mails and in the end got the solution. I sincerely thank both of you for taking so much efforts and guiding me. I will certainly take efforts to understand 'R' at the earliest. Regards Vincy ________________________________________________________ Replace table in the tapply call with sum. While you're at it, typing ?tapply to find out what the function does wouldn't hurt... HTH, Dennis --- On Mon, 8/30/10, David Winsemius <dwinsemius@comcast.net> wrote: From: David Winsemius <dwinsemius@comcast.net> Subject: Re: [R] Band-wise Conditional Sum - Actual problem Cc: r-help@r-project.org Received: Monday, August 30, 2010, 2:43 PM On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:> Dear R helpers, > > Thanks a lot for your earlier guidance esp. Mr Davind Winsemius Sir. However, there seems to be mis-communication from my endcorresponding to my requirement. As I had mentioned in my earlier mail, I am dealing with a very large database of borrowers and I had given a part of it in my earlier mail as given below. For a given rating say "A", I needed to have the bad-wise sums of ead's (where bands are constructed using the ead size itself.) and not the number of borrowers falling in a particular band.> > I am reproducing the data and solution as provided by Winsemius Sir (which generates the number of band-wise borrowers for a given rating. > > rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB") > > ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100,1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)> > df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000 , 10000000, 100000000) ) > > df > > df_sorted <- df[order(df$rating),] # the output is as given below. > > > df_sorted > rating ead ead.cat > 1 A 169229.93 (1e+05,5e+05] > 3 A 5877794.25 (5e+06,1e+07] > 6 A 21000.00(0,1e+05]> 12 A 150000.00 (1e+05,5e+05] > 13 A 167490.00 (1e+05,5e+05] > 18 A 9100.00 (0,1e+05] > 23 A 3000.00 (0,1e+05] > 25 A 453671.72 (1e+05,5e+05] > 28 A 940711.67 (5e+05,1e+06] > 31 A 39000.00(0,1e+05]> 5 AA 75040962.06 (1e+07,1e+08] > 9 AA 17715000.00 (1e+07,1e+08] > 10 AA 14430325.24 (1e+07,1e+08] > 11 AA 1180946.57 (1e+06,2e+06] > 14 AA 81255.16 (0,1e+05] > 17 AA 1275702.94 (1e+06,2e+06] > 26 AA 7590.00 (0,1e+05] > 29 AA2443000.00 (2e+06,5e+06]> 2 AAA 100.00 (0,1e+05] > 19 AAA 1763142.30 (1e+06,2e+06] > 27 B 106065.24 (1e+05,5e+05] > 7 BB 1028360.00 (1e+06,2e+06] > 15 BB 54812.50 (0,1e+05] > 22 BB 11800.00 (0,1e+05] > 24 BB96894.02 (0,1e+05]> 4 BBB 9530148.63 (5e+06,1e+07] > 8 BBB 6000000.00 (5e+06,1e+07] > 16 BBB 3000.00 (0,1e+05] > 20 BBB 3283048.61 (2e+06,5e+06] > 21 BBB 1200000.00 (1e+06,2e+06] > 30 BBB 9500000.00 (5e+06,1e+07] > 32 BBB 1501939.67 (1e+06,2e+06] > > > ## The following command fetches rating-wise and eadsize no of borrowers. Thus, for rating A, there are 4 borrowers in the ead range (0, 1e+05], 4 borrowers in the range (1e+05 to 5e+05] and so on......> > > with(df, tapply(ead.cat, rating, table)) > $A > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] > 4 4 1 0 0 1 0 > > $AA > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] >2 0 0 2 1 0 3> > $AAA > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] > 1 0 0 1 0 0 0 > > $B > > (0,1e+05](1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]> 0 1 0 0 0 0 0 > > $BB > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] > 3 0 0 1 0 00> > $BBB > > (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] > 1 0 0 2 1 3 0 > > > #### My ACTUAL REQUIREMENT > > Actually for a given rating, I don't want the number of borrowers falling in each of the ead_range. What I want is sum of eads falling in each range. Thus, say for rating "A", I need following. > > > rating ead.catead_total> 1 A (0,1e+05] 72100.00 # (21000+9100+3000+39000) > 2 A (1e+05, 5e+05] 940391.65 > > #(169229.93+150000.00+167490.00+453671.72)So you just wanted simple sums within rating and ead.cat: with(df_sorted, tapply(ead, list(rating,ead.cat), sum, na.rm=TRUE)) (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] A 72100.00 940391.6 940711.7 NA NA 5877794 AA 88845.16 NA NA 2456650 2443000 NA AAA 100.00 NA NA 1763142 NA NA B NA 106065.2 NA NA NA NA BB 163506.52 NA NA 1028360 NA NA BBB 3000.00 NA NA 2701940 3283049 25030149 (1e+07,1e+08] A NA AA 107186287 AAA NA B NA BB NA BBB NA --David.> and so on. > > I am extremely sorry for any mis-communication in my earlier mail. I could test the reply sent to me earlier by Winsemius Sir only today as I was traveling over weekends. Also, I have tried to go through earlier emails dealing with such conditional sums. Unfortunately, I couldn't understand as I haverecently started my venture with R.> > > Thanking you in advance and sincerely apologize for any mis-communication if it had occurred in my earlier mail. > > Regards > > Vincy > > > --- On Fri, 8/27/10, David Winsemius <dwinsemius@comcast.net> wrote: > > From: David Winsemius <dwinsemius@comcast.net> > Subject: Re: [R] Band-wise Sum> Cc: r-help@r-project.org > Received: Friday, August 27, 2010, 2:36 PM > > > On Aug 27,2010, at 9:49 AM, Vincy Pyne wrote:> > > Hi > > > > I have a large credit portfolio (exceeding 50000 borrowers). For particular process I need to add up the exposures based on the bands. I am giving a small test data below. > > I would think that cut() would be the accepted method for defining a factor variable based on specified cutpoints. If you then wanted to see what the cumsum() was across the range of possible levels, that to would be a fairly simple task. > > df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000, 2000000, 5000000 , 10000000, 100000000) ) > df > with(df, tapply(ead.cat, rating, length)) > # A AA AAA B BB BBB > # 10 8 2 1 4 7 > with(df, tapply(ead.cat, rating, table)) > # returns a list of table objects bybond rating> > lapply( with(df, tapply(ead.cat, rating, table)) , cumsum) > #returns the cumsum of those tables > > # sapply gives a more compact output of that result: > sapply( with(df, tapply(ead.cat, rating, table)) , cumsum) > A AA AAA B BB BBB > (0,1e+05] 4 2 1 0 3 1 > (1e+05,5e+05] 8 2 1 1 3 1 > (5e+05,1e+06] 9 2 1 1 3 1 > (1e+06,2e+06] 9 4 2 1 4 3 > (2e+06,5e+06] 9 5 2 1 4 4 > (5e+06,1e+07] 10 5 2 1 4 7 > (1e+07,1e+08] 10 8 2 1 4 7 >> Loops, you say we need loops? We don't need no stinkin' loops. > > --David. > > > > > rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB") > > > > ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000, 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67) > > > > ## First I have sorted the data rating-wise as > > > > df <- data.frame(rating, ead) > > > > df_sorted <- > > df[order(df$rating),] > > > > df_sorted_AAA <-subset(df_sorted, rating=="AAA")> > df_sorted_AA <- subset(df_sorted, rating=="AA") > > df_sorted_A <- subset(df_sorted, rating=="A") > > df_sorted_BBB <- subset(df_sorted, rating=="BBB") > > df_sorted_BB <- subset(df_sorted, rating=="BB") > > df_sorted_B <- subset(df_sorted, rating=="B") > > df_sorted_CCC <- subset(df_sorted, rating=="CCC") > > > > ## we begin with BBB rating. The R output for df_sorted_BBB is as follows > > > >> df_sorted_BBB > > rating ead > > 4 BBB 9530149 > > 8 BBB 6000000 > > 16 BBB 3000 > > 20 BBB 3283049 > > 21 BBB1200000> > 30 BBB 9500000 > > 32 BBB 1501940 > > > > My problem is I need to totals of eads falling in the respective bands > > > > I > > am defining bands in millions as > > > > seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000) > > > > # The output is > > [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06 > > > > So for the sub data pertaining to Rating "BBB", I want corresponding ead totals i.e. I want ead totals where ead < 1e+06, then I want ead totals where 1+e06 < ead < 2e+06, 2e+06 < ead < 3e+06 ...and so on. > > > > I have tried the following code > > > > s_BBB <- NULL > > > > for (i in 1:length(s_BBB)) > > { > >s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead < s_BBB[i]))> > } > > > > I was trying to find totals ofads < 1e+06, ead < 2e+06, ead<3e+06and so on. > > > > but the result is > > > >> s_BBB > > [1] 0 > > > > > > I apologize if I am not able to express my problem properly. My only objective is first to sort the whole portfolio rating-wise and then within each of these rating-wise sorted data, I wish to find out total of eads based > > on various bands starting <1000000, 1000000 - 200000, 2000000 - 3000000, 3000000 - 4000000 and so on. Since the database contains more than 50000 records, various ead amounts ranging from few 000's to billion are available. > > > > Please guide > > > > Thanking you all in advance>David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]]