Jim Lemon
2019-Dec-17 22:40 UTC
[R] How to create a new data.frame based on calculation of subsets of an existing data.frame
Okay, I'm away for most of the day and might not be able to look at it until tomorrow.

Jim

On Wed, Dec 18, 2019 at 9:27 AM Ioannou, Ioanna <ioanna.ioannou at ucl.ac.uk> wrote:
>
> Hello Jim,
>
> I am very sorry. Here is the corrected sample data to play with:
>
> Test.v2 <- data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
>   Region = rep(c('South America'), times = 8),
>   IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
>   Damage.state = c('DS1', 'DS2', 'DS3', 'DS4','DS1', 'DS2', 'DS3', 'DS4'),
>   Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
>   IM_1 = c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
>   IM_2 = c(0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08),
>   IM_3 = c(0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16),
>   IM_4 = c(0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24),
>   Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
>   Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
>   Prob.of.exceedance_3 = c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
>   Prob.of.exceedance_4 = c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405)
> )
>
> Basically, I am using the total probability theorem to calculate a best estimate. I am stuck on how to do it for many cases. Many thanks for your patience.
>
> -----Original Message-----
> From: Jim Lemon [mailto:drjimlemon at gmail.com]
> Sent: Tuesday, December 17, 2019 10:22 PM
> To: Ioannou, Ioanna <ioanna.ioannou at ucl.ac.uk>
> Subject: Re: [R] How to create a new data.frame based on calculation of subsets of an existing data.frame
>
> Hi Ioanna,
> After looking at your post for a while, I think that you are combining columns IM_1 to IM_4 to generate VC_1 to VC_4. First, you seem to have omitted the "Region" column from Test_v2, which means that your indices (10:13) run out of range.
> It seems to me that you would find it easier to write down what arithmetic operations you want and translate these into logical expressions to extract the rows.
>
> Jim
>
> On Wed, Dec 18, 2019 at 7:47 AM Ioannou, Ioanna <ioanna.ioannou at ucl.ac.uk> wrote:
> >
> > Hello everyone,
> >
> > I have the following problem: I have a data.frame with multiple fields.
> >
> > For a single combination of IM.type and Taxonomy, my calculation is the following:
> > D <- read.csv('Test_v2.csv')
> > names(D)
> >
> > VC <- 0.01*( subset(D, IM.type == 'PGA' & Damage.state == 'DS1' & Taxonomy == 'ER+ETR_H1')[10:13] -
> >              subset(D, IM.type == 'PGA' & Damage.state == 'DS2' & Taxonomy == 'ER+ETR_H1')[10:13]) +
> >       0.02*( subset(D, IM.type == 'PGA' & Damage.state == 'DS2' & Taxonomy == 'ER+ETR_H1')[10:13] -
> >              subset(D, IM.type == 'PGA' & Damage.state == 'DS3' & Taxonomy == 'ER+ETR_H1')[10:13]) +
> >       0.43*( subset(D, IM.type == 'PGA' & Damage.state == 'DS3' & Taxonomy == 'ER+ETR_H1')[10:13] -
> >              subset(D, IM.type == 'PGA' & Damage.state == 'DS4' & Taxonomy == 'ER+ETR_H1')[10:13]) +
> >       1.0*( subset(D, IM.type == 'PGA' & Damage.state == 'DS4' & Taxonomy == 'ER+ETR_H1')[10:13])
> >
> > So the question is: how can I do that in an automated way for all possible combinations and store the results in a new data.frame, which would look like this:
> >
> > Ref.No. Region        IM.type Taxonomy  IM_1     IM_2 IM_3 IM_4 VC_1      VC_2         VC_3        VC_4
> > 1622    South America PGA     ER+ETR_H1 1.00E-06 0.08 0.16 0.24 3.49e-294 3.449819e-05 0.002748889 0.01122911
> >
> > Best,
> > ioanna
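The single-combination expression above has a regular shape: each weight multiplies the difference between the exceedance probabilities of two consecutive damage states, and the last weight multiplies P(DS4) alone, as if a fifth state with zero exceedance probability followed. A minimal sketch of that pattern as a matrix product, assuming the exceedance probabilities for one IM.type/Taxonomy combination are arranged with damage states DS1 to DS4 as rows and the four IM levels as columns. The helper name weighted_vc is hypothetical and not from the thread; the weights 0.01, 0.02, 0.43 and 1.0 are the ones in the post.

```r
# Exceedance probabilities for PGA / ER+ETR_H1 from the sample data:
# rows are damage states DS1..DS4, columns are the four IM levels
P <- rbind(
  DS1 = c(0, 0, 0.26,        0.72),
  DS2 = c(0, 0, 0.001,       0.03),
  DS3 = c(0, 0, 0.00019,     0.008),
  DS4 = c(0, 0, 0.000000573, 0.000061))

# Hypothetical helper: sum over i of w[i] * (P(DS_i) - P(DS_i+1)),
# treating the exceedance probability of a fifth state as 0
weighted_vc <- function(P, w) {
  P_next <- rbind(P[-1, , drop = FALSE], 0)  # shift rows up, pad with a row of zeros
  drop(w %*% (P - P_next))                   # one value per IM level (column)
}

w <- c(0.01, 0.02, 0.43, 1.0)  # weights from the original post
VC <- weighted_vc(P, w)
print(VC)
```

Because the helper works on one matrix at a time, handling many cases reduces to building P for each IM.type/Taxonomy group and calling it once per group.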
Jim Lemon
2019-Dec-19 02:04 UTC
[R] How to create a new data.frame based on calculation of subsets of an existing data.frame
Hi Ioanna,
I looked at the problem this morning and tried to work out what you wanted. With a problem like this, it is often easier when someone points to the data and says "I want this added to that and this multiplied by that". I have probably made the wrong guesses, but I hope that you can correct my guesses and I can get the calculations correct for you. For example, I have assumed that you want the sum of the IM_* values for each set of damage states as the values for VC_1, VC_2 etc.

D<-data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
 Region = rep(c('South America'), times = 8),
 IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
 Damage.state = c('DS1', 'DS2', 'DS3', 'DS4','DS1', 'DS2', 'DS3', 'DS4'),
 Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H2',
  'ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
 IM_1 = c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
 IM_2 = c(0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08),
 IM_3 = c(0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16),
 IM_4 = c(0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24),
 Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_3 = c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
 Prob.of.exceedance_4 = c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405),
 stringsAsFactors=FALSE)
# assume the above has been read in
# add the four columns to the data frame filled with NAs
D$VC_1<-D$VC_2<-D$VC_3<-D$VC_4<-NA
# names of the variables used in the calculations
calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
# get the rows for the four damage states
DS1_rows<-D$Damage.state == "DS1"
DS2_rows<-D$Damage.state == "DS2"
DS3_rows<-D$Damage.state == "DS3"
DS4_rows<-D$Damage.state == "DS4"
# step through all possible values of IM.type and Taxonomy
for(IM in unique(D$IM.type)) {
 for(Tax in unique(D$Taxonomy)) {
  # get a logical vector of the rows to be used in this calculation
  calc_rows<-D$IM.type == IM & D$Taxonomy == Tax
  # print the combination and the matching rows, as a check
  cat(IM,Tax,calc_rows,"\n")
  # check that there are any such rows in the data frame
  if(sum(calc_rows)) {
   # if so, fill in the four values for these rows
   D$VC_1[calc_rows]<-sum(0.01 * (D[calc_rows & DS1_rows,calc_vars] -
    D[calc_rows & DS2_rows,calc_vars]))
   D$VC_2[calc_rows]<-sum(0.02 * (D[calc_rows & DS2_rows,calc_vars] -
    D[calc_rows & DS3_rows,calc_vars]))
   D$VC_3[calc_rows]<-sum(0.43 * (D[calc_rows & DS3_rows,calc_vars] -
    D[calc_rows & DS4_rows,calc_vars]))
   D$VC_4[calc_rows]<-sum(D[calc_rows & DS4_rows,calc_vars])
  }
 }
}

Jim
Ioannou, Ioanna
2019-Dec-20 10:01 UTC
[R] How to create a new data.frame based on calculation of subsets of an existing data.frame
Hello Jim,

Thank you ever so much; it was very helpful. In fact, what I want to calculate is the following. My very last question: if I want to save the outcome VC, together with IM.type and Taxonomy, in a new data.frame, how can I do it?

# names of the variables used in the calculations
calc_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")
# get the rows for the four damage states
DS1_rows <- D$Damage.state == "DS1"
DS2_rows <- D$Damage.state == "DS2"
DS3_rows <- D$Damage.state == "DS3"
DS4_rows <- D$Damage.state == "DS4"
# step through all possible values of IM.type and Taxonomy
for(IM in unique(D$IM.type)) {
  for(Tax in unique(D$Taxonomy)) {
    # get a logical vector of the rows to be used in this calculation
    calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
    cat(IM, Tax, calc_rows, "\n")
    # check that there are any such rows in the data frame
    if(sum(calc_rows)) {
      # if so, fill in the four values for these rows
      VC <- 0.0  * (1 - D[calc_rows & DS1_rows, calc_vars]) +
            0.02 * (D[calc_rows & DS1_rows, calc_vars] - D[calc_rows & DS2_rows, calc_vars]) +
            0.10 * (D[calc_rows & DS2_rows, calc_vars] - D[calc_rows & DS3_rows, calc_vars]) +
            0.43 * (D[calc_rows & DS3_rows, calc_vars] - D[calc_rows & DS4_rows, calc_vars]) +
            1.0  * D[calc_rows & DS4_rows, calc_vars]
    }
  }
}
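One way to approach the last question, sketched under the assumption that every IM.type/Taxonomy group contains exactly one row for each of DS1 to DS4: compute VC per group with split() and lapply(), return a one-row data frame per group, and stack the rows with do.call(rbind, ...). The weights are the ones in Ioanna's last code; the columns kept (Ref.No and Region taken from each group's first row, the IM_* levels, and VC_1 to VC_4) follow the layout she asked for in her first post. Names such as p and VC_table are illustrative, not from the thread.

```r
# Sample data from the thread
D <- data.frame(
  Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
  Region = rep("South America", times = 8),
  IM.type = c("PGA", "PGA", "PGA", "PGA", "Sa", "Sa", "Sa", "Sa"),
  Damage.state = c("DS1", "DS2", "DS3", "DS4", "DS1", "DS2", "DS3", "DS4"),
  Taxonomy = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  IM_1 = rep(0.00, 8), IM_2 = rep(0.08, 8), IM_3 = rep(0.16, 8), IM_4 = rep(0.24, 8),
  Prob.of.exceedance_1 = rep(0, 8),
  Prob.of.exceedance_2 = rep(0, 8),
  Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573, 0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061, 0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE)

calc_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")
im_vars <- paste("IM", 1:4, sep = "_")

# One data frame per IM.type/Taxonomy combination that actually occurs
groups <- split(D, list(D$IM.type, D$Taxonomy), drop = TRUE)

results <- lapply(groups, function(g) {
  # Exceedance probabilities of one damage state as a plain numeric vector
  # (assumes exactly one row per damage state in the group)
  p <- function(ds) unlist(g[g$Damage.state == ds, calc_vars])
  VC <- 0.0  * (1 - p("DS1")) +          # zero weight, kept from the original code
        0.02 * (p("DS1") - p("DS2")) +
        0.10 * (p("DS2") - p("DS3")) +
        0.43 * (p("DS3") - p("DS4")) +
        1.0  * p("DS4")
  first <- g[1, ]
  # One output row: identifiers, IM levels, then VC_1..VC_4
  data.frame(Ref.No = first$Ref.No, Region = first$Region,
             IM.type = first$IM.type, Taxonomy = first$Taxonomy,
             first[im_vars],
             as.list(setNames(VC, paste("VC", 1:4, sep = "_"))),
             stringsAsFactors = FALSE, row.names = NULL)
})

VC_table <- do.call(rbind, results)
rownames(VC_table) <- NULL
print(VC_table)
```

split() with drop = TRUE only produces groups that occur in the data, so the empty PGA/ER+ETR_H2 and Sa/ER+ETR_H1 pairings need no separate if(sum(calc_rows)) check.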