thr3ads.net - R help - [R] How to create a new data.frame based on calculation of subsets of an existing data.frame [Dec 2019]

Jim Lemon

2019-Dec-17 22:40 UTC

[R] How to create a new data.frame based on calculation of subsets of an existing data.frame

Okay, I'm away for most of the day and might not be able to look at it
until tomorrow.

Jim

On Wed, Dec 18, 2019 at 9:27 AM Ioannou, Ioanna
<ioanna.ioannou at ucl.ac.uk> wrote:
>
> Hello Jim,
>
> I am very sorry. Here is the corrected sample data to play with:
>
> Test.v2 <- data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
>                       Region = rep(c('South America'), times = 8),
>                       IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
>                       Damage.state = c('DS1', 'DS2', 'DS3', 'DS4', 'DS1', 'DS2', 'DS3', 'DS4'),
>                       Taxonomy = c('ER+ETR_H1', 'ER+ETR_H1', 'ER+ETR_H1', 'ER+ETR_H1',
>                                    'ER+ETR_H2', 'ER+ETR_H2', 'ER+ETR_H2', 'ER+ETR_H2'),
>                       IM_1 = c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
>                       IM_2 = c(0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08),
>                       IM_3 = c(0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16),
>                       IM_4 = c(0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24),
>                       Prob.of.exceedance_1 = c(0, 0, 0, 0, 0, 0, 0, 0),
>                       Prob.of.exceedance_2 = c(0, 0, 0, 0, 0, 0, 0, 0),
>                       Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573,
>                                                0.04, 0.00017, 0.000215, 0.000472),
>                       Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061,
>                                                0.475, 0.0007, 0.00435, 0.000405)
>                       )
>
> Basically, I am using the total probability theorem to calculate a best
> estimate. I am stuck on how to do it for many cases. Many thanks for your patience.
>
> -----Original Message-----
> From: Jim Lemon [mailto:drjimlemon at gmail.com]
> Sent: Tuesday, December 17, 2019 10:22 PM
> To: Ioannou, Ioanna <ioanna.ioannou at ucl.ac.uk>
> Subject: Re: [R] How to create a new data.frame based on calculation of
> subsets of an existing data.frame
>
> Hi Ioanna,
> After looking at your post for a while, I think that you are combining
> columns IM_1 to IM_4 to generate VC_1 to VC_4. First, you seem to have omitted
> the "Region" column from Test_v2, which means that your indices
> (10:13) run out of range. It seems to me that you would find it easier to write
> down what arithmetic operations you want and translate these into logical
> expressions to extract the rows.
>
> Jim
>
> On Wed, Dec 18, 2019 at 7:47 AM Ioannou, Ioanna
> <ioanna.ioannou at ucl.ac.uk> wrote:
> >
> > Hello everyone,
> >
> > I have the following problem: I have a data.frame with multiple fields.
> >
> > For a given combination of IM.type and Taxonomy, my calculation is the
> > following:
> > D <- read.csv('Test_v2.csv')
> > names(D)
> >
> > VC <- 0.01 * (subset(D, IM.type == 'PGA' & Damage.state == 'DS1' & Taxonomy == 'ER+ETR_H1')[10:13] -
> >               subset(D, IM.type == 'PGA' & Damage.state == 'DS2' & Taxonomy == 'ER+ETR_H1')[10:13]) +
> >       0.02 * (subset(D, IM.type == 'PGA' & Damage.state == 'DS2' & Taxonomy == 'ER+ETR_H1')[10:13] -
> >               subset(D, IM.type == 'PGA' & Damage.state == 'DS3' & Taxonomy == 'ER+ETR_H1')[10:13]) +
> >       0.43 * (subset(D, IM.type == 'PGA' & Damage.state == 'DS3' & Taxonomy == 'ER+ETR_H1')[10:13] -
> >               subset(D, IM.type == 'PGA' & Damage.state == 'DS4' & Taxonomy == 'ER+ETR_H1')[10:13]) +
> >       1.0 * (subset(D, IM.type == 'PGA' & Damage.state == 'DS4' & Taxonomy == 'ER+ETR_H1')[10:13])
> >
> > So the question is: how can I do that in an automated way for all
> > possible combinations and store the results in a new data.frame, which would
> > look like this:
> >
> > Ref.No.  Region         IM.type  Taxonomy   IM_1      IM_2  IM_3  IM_4  VC_1       VC_2          VC_3         VC_4
> > 1622     South America  PGA      ER+ETR_H1  1.00E-06  0.08  0.16  0.24  3.49e-294  3.449819e-05  0.002748889  0.01122911
> >
> > Best,
> > ioanna
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

Jim Lemon

2019-Dec-19 02:04 UTC

[R] How to create a new data.frame based on calculation of subsets of an existing data.frame

Hi Ioanna,
I looked at the problem this morning and tried to work out what you
wanted. With a problem like this, it is often easy when you have
someone point to the data and say "I want this added to that and this
multiplied by that". I have probably made the wrong guesses, but I
hope that you can correct my guesses and I can get the calculations
correct for you. For example, I have assumed that you want the sum of
the IM_* values for each set of damage states as the values for VC_1,
VC_2 etc.

D<-data.frame(Ref.No = c(1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629),
 Region = rep(c('South America'), times = 8),
 IM.type = c('PGA', 'PGA', 'PGA', 'PGA', 'Sa', 'Sa', 'Sa', 'Sa'),
 Damage.state = c('DS1', 'DS2', 'DS3', 'DS4', 'DS1', 'DS2', 'DS3', 'DS4'),
 Taxonomy = c('ER+ETR_H1','ER+ETR_H1','ER+ETR_H1','ER+ETR_H1',
  'ER+ETR_H2','ER+ETR_H2','ER+ETR_H2','ER+ETR_H2'),
 IM_1 = c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
 IM_2 = c(0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08),
 IM_3 = c(0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16),
 IM_4 = c(0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24),
 Prob.of.exceedance_1 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_2 = c(0,0,0,0,0,0,0,0),
 Prob.of.exceedance_3 =
  c(0.26,0.001,0.00019,0.000000573,0.04,0.00017,0.000215,0.000472),
 Prob.of.exceedance_4 =
  c(0.72,0.03,0.008,0.000061,0.475,0.0007,0.00435,0.000405),
 stringsAsFactors=FALSE)
# assume the above has been read in
# add the four columns to the data frame filled with NAs
D$VC_1<-D$VC_2<-D$VC_3<-D$VC_4<-NA
# names of the variables used in the calculations
calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
# get the rows for the four damage states
DS1_rows<-D$Damage.state == "DS1"
DS2_rows<-D$Damage.state == "DS2"
DS3_rows<-D$Damage.state == "DS3"
DS4_rows<-D$Damage.state == "DS4"
# step through all possible values of IM.type and Taxonomy
for(IM in unique(D$IM.type)) {
 for(Tax in unique(D$Taxonomy)) {
  # get a logical vector of the rows to be used in this calculation
  calc_rows<-D$IM.type == IM & D$Taxonomy == Tax
  cat(IM,Tax,calc_rows,"\n")
  # check that there are any such rows in the data frame
  if(sum(calc_rows)) {
   # if so, fill in the four values for these rows
   D$VC_1[calc_rows]<-sum(0.01 * (D[calc_rows & DS1_rows,calc_vars] -
    D[calc_rows & DS2_rows,calc_vars]))
   D$VC_2[calc_rows]<-sum(0.02 * (D[calc_rows & DS2_rows,calc_vars] -
    D[calc_rows & DS3_rows,calc_vars]))
   D$VC_3[calc_rows]<-sum(0.43 * (D[calc_rows & DS3_rows,calc_vars] -
    D[calc_rows & DS4_rows,calc_vars]))
   D$VC_4[calc_rows]<-sum(D[calc_rows & DS4_rows,calc_vars])
  }
 }
}

Jim
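
As a quick check of what the loop above computes, the following minimal sketch reproduces VC_1 for the PGA / ER+ETR_H1 group by hand. It assumes, as the code does, that the calculation runs over the Prob.of.exceedance_1 to Prob.of.exceedance_4 columns named in calc_vars; the names ds1, ds2 and vc_1 are illustrative only and do not appear in the thread.

```r
# Prob.of.exceedance_1..4 for IM.type "PGA" and Taxonomy "ER+ETR_H1",
# copied from the sample data (illustrative variable names)
ds1 <- c(0, 0, 0.26, 0.72)    # row with Damage.state == "DS1"
ds2 <- c(0, 0, 0.001, 0.03)   # row with Damage.state == "DS2"

# same arithmetic as D$VC_1[calc_rows] in the loop:
# weight the DS1 - DS2 differences by 0.01 and sum over the four columns
vc_1 <- sum(0.01 * (ds1 - ds2))
print(vc_1)
```

Because the result is a single number, the loop assigns the same value to all four rows of that group.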

Ioannou, Ioanna

2019-Dec-20 10:01 UTC

[R] How to create a new data.frame based on calculation of subsets of an existing data.frame

Hello Jim, 

Thank you ever so much, it was very helpful. In fact, what I want to calculate is
the following. My very last question: if I want to save the outcome VC,
IM.type and Taxonomy in a new data.frame, how can I do it?

# names of the variables used in the calculations
calc_vars<-paste("Prob.of.exceedance",1:4,sep="_")
# get the rows for the four damage states 
DS1_rows <-D$Damage.state == "DS1"
DS2_rows <-D$Damage.state == "DS2"
DS3_rows <-D$Damage.state == "DS3"
DS4_rows <-D$Damage.state == "DS4"
# step through all possible values of IM.type and Taxonomy 
for(IM in unique(D$IM.type)) {
 for(Tax in unique(D$Taxonomy)) {
  # get a logical vector of the rows to be used in this calculation
  calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
  cat(IM,Tax,calc_rows,"\n")
  # check that there are any such rows in the data frame
  if(sum(calc_rows)) {
   # if so, calculate VC for these rows
   VC <- 0.0 * (1 - D[calc_rows & DS1_rows,calc_vars]) +
    0.02 * (D[calc_rows & DS1_rows,calc_vars] -
     D[calc_rows & DS2_rows,calc_vars]) +
    0.10 * (D[calc_rows & DS2_rows,calc_vars] -
     D[calc_rows & DS3_rows,calc_vars]) +
    0.43 * (D[calc_rows & DS3_rows,calc_vars] -
     D[calc_rows & DS4_rows,calc_vars]) +
    1.0 * D[calc_rows & DS4_rows,calc_vars]
  }
 }
}
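
One possible way to keep the results, sketched here under stated assumptions rather than taken from a reply in the thread: collect one small data frame per IM.type/Taxonomy combination in a list and bind them together after the loop. The sketch assumes exactly one row per damage state within each combination, and it drops the 0.0 * (1 - DS1) term because it contributes nothing. The names P, results and VC_table are hypothetical, and the sample data is rebuilt in compact form so the block runs on its own.

```r
# Sample data from the thread, rebuilt compactly so this block is self-contained
D <- data.frame(
  Ref.No = 1622:1629,
  Region = rep("South America", 8),
  IM.type = rep(c("PGA", "Sa"), each = 4),
  Damage.state = rep(paste0("DS", 1:4), times = 2),
  Taxonomy = rep(c("ER+ETR_H1", "ER+ETR_H2"), each = 4),
  IM_1 = 0, IM_2 = 0.08, IM_3 = 0.16, IM_4 = 0.24,
  Prob.of.exceedance_1 = 0,
  Prob.of.exceedance_2 = 0,
  Prob.of.exceedance_3 = c(0.26, 0.001, 0.00019, 0.000000573,
                           0.04, 0.00017, 0.000215, 0.000472),
  Prob.of.exceedance_4 = c(0.72, 0.03, 0.008, 0.000061,
                           0.475, 0.0007, 0.00435, 0.000405),
  stringsAsFactors = FALSE
)

calc_vars <- paste("Prob.of.exceedance", 1:4, sep = "_")
results <- list()   # hypothetical container, one element per combination

for (IM in unique(D$IM.type)) {
  for (Tax in unique(D$Taxonomy)) {
    calc_rows <- D$IM.type == IM & D$Taxonomy == Tax
    if (!any(calc_rows)) next   # skip combinations that do not occur
    # hypothetical helper: the four probabilities for one damage state,
    # assuming a single matching row in this combination
    P <- function(ds) {
      unlist(D[calc_rows & D$Damage.state == ds, calc_vars], use.names = FALSE)
    }
    VC <- 0.02 * (P("DS1") - P("DS2")) +
          0.10 * (P("DS2") - P("DS3")) +
          0.43 * (P("DS3") - P("DS4")) +
          1.00 * P("DS4")
    # store this combination as a one-row data frame
    results[[length(results) + 1]] <- data.frame(
      IM.type = IM, Taxonomy = Tax,
      VC_1 = VC[1], VC_2 = VC[2], VC_3 = VC[3], VC_4 = VC[4],
      stringsAsFactors = FALSE
    )
  }
}

# bind the one-row data frames into the new data frame
VC_table <- do.call(rbind, results)
row.names(VC_table) <- NULL
print(VC_table)
```

With the sample data this gives one row for PGA / ER+ETR_H1 and one for Sa / ER+ETR_H2. Growing a list and calling rbind once at the end avoids repeatedly copying a data frame inside the loop.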
