thr3ads.net - R help - [R] Data transformation problem [Nov 2020]

If this information is useful, please help other people find it:
Share via:

phii m@iii@g oii phiiipsmith@c@

2020-Nov-12 03:25 UTC

[R] Data transformation problem

I am stuck on a data transformation problem. I have a data frame, df1 in 
my example, with some original "levels" data. The data pertain to some
variable, such as GDP, in various reference periods, REF, as estimated 
and released in various release periods, REL. The release periods follow 
after the reference periods by two months or more, sometimes by several 
years. I want to build a second data frame, called df2 in my example, 
with the month-to-month growth rates that existed in each reference 
period, revealing the revisions to those growth rates in subsequent 
periods.

REF1 <- 
c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
  
"2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
  
"2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL1 <- 
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
  
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
  
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL1 <- 
c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
   19900,18765,13467)
df1 <- data.frame(REF1,REL1,VAL1)
REF2 <- 
c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
  
"2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL2 <- 
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
  
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
df2 <- data.frame(REF2,REL2,VAL2)

In my example I have provided some sample data pertaining to three 
reference months, 2017-01-01 through 2017-03-01, and five release 
periods,
"2020-09-01","2020-08-01","2020-07-01","2020-06-01"
and
"2019-05-01". In my actual problem I have millions of REF-REL 
combinations, so my data frame is quite large. I am using data.table for 
faster processing, though I am more familiar with the tidyverse. I am 
providing df2 as the target data frame for my example, so you can see 
what I am trying to achieve.

I have not been able to find an efficient way to do these calculations. 
I have tried "for" loops with "if" statements, without
success so far,
and anyway this approach would be too slow, I fear. Suggestions as to 
how I might proceed would be much appreciated.

Philip

Jeff Newmiller

2020-Nov-12 07:20 UTC

head link

[R] Data transformation problem

I am not a data.table afficiando, but here is how I would do it with 
dplyr/tidyr:

library(dplyr)
library(tidyr)

do_per_REL <- function( DF ) {
   rng <- range( DF$REF1 ) # watch out for missing months?
   DF <- (   data.frame( REF1 = seq( rng[ 1 ], rng[ 2 ], by =
"month" ) )
         %>% left_join( DF, by = "REF1" )
         %>% arrange( REF1 )
         )
   with( DF
       , data.frame( REF2 = REF1[ -1 ]
                   , VAL2 = 100 * diff( VAL1 ) / VAL1[ -length( VAL1 ) ]
                   )
       )
}

df2a <- (   df1
         %>% mutate( REF1 = as.Date( REF1 )
                   , REL1 = as.Date( REL1 )
                   )
         %>% nest( data = -REL1 )
         %>% rename( REL2 = REL1 )
         %>% rowwise()
         %>% mutate( data = list( do_per_REL( data ) ) )
         %>% ungroup()
         %>% unnest( cols = "data" )
         %>% select( REF2, REL2, VAL2 )
         %>% arrange( REF2, desc( REL2 ), VAL2 )
         )
df2a

On Wed, 11 Nov 2020, phil at philipsmith.ca wrote:
> I am stuck on a data transformation problem. I have a data frame, df1 in my
> example, with some original "levels" data. The data pertain to
some variable,
> such as GDP, in various reference periods, REF, as estimated and released
in
> various release periods, REL. The release periods follow after the
reference
> periods by two months or more, sometimes by several years. I want to build
a
> second data frame, called df2 in my example, with the month-to-month growth
> rates that existed in each reference period, revealing the revisions to
those
> growth rates in subsequent periods.
>
> REF1 <-
c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
> 
"2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
> 
"2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
> REL1 <-
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
> 
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
> 
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
> VAL1 <- 
> c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
>  19900,18765,13467)
> df1 <- data.frame(REF1,REL1,VAL1)
> REF2 <-
c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
> 
"2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
> REL2 <-
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
> 
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
> VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
> df2 <- data.frame(REF2,REL2,VAL2)
>
> In my example I have provided some sample data pertaining to three
reference
> months, 2017-01-01 through 2017-03-01, and five release periods, 
>
"2020-09-01","2020-08-01","2020-07-01","2020-06-01"
and "2019-05-01". In my
> actual problem I have millions of REF-REL combinations, so my data frame is
> quite large. I am using data.table for faster processing, though I am more 
> familiar with the tidyverse. I am providing df2 as the target data frame
for
> my example, so you can see what I am trying to achieve.
>
> I have not been able to find an efficient way to do these calculations. I 
> have tried "for" loops with "if" statements, without
success so far, and
> anyway this approach would be too slow, I fear. Suggestions as to how I
might
> proceed would be much appreciated.
>
> Philip
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live
Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

phii m@iii@g oii phiiipsmith@c@

2020-Nov-12 12:23 UTC

head link

[R] Data transformation problem

Thank you so much for this elegant solution, Jeff.

Philip

On 2020-11-12 02:20, Jeff Newmiller wrote:> I am not a data.table afficiando, but here is how I would do it with
> dplyr/tidyr:
> 
> library(dplyr)
> library(tidyr)
> 
> do_per_REL <- function( DF ) {
>   rng <- range( DF$REF1 ) # watch out for missing months?
>   DF <- (   data.frame( REF1 = seq( rng[ 1 ], rng[ 2 ], by =
"month" )
> )
>         %>% left_join( DF, by = "REF1" )
>         %>% arrange( REF1 )
>         )
>   with( DF
>       , data.frame( REF2 = REF1[ -1 ]
>                   , VAL2 = 100 * diff( VAL1 ) / VAL1[ -length( VAL1 ) ]
>                   )
>       )
> }
> 
> df2a <- (   df1
>         %>% mutate( REF1 = as.Date( REF1 )
>                   , REL1 = as.Date( REL1 )
>                   )
>         %>% nest( data = -REL1 )
>         %>% rename( REL2 = REL1 )
>         %>% rowwise()
>         %>% mutate( data = list( do_per_REL( data ) ) )
>         %>% ungroup()
>         %>% unnest( cols = "data" )
>         %>% select( REF2, REL2, VAL2 )
>         %>% arrange( REF2, desc( REL2 ), VAL2 )
>         )
> df2a
> 
> On Wed, 11 Nov 2020, phil at philipsmith.ca wrote:
> 
>> I am stuck on a data transformation problem. I have a data frame, df1 
>> in my example, with some original "levels" data. The data
pertain to
>> some variable, such as GDP, in various reference periods, REF, as 
>> estimated and released in various release periods, REL. The release 
>> periods follow after the reference periods by two months or more, 
>> sometimes by several years. I want to build a second data frame, 
>> called df2 in my example, with the month-to-month growth rates that 
>> existed in each reference period, revealing the revisions to those 
>> growth rates in subsequent periods.
>> 
>> REF1 <- 
>>
c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
>> 
"2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>> 
"2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
>> REL1 <- 
>>
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>> 
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>> 
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
>> VAL1 <- 
>> c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
>>  19900,18765,13467)
>> df1 <- data.frame(REF1,REL1,VAL1)
>> REF2 <- 
>>
c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>> 
"2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
>> REL2 <- 
>>
c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>> 
"2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
>> VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
>> df2 <- data.frame(REF2,REL2,VAL2)
>> 
>> In my example I have provided some sample data pertaining to three 
>> reference months, 2017-01-01 through 2017-03-01, and five release 
>> periods,
"2020-09-01","2020-08-01","2020-07-01","2020-06-01"
and
>> "2019-05-01". In my actual problem I have millions of REF-REL
>> combinations, so my data frame is quite large. I am using data.table 
>> for faster processing, though I am more familiar with the tidyverse. I 
>> am providing df2 as the target data frame for my example, so you can 
>> see what I am trying to achieve.
>> 
>> I have not been able to find an efficient way to do these 
>> calculations. I have tried "for" loops with "if"
statements, without
>> success so far, and anyway this approach would be too slow, I fear. 
>> Suggestions as to how I might proceed would be much appreciated.
>> 
>> Philip
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go 
> Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#. 
Live
> Go...
>                                       Live:   OO#.. Dead: OO#..  
> Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  
> rocks...1k
> ---------------------------------------------------------------------------

R help - Nov 2020 - Data transformation problem

[R] Data transformation problem

[R] Data transformation problem

[R] Data transformation problem