Dear all, I have a panel data with a large number of groups and the cumulative number of months (1 - 372) for January 1995 to December 2005. My goal is to extract the corresponding month and year for each observation. I tried the following; ### x %>% group_by(id) %>% do( data.frame(., Date= seq(.$startdate, as.Date('1975-01-01'), by = '1 month'))) However, I get the following error; ### Error in seq.default(.$startdate, as.Date("1975-01-01"), by = "1 month") : 'from' must be of length 1 Essentially, I want to convert the month number (1-372) to January 1975 to December 2005 by ID. Is that possible? Any help will be highly appreciated. ### Data ### x <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), month c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), startdate = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1975-01-01", class = "factor"), enddate = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label "2005-12-31", class = "factor")), class = "data.frame", row.names = c(NA, -9L)) Cross-posted in Statalist - however, Stata seems to have a very inflexible date-time structure. Best regards, Milu [[alternative HTML version deleted]]
dtV <- seq(from=as.Date("1975-01-01"),by='1 month',length=372) x$dt <- dtV[x$id] HTH, Eric On Mon, Sep 2, 2019 at 8:36 PM Miluji Sb <milujisb at gmail.com> wrote:> Dear all, > > I have a panel data with a large number of groups and the cumulative number > of months (1 - 372) for January 1995 to December 2005. My goal is to > extract the corresponding month and year for each observation. > > I tried the following; > > ### > x %>% > group_by(id) %>% > do( data.frame(., Date= seq(.$startdate, > as.Date('1975-01-01'), by = '1 month'))) > > However, I get the following error; > > ### > Error in seq.default(.$startdate, as.Date("1975-01-01"), by = "1 month") : > 'from' must be of length 1 > > Essentially, I want to convert the month number (1-372) to January 1975 to > December 2005 by ID. Is that possible? Any help will be highly appreciated. > > ### Data ### > x <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), month > c(1L, > 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), startdate = structure(c(1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1975-01-01", class = "factor"), > enddate = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label > "2005-12-31", class = "factor")), class = "data.frame", row.names = c(NA, > -9L)) > > Cross-posted in Statalist - however, Stata seems to have a very inflexible > date-time structure. > > Best regards, > > Milu > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Your sample data has startdate and enddate columns as though these values might vary throughout the data set, but your description suggests that those values are fixed for the whole dataset. If those values are really constant, then x$Year <- 1995L + 12 * ( ( x$month - 1L ) %/% 12L ) x$Month <- ( x$month - 1L ) %% 12L + 1 should be enough. If in fact your startdate can be different for different IDs, then you would need to wrap this up a bit: ############ base R x$startdate <- as.Date( as.character( x$startdate ) ) x$Y <- as.integer( format( DF$startdate, format = "%Y" ) ) xlist <- split( x, x$id ) xlist1 <- lapply( xlist , FUN = function( DF ) { DF$Year <- Y + 12 * ( ( DF$month - 1L ) %/% 12L ) DF$Month <- ( DF$month - 1L ) %% 12L + 1 DF } ) x1 <- unsplit( xlist1, x$id ) ###### dplyr library(dplyr) x2 <- ( x %>% mutate( Y = as.integer( format( startdate, format = "%Y" ) ) ) %>% group_by( id ) %>% mutate( Year = Y + 12L * ( ( month - 1L ) %/% 12L ) , Month = ( month - 1L ) %% 12L + 1L ) %>% ungroup ) ######### On Mon, 2 Sep 2019, Miluji Sb wrote:> Dear all, > > I have a panel data with a large number of groups and the cumulative number > of months (1 - 372) for January 1995 to December 2005. My goal is to > extract the corresponding month and year for each observation. > > I tried the following; > > ### > x %>% > group_by(id) %>% > do( data.frame(., Date= seq(.$startdate, > as.Date('1975-01-01'), by = '1 month'))) > > However, I get the following error; > > ### > Error in seq.default(.$startdate, as.Date("1975-01-01"), by = "1 month") : > 'from' must be of length 1 > > Essentially, I want to convert the month number (1-372) to January 1975 to > December 2005 by ID. Is that possible? Any help will be highly appreciated. > > ### Data ### > x <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), month > c(1L, > 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), startdate = structure(c(1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1975-01-01", class = "factor"), > enddate = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label > "2005-12-31", class = "factor")), class = "data.frame", row.names = c(NA, > -9L)) > > Cross-posted in Statalist - however, Stata seems to have a very inflexible > date-time structure. > > Best regards, > > Milu > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >--------------------------------------------------------------------------- Jeff Newmiller The ..... ..... Go Live... DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
Thanks, Jeff. I am trying your solutions - appreciate your help! Best, Milu On Tue, Sep 3, 2019 at 12:51 AM Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:> Your sample data has startdate and enddate columns as though these values > might vary throughout the data set, but your description suggests that > those values are fixed for the whole dataset. > > If those values are really constant, then > > x$Year <- 1995L + 12 * ( ( x$month - 1L ) %/% 12L ) > x$Month <- ( x$month - 1L ) %% 12L + 1 > > should be enough. > > If in fact your startdate can be different for different IDs, then you > would need to wrap this up a bit: > > ############ base R > x$startdate <- as.Date( as.character( x$startdate ) ) > x$Y <- as.integer( format( DF$startdate, format = "%Y" ) ) > xlist <- split( x, x$id ) > xlist1 <- lapply( xlist > , FUN = function( DF ) { > DF$Year <- Y + 12 * ( ( DF$month - 1L ) %/% 12L ) > DF$Month <- ( DF$month - 1L ) %% 12L + 1 > DF > } > ) > x1 <- unsplit( xlist1, x$id ) > > ###### dplyr > library(dplyr) > x2 <- ( x > %>% mutate( Y = as.integer( format( startdate, format = "%Y" ) ) ) > %>% group_by( id ) > %>% mutate( Year = Y + 12L * ( ( month - 1L ) %/% 12L ) > , Month = ( month - 1L ) %% 12L + 1L > ) > %>% ungroup > ) > ######### > > On Mon, 2 Sep 2019, Miluji Sb wrote: > > > Dear all, > > > > I have a panel data with a large number of groups and the cumulative > number > > of months (1 - 372) for January 1995 to December 2005. My goal is to > > extract the corresponding month and year for each observation. > > > > I tried the following; > > > > ### > > x %>% > > group_by(id) %>% > > do( data.frame(., Date= seq(.$startdate, > > as.Date('1975-01-01'), by = '1 month'))) > > > > However, I get the following error; > > > > ### > > Error in seq.default(.$startdate, as.Date("1975-01-01"), by = "1 month") > : > > 'from' must be of length 1 > > > > Essentially, I want to convert the month number (1-372) to January 1975 > to > > December 2005 by ID. Is that possible? Any help will be highly > appreciated. > > > > ### Data ### > > x <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), month > > c(1L, > > 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), startdate = structure(c(1L, > > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1975-01-01", class > "factor"), > > enddate = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label > > "2005-12-31", class = "factor")), class = "data.frame", row.names = c(NA, > > -9L)) > > > > Cross-posted in Statalist - however, Stata seems to have a very > inflexible > > date-time structure. > > > > Best regards, > > > > Milu > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > --------------------------------------------------------------------------- > Jeff Newmiller The ..... ..... Go Live... > DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live > Go... > Live: OO#.. Dead: OO#.. Playing > Research Engineer (Solar/Batteries O.O#. #.O#. with > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k > --------------------------------------------------------------------------- >[[alternative HTML version deleted]]