Dear R friends, I am currently working with time series data, and I have a table(as data frame) that has looks like this (TransitDate are in format = "%e-%B-%Y") : TransitDate Transits CargoTons 1985-04-01 100 2500 1985-05-01 135 4500 1985-06-01 120 1750 1985-07-01 100 3750 1985-08-01 200 1250 The problem is, that there are several periods that don?t exist in the table, so it has the following behavior: TransitDate Transits CargoTons 1985-04-01 100 1000 1985-07-01 100 1080 1985-12-01 500 3785 1986-04-01 325 4200 . . 2017-09-01 400 2350 (*this is the last observation) You can see in the last table fragment that the series jumps from 1985-04-01 to 1985-07-01, then it jumps from there to 1985-12-01 making the time series quite irregular (non-constant chronologically speaking). What I want to do is create a dummy table that has the sequence from the first observation (1985-04-01) up to the last one (2017-09-01) and then develop a code that checks if the dates contained in the dummy table exist in the original table, if they don?t exist then add those dates and put zeroes on the fields. How can I achieve this? Any help will be greatly appreciated, Best regards, Paul [[alternative HTML version deleted]]
Hi Paul, #First I set up some sample data since I don't have a copy of your data dtOrig <- as.Date( c("1985-04-01","1985-07-01","1985-12-01","1986-04-01")) dfOrig <- data.frame( TransitDate=dtOrig, Transits=c(100,100,500,325), CargoTons=c(1000,1080,3785,4200) ) #Generate the complete set of dates as a data frame dfDates<- data.frame( TransitDate=seq(from=as.Date("1985-04-01"),by="1 month",length=13) ) # do the merge adding the "missing" rows (where NA will appear) dfNew <- merge(dfDates, dfOrig, by="TransitDate", all.x=TRUE ) # replace the NA's by zero dfNew[is.na(dfNew)] <- 0 HTH, Eric On Wed, Nov 1, 2017 at 9:45 PM, Paul Bernal <paulbernal07 at gmail.com> wrote:> Dear R friends, > > I am currently working with time series data, and I have a table(as data > frame) that has looks like this (TransitDate are in format = "%e-%B-%Y") : > > TransitDate Transits CargoTons > 1985-04-01 100 2500 > 1985-05-01 135 4500 > 1985-06-01 120 1750 > 1985-07-01 100 3750 > 1985-08-01 200 1250 > > The problem is, that there are several periods that don?t exist in the > table, so it has the following behavior: > > TransitDate Transits CargoTons > 1985-04-01 100 1000 > 1985-07-01 100 1080 > 1985-12-01 500 3785 > 1986-04-01 325 4200 > . > . > 2017-09-01 400 2350 (*this is the last observation) > > You can see in the last table fragment that the series jumps from > 1985-04-01 to 1985-07-01, then it jumps from there to 1985-12-01 making the > time series quite irregular (non-constant chronologically speaking). > > What I want to do is create a dummy table that has the sequence from the > first observation (1985-04-01) up to the last one (2017-09-01) and then > develop a code that checks if the dates contained in the dummy table exist > in the original table, if they don?t exist then add those dates and put > zeroes on the fields. > > How can I achieve this? > > Any help will be greatly appreciated, > > Best regards, > > Paul > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code.[[alternative HTML version deleted]]
Dear Eric, thank you for your kind reply, Assume dataset1Frame is the table containing the missing dates, and TransitDateFrame <- seq(as.Date(dataset1Frame[1,1]), as.Date(dataset1Frame[nrow(dataset1Frame),1]), "months") #dataset1Frame is basically reading some fields from a SQL Server table, the first one being the date #here dataset1Frame[1,1] is the first date that appears in the table #dataset1Frame[nrow(dataset1Frame),1] is the last date available in the table if(nrow(dataset1Frame)!=nrow(TransitDateFrame)){ for(i in 1:nrow(dataset1Frame)){ if(!TransitDateFrame[i,1] %in% dataset1Frame){ dataset1Frame <- rbind(dataset1Frame, TransitDateFrame[i,])}else{ dataset1Frame} } } I used this code but didn?t work, maybe I am doing something wrong here? Best regards, Paul 2017-11-01 15:21 GMT-05:00 Eric Berger <ericjberger at gmail.com>:> Hi Paul, > > #First I set up some sample data since I don't have a copy of your data > dtOrig <- as.Date( c("1985-04-01","1985-07-01","1985-12-01","1986-04-01")) > dfOrig <- data.frame( TransitDate=dtOrig, Transits=c(100,100,500,325), > CargoTons=c(1000,1080,3785,4200) ) > > #Generate the complete set of dates as a data frame > dfDates<- data.frame( TransitDate=seq(from=as.Date("1985-04-01"),by="1 > month",length=13) ) > > # do the merge adding the "missing" rows (where NA will appear) > dfNew <- merge(dfDates, dfOrig, by="TransitDate", all.x=TRUE ) > > # replace the NA's by zero > dfNew[is.na(dfNew)] <- 0 > > HTH, > Eric > > > On Wed, Nov 1, 2017 at 9:45 PM, Paul Bernal <paulbernal07 at gmail.com> > wrote: > >> Dear R friends, >> >> I am currently working with time series data, and I have a table(as data >> frame) that has looks like this (TransitDate are in format = "%e-%B-%Y") : >> >> TransitDate Transits CargoTons >> 1985-04-01 100 2500 >> 1985-05-01 135 4500 >> 1985-06-01 120 1750 >> 1985-07-01 100 3750 >> 1985-08-01 200 1250 >> >> The problem is, that there are several periods that don?t exist in the >> table, so it has the following behavior: >> >> TransitDate Transits CargoTons >> 1985-04-01 100 1000 >> 1985-07-01 100 1080 >> 1985-12-01 500 3785 >> 1986-04-01 325 4200 >> . >> . >> 2017-09-01 400 2350 (*this is the last observation) >> >> You can see in the last table fragment that the series jumps from >> 1985-04-01 to 1985-07-01, then it jumps from there to 1985-12-01 making >> the >> time series quite irregular (non-constant chronologically speaking). >> >> What I want to do is create a dummy table that has the sequence from the >> first observation (1985-04-01) up to the last one (2017-09-01) and then >> develop a code that checks if the dates contained in the dummy table exist >> in the original table, if they don?t exist then add those dates and put >> zeroes on the fields. >> >> How can I achieve this? >> >> Any help will be greatly appreciated, >> >> Best regards, >> >> Paul >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > >[[alternative HTML version deleted]]
Dear Eric, Hope you are doing great. I also tried the following: #First I created the complete date sequence TransitDateFrame <- data.frame(TransitDate=seq(as.Date(dataset1[1,1]), as.Date(dataset1[nrow(dataset1),1]), by = "month")) #Then I did the merging dataset1NEW <- merge(TransitDateFrame, dataset1, by="TransitDate", all.x=TRUE) Now it has, as expected the total number of rows. The problem is, it filled absolutely everything with NAs, and this shouldn?t be the case since there are dates that actually have data. why is this happening? I am attaching the dataset1 table as a .csv document for your reference. Basically what I want is to bring all the values in dataset1 and only add the dates missing with value 0. Best regards, Paul 2017-11-01 15:21 GMT-05:00 Eric Berger <ericjberger at gmail.com>:> Hi Paul, > > #First I set up some sample data since I don't have a copy of your data > dtOrig <- as.Date( c("1985-04-01","1985-07-01","1985-12-01","1986-04-01")) > dfOrig <- data.frame( TransitDate=dtOrig, Transits=c(100,100,500,325), > CargoTons=c(1000,1080,3785,4200) ) > > #Generate the complete set of dates as a data frame > dfDates<- data.frame( TransitDate=seq(from=as.Date("1985-04-01"),by="1 > month",length=13) ) > > # do the merge adding the "missing" rows (where NA will appear) > dfNew <- merge(dfDates, dfOrig, by="TransitDate", all.x=TRUE ) > > # replace the NA's by zero > dfNew[is.na(dfNew)] <- 0 > > HTH, > Eric > > > On Wed, Nov 1, 2017 at 9:45 PM, Paul Bernal <paulbernal07 at gmail.com> > wrote: > >> Dear R friends, >> >> I am currently working with time series data, and I have a table(as data >> frame) that has looks like this (TransitDate are in format = "%e-%B-%Y") : >> >> TransitDate Transits CargoTons >> 1985-04-01 100 2500 >> 1985-05-01 135 4500 >> 1985-06-01 120 1750 >> 1985-07-01 100 3750 >> 1985-08-01 200 1250 >> >> The problem is, that there are several periods that don?t exist in the >> table, so it has the following behavior: >> >> TransitDate Transits CargoTons >> 1985-04-01 100 1000 >> 1985-07-01 100 1080 >> 1985-12-01 500 3785 >> 1986-04-01 325 4200 >> . >> . >> 2017-09-01 400 2350 (*this is the last observation) >> >> You can see in the last table fragment that the series jumps from >> 1985-04-01 to 1985-07-01, then it jumps from there to 1985-12-01 making >> the >> time series quite irregular (non-constant chronologically speaking). >> >> What I want to do is create a dummy table that has the sequence from the >> first observation (1985-04-01) up to the last one (2017-09-01) and then >> develop a code that checks if the dates contained in the dummy table exist >> in the original table, if they don?t exist then add those dates and put >> zeroes on the fields. >> >> How can I achieve this? >> >> Any help will be greatly appreciated, >> >> Best regards, >> >> Paul >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > >