Dear R People:

So thanks to your help, I have the following:

> dog3.df <- read.delim("c:/Users/erin/Documents/dog1.txt",header=FALSE,sep="\t")
> dog3.df
         V1   V2
1  1/1/2000  dog
2  1/1/2000  cat
3  1/1/2000 tree
4  1/1/2000  dog
5  1/2/2000  cat
6  1/2/2000  cat
7  1/2/2000  cat
8  1/2/2000 tree
9  1/3/2000  dog
10 1/3/2000 tree
11 1/6/2000  dog
12 1/6/2000  cat
> dog3.df$V1 <- as.Date(dog3.df$V1,"%m/%d/%Y")
> DF3 <- with(dog3.df,data.frame(Date=V1,V2,1))
> library(reshape)
> cast(formula=Date~V2,data=DF3,value="X1",fill=0)
Aggregation requires fun.aggregate: length used as default
        Date cat dog tree
1 2000-01-01   1   2    1
2 2000-01-02   3   0    1
3 2000-01-03   0   1    1
4 2000-01-06   1   1    0
>

So far, so good. My new question: Can I fill in the days which are
"missing"; i.e., 2000-01-04 and 2000-01-05, with zeros for each set,
please?

thanks,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodgess at gmail.com

Try this:

xtabs( ~ V1 + V2,
      transform(dog3.df,
                V1 = factor(V1,
                            levels = as.character(seq(min(dog3.df$V1),
                                                      max(dog3.df$V1),
                                                      by = "days")))))

On Tue, Jun 8, 2010 at 4:52 PM, Erin Hodgess <erinm.hodgess@gmail.com> wrote:

> So far, so good. My new question: Can I fill in the days which are
> "missing"; i.e., 2000-01-04 and 2000-01-05, with zeros for each set,
> please?
>
> thanks,
> Erin
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

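The one-liner above works because xtabs() keeps a row for every level of a factor, including levels that never occur in the data. A minimal self-contained sketch of that idea follows. The data frame `dog` and the names `all_days` and `counts` are made up for illustration (the data are typed in by hand instead of being read from dog1.txt), so treat it as one possible way to see the mechanism, not as the poster's exact code.

```r
# Hand-typed copy of the example data (stands in for dog1.txt)
dog <- data.frame(
  V1 = as.Date(c("1/1/2000", "1/1/2000", "1/1/2000", "1/1/2000",
                 "1/2/2000", "1/2/2000", "1/2/2000", "1/2/2000",
                 "1/3/2000", "1/3/2000", "1/6/2000", "1/6/2000"),
               format = "%m/%d/%Y"),
  V2 = c("dog", "cat", "tree", "dog", "cat", "cat", "cat", "tree",
         "dog", "tree", "dog", "cat")
)

# Every calendar day from the first to the last observation
all_days <- seq(min(dog$V1), max(dog$V1), by = "day")

# Turn the dates into a factor whose levels include the unobserved days
dog$V1 <- factor(as.character(dog$V1), levels = as.character(all_days))

# xtabs() tabulates all factor levels, so the unobserved days get zero counts
counts <- xtabs(~ V1 + V2, data = dog)
counts
```

The result is a contingency table rather than a data frame; as.data.frame.matrix(counts) is one way to convert it if a data frame is needed.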
Once again my message got held up for moderator approval so I am
deleting it and trying again. Hopefully this one goes through.

In general, we will get the simplest usage if we match the problem to
the appropriate OO class. In this case we are using time series so it
is advantageous to use a time series class, i.e. zoo, instead of data
frames. We can use data frames but then each time we run into a
problem that would be trivial with time series we have to reinvent the
wheel all over again.

We read the data into a data frame, append a column of ones and then
read it into zoo, converting the index to Date class with the
indicated format, splitting it on column 2 and aggregating using sum
(since unlike the prior example we now have duplicate dates within cat
and also within dog). See ?read.zoo for more.

To fill in the dates we just convert the zoo series to ts and back
again. This loses the Date class (since ts has no notion of index
class) but we can put it back again. Since this fills the newly added
entries with NAs we replace the NAs with zeros.

Lines <- "V1 V2
1 1/1/2000 dog
2 1/1/2000 cat
3 1/1/2000 tree
4 1/1/2000 dog
5 1/2/2000 cat
6 1/2/2000 cat
7 1/2/2000 cat
8 1/2/2000 tree
9 1/3/2000 dog
10 1/3/2000 tree
11 1/6/2000 dog
12 1/6/2000 cat"

library(zoo)
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/read.zoo.R?revision=719&root=zoo")

DF <- read.table(textConnection(Lines))
z <- read.zoo(cbind(DF, 1), format = "%m/%d/%Y", split = 2, aggregate = sum)

zz <- as.zoo(as.ts(z))
time(zz) <- as.Date(time(zz))
zz[is.na(zz)] <- 0

zz
plot(zz)

Here is one way ...

DF4 <- cast(formula=Date~V2,data=DF3,value="X1",fill=0)

d <- with(DF4, seq(min(Date), max(Date), by = 1))  ### full set
m <- d[!(d %in% DF4$Date)]  ### missing dates (subsetting keeps the Date class)
if(length(m) > 0) {
    extras <- cbind(data.frame(Date = m), cat = 0, dog = 0, tree = 0)
    DF4 <- rbind(DF4, extras)
    rm(extras)
    DF4 <- DF4[order(DF4$Date), ]
}
rm(d, m)  ### clean up ...

Bill.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Erin Hodgess
Sent: Wednesday, 9 June 2010 5:52 AM
To: R help
Subject: [R] more dates and data frames

So far, so good. My new question: Can I fill in the days which are
"missing"; i.e., 2000-01-04 and 2000-01-05, with zeros for each set,
please?

thanks,
Erin
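
The reply above names the count columns (cat, dog, tree) explicitly when building the extra rows. A variant that avoids listing them is to merge the table against the full sequence of dates and zero-fill whatever comes back as NA. The sketch below uses only base R; the data frame `wide` is a hand-typed stand-in for DF4, and `full`, `filled` and `count_cols` are names invented here, so this is an illustration under those assumptions and not part of the original reply.

```r
# Hand-typed stand-in for the wide table of counts (DF4 in the thread)
wide <- data.frame(
  Date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-06")),
  cat  = c(1, 3, 0, 1),
  dog  = c(2, 0, 1, 1),
  tree = c(1, 1, 1, 0)
)

# One row per calendar day between the first and last date
full <- data.frame(Date = seq(min(wide$Date), max(wide$Date), by = "day"))

# Left join: days absent from `wide` come back with NA in every count column
filled <- merge(full, wide, by = "Date", all.x = TRUE)

# Replace the NAs with zeros in the count columns only (Date is left alone)
count_cols <- setdiff(names(filled), "Date")
filled[count_cols][is.na(filled[count_cols])] <- 0

filled
```

Because the count columns are found with setdiff(), the same lines should carry over unchanged if a new category appears in V2.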