Dear R People:
So thanks to your help, I have the following:
> dog3.df <- read.delim("c:/Users/erin/Documents/dog1.txt",header=FALSE,sep="\t")
> dog3.df
         V1   V2
1  1/1/2000  dog
2  1/1/2000  cat
3  1/1/2000 tree
4  1/1/2000  dog
5  1/2/2000  cat
6  1/2/2000  cat
7  1/2/2000  cat
8  1/2/2000 tree
9  1/3/2000  dog
10 1/3/2000 tree
11 1/6/2000  dog
12 1/6/2000  cat
> dog3.df$V1 <- as.Date(dog3.df$V1,"%m/%d/%Y")
> DF3 <- with(dog3.df,data.frame(Date=V1,V2,1))
> library(reshape)
> cast(formula=Date~V2,data=DF3,value="X1",fill=0)
Aggregation requires fun.aggregate: length used as default
        Date cat dog tree
1 2000-01-01   1   2    1
2 2000-01-02   3   0    1
3 2000-01-03   0   1    1
4 2000-01-06   1   1    0
>
So far, so good. My new question: Can I fill in the days which are
"missing"; i.e., 2000-01-04 and 2000-01-05, with zeros for each set,
please?
thanks,
Erin
--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodgess at gmail.com
Try this:
xtabs( ~ V1 + V2,
       transform(dog3.df,
                 V1 = factor(V1, levels = as.character(seq(min(dog3.df$V1), max(dog3.df$V1), by = "days")))))
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
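The factor-levels idea can be tried without the original file. The following is only a sketch in base R: the data frame is retyped from the listing in the question, and the names all_days, day and counts are made up for illustration.

```r
# Example data retyped from the post (assumption: same 12 rows).
dog3.df <- data.frame(
  V1 = c("1/1/2000", "1/1/2000", "1/1/2000", "1/1/2000",
         "1/2/2000", "1/2/2000", "1/2/2000", "1/2/2000",
         "1/3/2000", "1/3/2000", "1/6/2000", "1/6/2000"),
  V2 = c("dog", "cat", "tree", "dog", "cat", "cat", "cat", "tree",
         "dog", "tree", "dog", "cat")
)
dog3.df$V1 <- as.Date(dog3.df$V1, format = "%m/%d/%Y")

# Every calendar day between the first and last observation.
all_days <- seq(min(dog3.df$V1), max(dog3.df$V1), by = "day")

# A factor whose levels include the unobserved days makes xtabs()
# produce a row for each of them, filled with zero counts.
dog3.df$day <- factor(as.character(dog3.df$V1),
                      levels = as.character(all_days))
counts <- xtabs(~ day + V2, data = dog3.df)
counts
```

The result is a contingency table rather than a data frame; as.data.frame.matrix(counts) should give a wide data frame if one is needed.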
Once again my message got held up for moderator approval so I
am deleting it and trying again. Hopefully this one goes through.
In general, we will get the simplest usage if we match the problem to
the appropriate OO class. In this case we are using time series so it
is advantageous to use a time series class, i.e. zoo, instead of data
frames. We can use data frames but then each time we run into a
problem that would be trivial with time series we have to reinvent the
wheel all over again.
We read the data into a data frame, append a column of ones and then
read it into zoo, converting the index to Date class with the
indicated format, splitting it on column 2 and aggregating using sum
(since unlike the prior example we now have duplicate dates within cat
and also within dog). See ?read.zoo for more.
To fill in the dates we just convert the zoo series to ts and back
again. This loses the Date class (since ts has no notion of index
class) but we can put it back again. Since this fills the newly added
entries with NAs we replace the NAs with zeros.
Lines <- "V1 V2
1 1/1/2000 dog
2 1/1/2000 cat
3 1/1/2000 tree
4 1/1/2000 dog
5 1/2/2000 cat
6 1/2/2000 cat
7 1/2/2000 cat
8 1/2/2000 tree
9 1/3/2000 dog
10 1/3/2000 tree
11 1/6/2000 dog
12 1/6/2000 cat"
library(zoo)
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/read.zoo.R?revision=719&root=zoo")
DF <- read.table(textConnection(Lines))
z <- read.zoo(cbind(DF, 1), format = "%m/%d/%Y", split = 2,
aggregate = sum)
zz <- as.zoo(as.ts(z))
time(zz) <- as.Date(time(zz))
zz[is.na(zz)] <- 0
zz
plot(zz)
Here is one way
...
DF4 <- cast(formula=Date~V2,data=DF3,value="X1",fill=0)
d <- with(DF4, seq(min(Date), max(Date), by = 1)) ### full set
m <- d[!(d %in% DF4$Date)] ### missing dates (subsetting keeps the Date class)
if(length(m) > 0) {
  extras <- cbind(data.frame(Date = m), cat = 0, dog = 0, tree = 0)
  DF4 <- rbind(DF4, extras)
  rm(extras)
  DF4 <- DF4[order(DF4$Date), ]
}
rm(d, m) ### clean up
...
Bill.
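A variation on the same idea is to merge against a data frame that holds the full run of dates. This is only a sketch: DF4 is retyped here from the cast() output in the question so that it runs without the reshape package, and the names full, filled and count_cols are made up for illustration.

```r
# Wide table of observed days only (assumption: copied from the cast() output).
DF4 <- data.frame(
  Date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-06")),
  cat  = c(1, 3, 0, 1),
  dog  = c(2, 0, 1, 1),
  tree = c(1, 1, 1, 0)
)

# One row per calendar day between the first and last observed date.
full <- data.frame(Date = seq(min(DF4$Date), max(DF4$Date), by = "day"))

# Left join: days absent from DF4 come back with NA in the count columns.
filled <- merge(full, DF4, by = "Date", all.x = TRUE)

# Replace the NAs with zeros, touching only the count columns.
count_cols <- setdiff(names(filled), "Date")
filled[count_cols][is.na(filled[count_cols])] <- 0
filled
```

merge() sorts on the key by default, so the rows come out in date order without a separate order() step.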
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Erin Hodgess
Sent: Wednesday, 9 June 2010 5:52 AM
To: R help
Subject: [R] more dates and data frames