Paul Wennekes
2012-Oct-11 15:55 UTC
[R] Formatting data for bootstrapping for confidence intervals
Hi all,

New to R, so this may be obvious to some. I've been trying to figure this out for a while. I have a dataset "events" that looks something like this:

Area NAME DATE    X Xn Y
1    X    1/10/10 1 1  0
1    Y    1/11/10 0 0  1
1    X    1/12/10 1 0  0
1    X    1/12/10 1 0  0
1    X    1/12/10 1 0  0
2    X    2/12/10 1 1  0
2    X    2/12/10 1 0  0
2    Y    2/12/10 0 0  1
2    X    2/13/10 1 0  0
2    X    2/13/10 1 0  0
2    X    2/13/10 1 0  0
2    X    2/14/10 1 0  0
2    X    2/14/10 1 0  0
2    X    2/14/10 1 1  0
2    X    2/14/10 1 0  0
3    X    7/27/11 1 0  0
3    X    7/27/11 1 1  0
3    X    7/27/11 1 0  0
3    X    7/28/11 1 0  0
3    X    7/28/11 1 1  0
3    X    7/28/11 1 0  0
3    X    7/28/11 1 0  0
3    Y    7/28/11 0 0  1
3    X    7/28/11 1 0  0
3    X    7/28/11 1 1  0
3    Y    7/28/11 0 0  1
3    X    7/28/11 1 0  0
3    X    7/29/11 1 0  0
3    X    7/29/11 1 0  0
3    X    7/29/11 1 1  0

X and Y are events. Every row represents a single event, with a 1 indicating which one happened at that time. Xn indicates X happening at night. I want to bootstrap these events over days, but I think I need to summarize them first, i.e. get something that looks like this:

Area DATE    X Xn Y
1    1/10/10 1 1  0
1    1/11/10 0 0  1
1    1/12/10 3 0  0
2    2/12/10 2 1  1
etc.

and then, for each Area, bootstrap the data over the days. Any ideas? I've tried using the 'reshape' package, but I don't know how to sum over parts of the columns as defined by the DATE values...

Many thanks ahead!

--
View this message in context: http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860.html
Sent from the R help mailing list archive at Nabble.com.
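The DATE values in this table are plain text. If the days ever need to be handled in calendar order (for example when reading from a file with `stringsAsFactors = FALSE`), they could be converted to R's Date class. This is a sketch under the assumption that the format is month/day/two-digit year, which values such as "7/27/11" suggest (27 cannot be a month); the vector `dates` is a hypothetical sample.

```r
# Hypothetical sample of DATE strings taken from the table above
dates <- c("7/27/11", "1/10/10", "2/14/10", "1/12/10")

# Assumed format: %m = month, %d = day, %y = two-digit year (10 -> 2010)
parsed <- as.Date(dates, format = "%m/%d/%y")
print(parsed)

# Chronological order, independent of how the strings would sort as text
print(dates[order(parsed)])
```

Applied to a whole column, the same idea would be `dat$DATE <- as.Date(dat$DATE, format = "%m/%d/%y")` before aggregating, where `dat` stands for the data frame holding the events.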
Rui Barradas
2012-Oct-11 18:35 UTC
[R] Formatting data for bootstrapping for confidence intervals
Hello,

To aggregate the data, use the function aggregate (yes, it exists).

# 'dat' holds the data shown in the question
with(dat, aggregate(cbind(X, Xn, Y), list(Area, DATE), FUN = sum))

# output
  Group.1 Group.2 X Xn Y
1       1 1/10/10 1  1 0
2       1 1/11/10 0  0 1
3       1 1/12/10 3  0 0
4       2 2/12/10 2  1 1
5       2 2/13/10 3  0 0
6       2 2/14/10 4  1 0
7       3 7/27/11 3  1 0
8       3 7/28/11 7  2 2
9       3 7/29/11 3  1 0

And take a look at package boot. Maybe you'll find something there.

Hope this helps,

Rui Barradas
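A variant of the same aggregate() call, offered as a sketch rather than as part of Rui's reply: the formula interface labels the grouping columns Area and DATE instead of Group.1 and Group.2. It uses only the Area 1 rows from the question; the object names `events` and `daily` are illustrative.

```r
# Area 1 rows from the question: one row per event
events <- data.frame(
  Area = c(1, 1, 1, 1, 1),
  NAME = c("X", "Y", "X", "X", "X"),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10", "1/12/10"),
  X  = c(1, 0, 1, 1, 1),
  Xn = c(1, 0, 0, 0, 0),
  Y  = c(0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side sums several columns at once;
# the right-hand side lists the grouping variables
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)
print(daily)
#   Area    DATE X Xn Y
# 1    1 1/10/10 1  1 0
# 2    1 1/11/10 0  0 1
# 3    1 1/12/10 3  0 0
```

One difference worth knowing: the formula method drops rows containing NA by default (`na.action = na.omit`).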
Paul Wennekes
2012-Oct-11 22:11 UTC
[R] Formatting data for bootstrapping for confidence intervals
Thank you! That had me stuck for quite a while, and this worked like a charm!
Hi,
Try this:
dat1 <- read.table(text = "
Area NAME DATE    X Xn Y
1    X    1/10/10 1 1  0
1    Y    1/11/10 0 0  1
1    X    1/12/10 1 0  0
1    X    1/12/10 1 0  0
1    X    1/12/10 1 0  0
2    X    2/12/10 1 1  0
2    X    2/12/10 1 0  0
2    Y    2/12/10 0 0  1
2    X    2/13/10 1 0  0
2    X    2/13/10 1 0  0
2    X    2/13/10 1 0  0
2    X    2/14/10 1 0  0
2    X    2/14/10 1 0  0
2    X    2/14/10 1 1  0
2    X    2/14/10 1 0  0
3    X    7/27/11 1 0  0
3    X    7/27/11 1 1  0
3    X    7/27/11 1 0  0
3    X    7/28/11 1 0  0
3    X    7/28/11 1 1  0
3    X    7/28/11 1 0  0
3    X    7/28/11 1 0  0
3    Y    7/28/11 0 0  1
3    X    7/28/11 1 0  0
3    X    7/28/11 1 1  0
3    Y    7/28/11 0 0  1
3    X    7/28/11 1 0  0
3    X    7/29/11 1 0  0
3    X    7/29/11 1 0  0
3    X    7/29/11 1 1  0
", sep = "", header = TRUE, stringsAsFactors = FALSE)
# You can use aggregate(), ddply() from library(plyr), or library(data.table)
library(data.table)
dat2 <- data.table(dat1)
# Sum each event column within every Area/DATE combination
dat2[, list(X = sum(X), Xn = sum(Xn), Y = sum(Y)), by = list(Area, DATE)]
#    Area    DATE X Xn Y
# 1:    1 1/10/10 1  1 0
# 2:    1 1/11/10 0  0 1
# 3:    1 1/12/10 3  0 0
# 4:    2 2/12/10 2  1 1
# 5:    2 2/13/10 3  0 0
# 6:    2 2/14/10 4  1 0
# 7:    3 7/27/11 3  1 0
# 8:    3 7/28/11 7  2 2
# 9:    3 7/29/11 3  1 0
library(plyr)
ddply(dat1,.(Area,DATE),colwise(sum,c("X","Xn","Y")))
#   Area    DATE X Xn Y
# 1    1 1/10/10 1  1 0
# 2    1 1/11/10 0  0 1
# 3    1 1/12/10 3  0 0
# 4    2 2/12/10 2  1 1
# 5    2 2/13/10 3  0 0
# 6    2 2/14/10 4  1 0
# 7    3 7/27/11 3  1 0
# 8    3 7/28/11 7  2 2
# 9    3 7/29/11 3  1 0
A.K.
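Neither reply shows the bootstrap step itself; Rui pointed to package boot. The following is a minimal sketch of that step under assumptions not stated in the thread: the statistic of interest is the mean number of X events per day, days are resampled with replacement separately within each Area, and the daily totals are the ones from the aggregate() output, typed in by hand. The names `daily`, `mean_daily_x` and `ci_by_area` are hypothetical. With only three days per Area in this toy data the percentile intervals are very rough; a real analysis would need many more days.

```r
library(boot)  # recommended package distributed with R

# Daily totals per Area, copied from the aggregated output in the thread
daily <- data.frame(
  Area = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "2/12/10", "2/13/10", "2/14/10",
           "7/27/11", "7/28/11", "7/29/11"),
  X  = c(1, 0, 3, 2, 3, 4, 3, 7, 3),
  Xn = c(1, 0, 0, 1, 0, 1, 1, 2, 1),
  Y  = c(0, 1, 0, 1, 0, 0, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Statistic for boot(): it is called with the data and a vector of resampled
# row indices, so each replicate is the mean X count over a bootstrap sample
# of days (assumed statistic of interest)
mean_daily_x <- function(d, i) mean(d$X[i])

set.seed(1)  # make the resampling reproducible
ci_by_area <- lapply(split(daily, daily$Area), function(d) {
  b  <- boot(d, statistic = mean_daily_x, R = 2000)  # resample days within this Area
  ci <- boot.ci(b, type = "perc")                     # percentile interval, 95% by default
  # columns 4 and 5 of ci$percent hold the lower and upper endpoints
  c(estimate = b$t0, lower = ci$percent[4], upper = ci$percent[5])
})
print(do.call(rbind, ci_by_area))
```

Resampling rows of `daily` makes the day the resampling unit, which is why the events are summarized per day first; resampling the raw event rows instead would treat individual events as independent and break days apart.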
----- Original Message -----
From: Paul Wennekes <paul.wennekes at evobio.eu>
To: r-help at r-project.org
Cc:
Sent: Thursday, October 11, 2012 11:55 AM
Subject: [R] Formatting data for bootstrapping for confidence intervals
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.