Paul Wennekes
2012-Oct-11 15:55 UTC
[R] Formatting data for bootstrapping for confidence intervals
Hi all,

New to R, so this may be obvious to some.
I've been trying to figure this out for a while, I have a dataset "events"
that looks something like this:

Area  NAME  DATE     X  Xn  Y
1     X     1/10/10  1  1   0
1     Y     1/11/10  0  0   1
1     X     1/12/10  1  0   0
1     X     1/12/10  1  0   0
1     X     1/12/10  1  0   0
2     X     2/12/10  1  1   0
2     X     2/12/10  1  0   0
2     Y     2/12/10  0  0   1
2     X     2/13/10  1  0   0
2     X     2/13/10  1  0   0
2     X     2/13/10  1  0   0
2     X     2/14/10  1  0   0
2     X     2/14/10  1  0   0
2     X     2/14/10  1  1   0
2     X     2/14/10  1  0   0
3     X     7/27/11  1  0   0
3     X     7/27/11  1  1   0
3     X     7/27/11  1  0   0
3     X     7/28/11  1  0   0
3     X     7/28/11  1  1   0
3     X     7/28/11  1  0   0
3     X     7/28/11  1  0   0
3     Y     7/28/11  0  0   1
3     X     7/28/11  1  0   0
3     X     7/28/11  1  1   0
3     Y     7/28/11  0  0   1
3     X     7/28/11  1  0   0
3     X     7/29/11  1  0   0
3     X     7/29/11  1  0   0
3     X     7/29/11  1  1   0

X and Y are events. Every row represents a single event happening, with a 1
indicating which one happens at that time. Xn indicates X happening at
night. I want to bootstrap these events over days but I think I need to
summarize them first, ie. get something that looks like this:

Area  DATE     X  Xn  Y
1     1/10/10  1  1   0
1     1/11/10  0  0   1
1     1/12/10  3  0   0
2     2/12/10  2  1   1
etc.

and then for each Area, bootstrap the data over the days. Any ideas? I've
tried using the 'reshape' package but I don't know how to sum over parts of
the columns as defined by the DATE values...

Many thanks ahead!

--
View this message in context: http://r.789695.n4.nabble.com/Formatting-data-for-bootstrapping-for-confidence-intervals-tp4645860.html
Sent from the R help mailing list archive at Nabble.com.
Rui Barradas
2012-Oct-11 18:35 UTC
[R] Formatting data for bootstrapping for confidence intervals
Hello,

To aggregate the data use, yes, it exists, the function aggregate().

with(dat, aggregate(cbind(X, Xn, Y), list(Area, DATE), FUN = sum))

# output
  Group.1 Group.2 X Xn Y
1       1 1/10/10 1  1 0
2       1 1/11/10 0  0 1
3       1 1/12/10 3  0 0
4       2 2/12/10 2  1 1
5       2 2/13/10 3  0 0
6       2 2/14/10 4  1 0
7       3 7/27/11 3  1 0
8       3 7/28/11 7  2 2
9       3 7/29/11 3  1 0

And take a look at package boot. Maybe you'll find something there.

Hope this helps,

Rui Barradas
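The pointer to package boot could be followed up roughly like this. It is only a sketch under assumptions: the thread never says which quantity the confidence interval is for, so the statistic here (mean number of X events per day) is hypothetical, and the names `daily`, `mean_x`, and `ci_by_area` are made up for illustration. The daily totals are copied from the aggregate() output above. Days are resampled with replacement within each Area, and a percentile interval is taken from the bootstrap replicates.

```r
library(boot)  # recommended package, normally installed alongside R

# Daily totals per Area, copied from the aggregate() output in this thread
daily <- data.frame(
  Area = rep(1:3, each = 3),
  DATE = c("1/10/10", "1/11/10", "1/12/10",
           "2/12/10", "2/13/10", "2/14/10",
           "7/27/11", "7/28/11", "7/29/11"),
  X  = c(1, 0, 3, 2, 3, 4, 3, 7, 3),
  Xn = c(1, 0, 0, 1, 0, 1, 1, 2, 1),
  Y  = c(0, 1, 0, 1, 0, 0, 0, 2, 0)
)

# Hypothetical statistic (an assumption, not from the thread):
# mean number of X events per day. boot() passes the data and a
# vector of resampled row indices, so each row (one day) is one unit.
mean_x <- function(d, idx) mean(d$X[idx])

set.seed(1)  # only to make the sketch reproducible

# Bootstrap the days separately within each Area and keep the
# lower and upper ends of the 95% percentile interval.
ci_by_area <- lapply(split(daily, daily$Area), function(d) {
  b <- boot(data = d, statistic = mean_x, R = 999)
  boot.ci(b, type = "perc")$percent[4:5]
})
ci_by_area
```

With only three days per Area in the sample data the intervals are not meaningful, and boot.ci() may warn about extreme order statistics; the real data set would need many more days per Area for this to be useful.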
Paul Wennekes
2012-Oct-11 22:11 UTC
[R] Formatting data for bootstrapping for confidence intervals
Thank you! That had me stuck for quite a while and this worked like a charm!
Hi,

Try this:

dat1 <- read.table(text="
Area NAME DATE X Xn Y
1 X 1/10/10 1 1 0
1 Y 1/11/10 0 0 1
1 X 1/12/10 1 0 0
1 X 1/12/10 1 0 0
1 X 1/12/10 1 0 0
2 X 2/12/10 1 1 0
2 X 2/12/10 1 0 0
2 Y 2/12/10 0 0 1
2 X 2/13/10 1 0 0
2 X 2/13/10 1 0 0
2 X 2/13/10 1 0 0
2 X 2/14/10 1 0 0
2 X 2/14/10 1 0 0
2 X 2/14/10 1 1 0
2 X 2/14/10 1 0 0
3 X 7/27/11 1 0 0
3 X 7/27/11 1 1 0
3 X 7/27/11 1 0 0
3 X 7/28/11 1 0 0
3 X 7/28/11 1 1 0
3 X 7/28/11 1 0 0
3 X 7/28/11 1 0 0
3 Y 7/28/11 0 0 1
3 X 7/28/11 1 0 0
3 X 7/28/11 1 1 0
3 Y 7/28/11 0 0 1
3 X 7/28/11 1 0 0
3 X 7/29/11 1 0 0
3 X 7/29/11 1 0 0
3 X 7/29/11 1 1 0
", sep="", header=TRUE, stringsAsFactors=FALSE)

# You can use aggregate(), ddply() from library(plyr), or library(data.table)
library(data.table)
dat2 <- data.table(dat1)
dat2[, list(X=sum(X), Xn=sum(Xn), Y=sum(Y)), list(Area, DATE)]
#    Area    DATE X Xn Y
#1:    1 1/10/10 1  1 0
#2:    1 1/11/10 0  0 1
#3:    1 1/12/10 3  0 0
#4:    2 2/12/10 2  1 1
#5:    2 2/13/10 3  0 0
#6:    2 2/14/10 4  1 0
#7:    3 7/27/11 3  1 0
#8:    3 7/28/11 7  2 2
#9:    3 7/29/11 3  1 0

library(plyr)
ddply(dat1, .(Area, DATE), colwise(sum, c("X","Xn","Y")))
#  Area    DATE X Xn Y
#1    1 1/10/10 1  1 0
#2    1 1/11/10 0  0 1
#3    1 1/12/10 3  0 0
#4    2 2/12/10 2  1 1
#5    2 2/13/10 3  0 0
#6    2 2/14/10 4  1 0
#7    3 7/27/11 3  1 0
#8    3 7/28/11 7  2 2
#9    3 7/29/11 3  1 0

A.K.
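For completeness, the base-R aggregate() route mentioned in the code comment might look like the sketch below, using the formula interface so the grouping columns keep their names. Two things here are assumptions rather than anything stated in the thread: that DATE is month/day/year (suggested by values such as 2/13/10 and 7/27/11), and that converting it to class Date is wanted so the days sort chronologically. Only the first eight rows of the posted data are used to keep the example short, and `daily` is an illustrative name.

```r
# First eight rows of the posted "events" data (Area 1, plus 2/12/10 in Area 2)
events <- data.frame(
  Area = c(1, 1, 1, 1, 1, 2, 2, 2),
  NAME = c("X", "Y", "X", "X", "X", "X", "X", "Y"),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10", "1/12/10",
           "2/12/10", "2/12/10", "2/12/10"),
  X  = c(1, 0, 1, 1, 1, 1, 1, 0),
  Xn = c(1, 0, 0, 0, 0, 1, 0, 0),
  Y  = c(0, 1, 0, 0, 0, 0, 0, 1)
)

# Assumption: dates are month/day/year with a two-digit year
events$DATE <- as.Date(events$DATE, format = "%m/%d/%y")

# Sum the three event columns within each Area/DATE combination;
# the formula interface names the grouping columns Area and DATE
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)

# Order by Area, then chronologically within Area
daily <- daily[order(daily$Area, daily$DATE), ]
daily
```

The four rows this produces correspond to the summary table asked for in the original question.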