Dear R users, I came up to a problem by taking means (or other summary statistics) of a big dataframe. Suppose we do have a dataframe: ID V1 V2 V3 V4 ........................ V71 1 6 5 3 2 ........................ 3 2 3 2 2 1 ........................ 1 3 6 5 3 2 ........................ 3 4 12 15 3 2 ........................ 100 ........................................................ ........................................................ 288 10 20 30 30 .......................... 499 I need to find out the way, how to calculate a mean of every 12 lines to get: V1 V2 V3 V4 ........................... V71 mean from 1 to 7 same as V1 same as V1 mean from 8 to 14 same as V1 same as V1 etc. I can do it column by column using: y.ts <- ts(y$V1, frequency=12) aggregate(y.ts, FUN=mean) Bu this is a hardcore... Can anyone suggest a better way to compute all the dataframe at once and get a result as matrix? Thank you in advance! -- Simonas Kecorius ** [[alternative HTML version deleted]]
HI, May be this helps: dat1<-read.table(text=" ID? V1? V2? V3? V4 1??? 6??? 5??? 3??? 2 2??? 3??? 2??? 2??? 1? 3??? 6??? 5??? 3??? 2 4??? 12? 15? 3??? 2? 5??? 6??? 8??? 3??? 2 6??? 3??? 2??? 4??? 1? 7??? 6??? 5??? 3??? 3 8??? 12? 15? 3??? 1? 9??? 6??? 5??? 3??? 3 10??? 3??? 2??? 7??? 5? 11??? 6??? 5??? 8??? 2 12??? 12? 19? 3??? 2? 13??? 6??? 5??? 3??? 2 14??? 3??? 4??? 2??? 1? 15??? 6??? 5??? 6??? 2 16??? 12? 15? 5??? 2? 17??? 6??? 5??? 5??? 2 18??? 3??? 2??? 8??? 1? 19??? 6??? 5??? 3??? 9 20??? 12? 15? 3??? 10? 21??? 6??? 5??? 3??? 2 22??? 3??? 2??? 2??? 11? 23??? 6??? 5??? 3??? 4 24??? 12? 15? 9??? 2 ",sep="",header=TRUE,stringsAsFactors=FALSE) dat1$newID<-rep(1:(nrow(dat1)/12),each=12) #if nrow(dat1)/12 is integer ?with(dat1,aggregate(cbind(V1,V2,V3,V4),by=list(newID),mean)) #? Group.1?? V1?????? V2?????? V3?????? V4 #1?????? 1 6.75 7.333333 3.750000 2.166667 #2?????? 2 6.75 6.916667 4.333333 4.000000 #or aggregate(.~newID,data=dat1[,-1],mean) #? newID?? V1?????? V2?????? V3?????? V4 #1???? 1 6.75 7.333333 3.750000 2.166667 #2???? 2 6.75 6.916667 4.333333 4.000000 A.K. ----- Original Message ----- From: Simonas Kecorius <simolas2008 at gmail.com> To: r-help at r-project.org Cc: Sent: Saturday, January 5, 2013 8:33 AM Subject: [R] Need help on dataframe Dear R users, I came up to a problem by taking means (or other summary statistics) of a big dataframe. Suppose we do have a dataframe: ID? V1? V2? V3? V4 ........................ V71 1? ? 6? ? 5? ? 3? ? 2? ........................? 3 2? ? 3? ? 2? ? 2? ? 1? ........................? 1 3? ? 6? ? 5? ? 3? ? 2? ........................? 3 4? ? 12? 15? 3? ? 2? ........................? 100 ........................................................ ........................................................ 288 10? 20? 30? 30 .......................... 499 I need to find out the way, how to calculate a mean of every 12 lines to get: V1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? V2? ? ? ? ? ? ? ? V3? ? ? ? ? ? ? ? V4 ........................... V71 mean from 1 to 7? ? ? same as V1? ? same as V1 mean from 8 to 14? ? same as V1? ? same as V1 etc. I can do it column by column using: y.ts <- ts(y$V1, frequency=12) aggregate(y.ts, FUN=mean) Bu this is a hardcore... Can anyone suggest a better way to compute all the dataframe at once and get a result as matrix? Thank you in advance! -- Simonas Kecorius ** ??? [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi, One more way: dat1<-read.table(text=" ID? V1? V2? V3? V4 1??? 6??? 5??? 3??? 2 2??? 3??? 2??? 2??? 1 3??? 6??? 5??? 3??? 2 4??? 12? 15? 3??? 2 5??? 6??? 8??? 3??? 2 6??? 3??? 2??? 4??? 1 7??? 6??? 5??? 3??? 3 8??? 12? 15? 3??? 1 9??? 6??? 5??? 3??? 3 10??? 3??? 2??? 7??? 5 11??? 6??? 5??? 8??? 2 12??? 12? 19? 3??? 2 13??? 6??? 5??? 3??? 2 14??? 3??? 4??? 2??? 1 15??? 6??? 5??? 6??? 2 16??? 12? 15? 5??? 2 17??? 6??? 5??? 5??? 2 18??? 3??? 2??? 8??? 1 19??? 6??? 5??? 3??? 9 20??? 12? 15? 3??? 10 21??? 6??? 5??? 3??? 2 22??? 3??? 2??? 2??? 11 23??? 6??? 5??? 3??? 4 24??? 12? 15? 9??? 2 ",sep="",header=TRUE,stringsAsFactors=FALSE) res<-aggregate(.~1:nrow(dat1)%/%13,data=dat1[,-1],mean) ?names(res)[1]<-"group" ?res #? group?? V1?????? V2?????? V3?????? V4 #1???? 0 6.75 7.333333 3.750000 2.166667 #2???? 1 6.75 6.916667 4.333333 4.000000 A.K. ----- Original Message ----- From: Simonas Kecorius <simolas2008 at gmail.com> To: r-help at r-project.org Cc: Sent: Saturday, January 5, 2013 8:33 AM Subject: [R] Need help on dataframe Dear R users, I came up to a problem by taking means (or other summary statistics) of a big dataframe. Suppose we do have a dataframe: ID? V1? V2? V3? V4 ........................ V71 1? ? 6? ? 5? ? 3? ? 2? ........................? 3 2? ? 3? ? 2? ? 2? ? 1? ........................? 1 3? ? 6? ? 5? ? 3? ? 2? ........................? 3 4? ? 12? 15? 3? ? 2? ........................? 100 ........................................................ ........................................................ 288 10? 20? 30? 30 .......................... 499 I need to find out the way, how to calculate a mean of every 12 lines to get: V1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? V2? ? ? ? ? ? ? ? V3? ? ? ? ? ? ? ? V4 ........................... V71 mean from 1 to 7? ? ? same as V1? ? same as V1 mean from 8 to 14? ? same as V1? ? same as V1 etc. I can do it column by column using: y.ts <- ts(y$V1, frequency=12) aggregate(y.ts, FUN=mean) Bu this is a hardcore... Can anyone suggest a better way to compute all the dataframe at once and get a result as matrix? Thank you in advance! -- Simonas Kecorius ** ??? [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Well, a rather simple-minded, brute force approach would be to add a factor variable to the data frame and use aggregate on it. I am sure there are better ways but this will work. EXAMPLE ### xx <- data.frame(aa =1:24, b = matrix(sample(c(1,2,3,4,5,6), 72, replace = TRUE), nrow= 24)) dd <-rep(c("a","b"), each= 12) xx <- cbind(dd, xx) aggregate(xx[,3:5], list(xx$dd), mean) ################ By the way, when supplying data samples a good way is to use the dput command. Try ?dput for information John Kane Kingston ON Canada> -----Original Message----- > From: simolas2008 at gmail.com > Sent: Sat, 5 Jan 2013 15:33:03 +0200 > To: r-help at r-project.org > Subject: [R] Need help on dataframe > > Dear R users, I came up to a problem by taking means (or other summary > statistics) of a big dataframe. > > Suppose we do have a dataframe: > > ID V1 V2 V3 V4 ........................ V71 > 1 6 5 3 2 ........................ 3 > 2 3 2 2 1 ........................ 1 > 3 6 5 3 2 ........................ 3 > 4 12 15 3 2 ........................ 100 > ........................................................ > ........................................................ > 288 10 20 30 30 .......................... 499 > > I need to find out the way, how to calculate a mean of every 12 lines to > get: > > V1 V2 V3 V4 > ........................... V71 > mean from 1 to 7 same as V1 same as V1 > mean from 8 to 14 same as V1 same as V1 > etc. > > I can do it column by column using: > > y.ts <- ts(y$V1, frequency=12) > aggregate(y.ts, FUN=mean) > > Bu this is a hardcore... Can anyone suggest a better way to compute all > the > dataframe at once and get a result as matrix? > > Thank you in advance! > > -- > Simonas Kecorius > ** > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.____________________________________________________________ FREE ONLINE PHOTOSHARING - Share your photos online with your friends and family! Visit http://www.inbox.com/photosharing to find out more!
HI, Sorry, there was a mistake, which I noticed after seeing David's post. dat1<-read.table(text=" ID? V1? V2? V3? V4 1??? 6??? 5??? 3??? 2 2??? 3??? 2??? 2??? 1 3??? 6??? 5??? 3??? 2 4??? 12? 15? 3??? 2 5??? 6??? 8??? 3??? 2 6??? 3??? 2??? 4??? 1 7??? 6??? 5??? 3??? 3 8??? 12? 15? 3??? 1 9??? 6??? 5??? 3??? 3 10??? 3??? 2??? 7??? 5 11??? 6??? 5??? 8??? 2 12??? 12? 19? 3??? 2 13??? 6??? 5??? 3??? 2 14??? 3??? 4??? 2??? 1 15??? 6??? 5??? 6??? 2 16??? 12? 15? 5??? 2 17??? 6??? 5??? 5??? 2 18??? 3??? 2??? 8??? 1 19??? 6??? 5??? 3??? 9 20??? 12? 15? 3??? 10 21??? 6??? 5??? 3??? 2 22??? 3??? 2??? 2??? 11 23??? 6??? 5??? 3??? 4 24??? 12? 15? 9??? 2 25??? 6??? 5??? 3??? 2 26??? 3??? 2??? 2??? 1 27?? 6??? 5??? 3??? 2 28??? 12? 15? 3??? 2 29??? 6??? 8??? 3??? 2 30??? 3??? 2??? 4??? 1 31??? 6??? 5??? 3??? 3 32??? 12? 15? 3??? 1 33??? 6??? 5??? 3??? 3 34??? 3??? 2??? 7??? 5 35??? 6??? 5??? 8??? 2 36??? 12? 19? 3??? 2 37??? 6??? 5??? 3??? 2 38??? 3??? 4??? 2??? 1 ",sep="",header=TRUE,stringsAsFactors=FALSE) res<-aggregate(.~(1:nrow(dat1)-1)%/%12,data=dat1[,-1],mean) ? names(res)[1]<-"group" A.K. ----- Original Message ----- From: Simonas Kecorius <simolas2008 at gmail.com> To: r-help at r-project.org Cc: Sent: Saturday, January 5, 2013 8:33 AM Subject: [R] Need help on dataframe Dear R users, I came up to a problem by taking means (or other summary statistics) of a big dataframe. Suppose we do have a dataframe: ID? V1? V2? V3? V4 ........................ V71 1? ? 6? ? 5? ? 3? ? 2? ........................? 3 2? ? 3? ? 2? ? 2? ? 1? ........................? 1 3? ? 6? ? 5? ? 3? ? 2? ........................? 3 4? ? 12? 15? 3? ? 2? ........................? 100 ........................................................ ........................................................ 288 10? 20? 30? 30 .......................... 499 I need to find out the way, how to calculate a mean of every 12 lines to get: V1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? V2? ? ? ? ? ? ? ? V3? ? ? ? ? ? ? ? V4 ........................... V71 mean from 1 to 7? ? ? same as V1? ? same as V1 mean from 8 to 14? ? same as V1? ? same as V1 etc. I can do it column by column using: y.ts <- ts(y$V1, frequency=12) aggregate(y.ts, FUN=mean) Bu this is a hardcore... Can anyone suggest a better way to compute all the dataframe at once and get a result as matrix? Thank you in advance! -- Simonas Kecorius ** ??? [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Given the data: dat1<-read.table(header=TRUE,text=" ID V1 V2 V3 V4 1 6 5 3 2 2 3 2 2 1 3 6 5 3 2 4 12 15 3 2 5 6 8 3 2 6 3 2 4 1 7 6 5 3 3 8 12 15 3 1 9 6 5 3 3 10 3 2 7 5 11 6 5 8 2 12 12 19 3 2 13 6 5 3 2 14 3 4 2 1 15 6 5 6 2 16 12 15 5 2 17 6 5 5 2 18 3 2 8 1 19 6 5 3 9 20 12 15 3 10 21 6 5 3 2 22 3 2 2 11 23 6 5 3 4 24 12 15 9 2 25 6 5 3 2 26 3 2 2 1 27 6 5 3 2 28 12 15 3 2 29 6 8 3 2 30 3 2 4 1 31 6 5 3 3 32 12 15 3 1 33 6 5 3 3 34 3 2 7 5 35 6 5 8 2 36 12 19 3 2 37 6 5 3 2 38 3 4 2 1")) The following seems to be rather quick (0.394 ms per group of up to 12 rows in 1201 rows): do.call(rbind,by(dat1,floor((dat1$ID-1)/12),colMeans)) ID V1 V2 V3 V4 0 6.0 6.272727 6.272727 3.818182 2.181818 1 17.5 6.750000 7.250000 3.833333 4.000000 2 29.5 6.750000 7.000000 4.250000 2.166667 3 37.0 7.000000 9.333333 2.666667 1.666667 If you know the number of rows is a multiple of 12: dat2<-read.table(header=TRUE,text=" ID V1 V2 V3 V4 1 6 5 3 2 2 3 2 2 1 3 6 5 3 2 4 12 15 3 2 5 6 8 3 2 6 3 2 4 1 7 6 5 3 3 8 12 15 3 1 9 6 5 3 3 10 3 2 7 5 11 6 5 8 2 12 12 19 3 2 13 6 5 3 2 14 3 4 2 1 15 6 5 6 2 16 12 15 5 2 17 6 5 5 2 18 3 2 8 1 19 6 5 3 9 20 12 15 3 10 21 6 5 3 2 22 3 2 2 11 23 6 5 3 4 24 12 15 9 2 25 6 5 3 2 26 3 2 2 1 27 6 5 3 2 28 12 15 3 2 29 6 8 3 2 30 3 2 4 1 31 6 5 3 3 32 12 15 3 1 33 6 5 3 3 34 3 2 7 5 35 6 5 8 2 36 12 19 3 2")) This is marginally better (0.378 ms per group of 12 rows in 1212 rows, or 4% less time): do.call(rbind,by(dat2,rep(1:(NROW(dat2)/12),each=12),colMeans)) ID V1 V2 V3 V4 1 6.5 6.75 7.333333 3.750000 2.166667 2 18.5 6.75 6.916667 4.333333 4.000000 3 30.5 6.75 7.333333 3.750000 2.166667 Best, Mario On Sat, Jan 5, 2013 at 8:33 AM, Simonas Kecorius <simolas2008@gmail.com>wrote:> Dear R users, I came up to a problem by taking means (or other summary > statistics) of a big dataframe. > > Suppose we do have a dataframe: > > ID V1 V2 V3 V4 ........................ V71 > 1 6 5 3 2 ........................ 3 > 2 3 2 2 1 ........................ 1 > 3 6 5 3 2 ........................ 3 > 4 12 15 3 2 ........................ 100 > ........................................................ > ........................................................ > 288 10 20 30 30 .......................... 499 > > I need to find out the way, how to calculate a mean of every 12 lines to > get: > > V1 V2 V3 V4 > ........................... V71 > mean from 1 to 7 same as V1 same as V1 > mean from 8 to 14 same as V1 same as V1 > etc. > > I can do it column by column using: > > y.ts <- ts(y$V1, frequency=12) > aggregate(y.ts, FUN=mean) > > Bu this is a hardcore... Can anyone suggest a better way to compute all the > dataframe at once and get a result as matrix? > > Thank you in advance! > > -- > Simonas Kecorius > ** > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]