siddu479
2012-Nov-04 14:40 UTC
[R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
Hello All,

I have a .csv file (contents shown below) for which I need to calculate the mean (for example) over only the rows marked with asterisks (shown in bold in the original post). That is, in this example I need to exclude the first row and the last row (N=1) of each "Unique" and "StepNo" combination.

Unique,StepNo,Data1,Data2   #In the actual file I have 100 columns and nearly millions of rows.
A,1,4,5       #Exclude this 1st row for this "StepNo" and "Unique" combination.
*A,1,5,6*
A,1,7,8       #Exclude this last row for this "StepNo" and "Unique" combination.
A,2,9,10      #Exclude this 1st row for this "StepNo" and "Unique" combination.
*A,2,45,25*
A,2,10,11     #Exclude this last row for this "StepNo" and "Unique" combination.
B,2,34,12     #Exclude this 1st row for this "StepNo" and "Unique" combination.
*B,2,5,6*
*B,2,7,8*
B,2,6,7       #Exclude this last row for this "StepNo" and "Unique" combination.
B,3,1,2       #Exclude this 1st row for this "StepNo" and "Unique" combination.
*B,3,3,4*
B,3,4,5       #Exclude this last row for this "StepNo" and "Unique" combination.

My existing code, which calculates the mean over all rows, is:

dat <- read.csv("aboveinput.csv", header=TRUE)   #Load the input file
library("plyr")
result <- ddply(dat, .(Unique,StepNo), numcolwise(mean))   #Mean for each Unique and StepNo combination, summarized in one table.

I need to modify the above script to exclude N rows at the start as well as N rows at the end of each StepNo group. Something like:

result <- ddply(dat, .(Unique,StepNo), numcolwise(mean(head n rows excluded, tail n rows excluded in each StepNo)))   #Just a skeleton script, not valid R.

Please let me know if my question is not clear.

-----
Sidda
Business Analyst Lead
Applied Materials Inc.
--
View this message in context: http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406.html
Sent from the R help mailing list archive at Nabble.com.
arun
2012-Nov-04 17:24 UTC
[R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
library(plyr)

dat1<-read.table(text="
Unique,StepNo,Data1,Data2
A,1,4,5
A,1,5,6
A,1,7,8
A,2,9,10
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7
B,3,1,2
B,3,3,4
B,3,4,5
",sep=",",header=TRUE,stringsAsFactors=FALSE)

# Collect the first and last row of each Unique/StepNo group
dat2<-ddply(dat1,.(Unique,StepNo),function(x) x[c(1,nrow(x)),])

# Flag both tables, merge them, and keep the rows that have no match in dat2
dat1$newcoldat1<-TRUE
dat2$newcoldat2<-TRUE
dat3<-merge(dat1,dat2,all=TRUE)
dat4<-dat3[is.na(dat3$newcoldat2),1:4]
dat4
#    Unique StepNo Data1 Data2
#2       A      1     5     6
#6       A      2    45    25
#7       B      2     5     6
#9       B      2     7     8
#12      B      3     3     4

ddply(dat4,.(Unique,StepNo),numcolwise(mean))
#  Unique StepNo Data1 Data2
#1      A      1     5     6
#2      A      2    45    25
#3      B      2     6     7
#4      B      3     3     4

A.K.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
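The skeleton in the question can also be written directly with ddply(), without the merge step. The following is only a minimal sketch, not the method posted above: `trim_rows` and `n` are hypothetical names introduced here, and it assumes the rows of each group are already in the intended order. One thing it sidesteps is that merging on all columns may also drop an interior row whose values happen to equal those of a first or last row in the same group.

```r
library(plyr)

dat1 <- read.table(text = "Unique,StepNo,Data1,Data2
A,1,4,5
A,1,5,6
A,1,7,8
A,2,9,10
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7
B,3,1,2
B,3,3,4
B,3,4,5", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: drop the first n and the last n rows of one group.
# Returns a zero-row data frame when the group has 2 * n rows or fewer.
trim_rows <- function(x, n = 1) {
  idx <- seq_len(nrow(x))
  x[idx > n & idx <= nrow(x) - n, , drop = FALSE]
}

n <- 1  # number of rows to exclude at each end of every group

# Step 1: trim every Unique/StepNo group.
trimmed <- ddply(dat1, .(Unique, StepNo), function(x) trim_rows(x, n))

# Step 2: summarize the remaining rows, as in the original script.
result <- ddply(trimmed, .(Unique, StepNo), numcolwise(mean))
result
```

A group with 2 * n rows or fewer has nothing left after trimming, so it does not appear in the result at all.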
arun
2012-Nov-04 17:37 UTC
[R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
Hi,

One more way:

dat1<-read.table(text="
Unique,StepNo,Data1,Data2
A,1,4,5
A,1,5,6
A,1,7,8
A,2,9,10
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7
B,3,1,2
B,3,3,4
B,3,4,5
",sep=",",header=TRUE,stringsAsFactors=FALSE)

# Drop every row that is the first or the last occurrence of its Unique/StepNo combination
dat2<-dat1[!(!duplicated(dat1[,1:2])|!duplicated(dat1[,1:2],fromLast=TRUE)),]

library(plyr)
ddply(dat2,.(Unique,StepNo),numcolwise(mean))
#  Unique StepNo Data1 Data2
#1      A      1     5     6
#2      A      2    45    25
#3      B      2     6     7
#4      B      3     3     4

A.K.
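The duplicated() filter removes exactly one row at each end of a group, so it covers N=1 only. For a general N, one possible base R sketch is shown below. It is an assumption-laden illustration rather than part of the reply above: the names `n`, `grp`, `pos` and `size` are introduced here, and the rows are assumed to be in the intended order within each group. It uses ave() to compute each row's position within its group and the group's size, then filters in one vectorized step.

```r
dat1 <- read.table(text = "Unique,StepNo,Data1,Data2
A,1,4,5
A,1,5,6
A,1,7,8
A,2,9,10
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7
B,3,1,2
B,3,3,4
B,3,4,5", sep = ",", header = TRUE, stringsAsFactors = FALSE)

n <- 1  # number of rows to exclude at each end of every group

# One grouping factor for each Unique/StepNo combination.
grp <- interaction(dat1$Unique, dat1$StepNo, drop = TRUE)

# Position of each row within its group, and the size of that group.
pos  <- ave(seq_len(nrow(dat1)), grp, FUN = seq_along)
size <- ave(seq_len(nrow(dat1)), grp, FUN = length)

# Keep rows that are neither in the first n nor in the last n of their group.
dat2 <- dat1[pos > n & pos <= size - n, ]

# Mean of every remaining column for each Unique/StepNo combination.
result <- aggregate(. ~ Unique + StepNo, data = dat2, FUN = mean)
result
```

The filtered `dat2` can equally be passed to ddply(dat2, .(Unique,StepNo), numcolwise(mean)) as in the reply above. Note that the formula interface of aggregate() drops rows containing NA by default, which may or may not be what is wanted for the real file.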