siddu479
2012-Nov-04  14:40 UTC
[R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
Hello All,
I have a .csv file (contents shown below) for which I need to calculate a
summary (the mean, for example) over only the rows marked with asterisks.
That is, in this example I need to exclude the first row and the last row
(N=1) of each "Unique" and "StepNo" combination.
Unique,StepNo,Data1,Data2    # The actual file has 100 columns and on the order of millions of rows.
A,1,4,5        # Exclude this 1st row for this "StepNo" and "Unique" combination.
*A,1,5,6*
A,1,7,8        # Exclude this last row for this "StepNo" and "Unique" combination.
A,2,9,10       # Exclude this row because it is the 1st row for this "StepNo" and "Unique" combination.
*A,2,45,25*
A,2,10,11      # Exclude this last row for this "StepNo" and "Unique" combination.
B,2,34,12      # Exclude this 1st row for this "StepNo" and "Unique" combination.
*B,2,5,6*
*B,2,7,8*
B,2,6,7        # Exclude this last row for this "StepNo" and "Unique" combination.
B,3,1,2        # Exclude this 1st row for this "StepNo" and "Unique" combination.
*B,3,3,4*
B,3,4,5        # Exclude this last row for this "StepNo" and "Unique" combination.
My existing code to calculate the mean for all rows is:
dat <- read.csv("aboveinput.csv", header=TRUE)   # Load the input file
library("plyr")
result <- ddply(dat, .(Unique,StepNo), numcolwise(mean))   # Mean of each numeric column for each Unique and StepNo combination
I need to modify the above script to exclude N rows at the start as well as
at the end of each StepNo.
Something like
result <- ddply(dat, .(Unique,StepNo), numcolwise(mean(head n rows excluded, tail n rows excluded in each StepNo)))   # Just a skeleton script.
Please let me know if my question is not clear.
-----
Sidda
Business Analyst Lead
Applied Materials Inc.
--
View this message in context:
http://r.789695.n4.nabble.com/Excluding-fixed-number-of-rows-from-calculation-while-summarizing-using-ddply-function-tp4648406.html
Sent from the R help mailing list archive at Nabble.com.
arun
2012-Nov-04  17:24 UTC
[R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
library(plyr)
dat1 <- read.table(text="
Unique,StepNo,Data1,Data2
A,1,4,5
A,1,5,6
A,1,7,8
A,2,9,10
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7
B,3,1,2
B,3,3,4
B,3,4,5
", sep=",", header=TRUE, stringsAsFactors=FALSE)
# First and last row of each Unique/StepNo group
dat2 <- ddply(dat1, .(Unique,StepNo), function(x) x[c(1,nrow(x)),])
# Flag the rows of each data frame, then merge on all shared columns
dat1$newcoldat1 <- TRUE
dat2$newcoldat2 <- TRUE
dat3 <- merge(dat1, dat2, all=TRUE)
# Rows with no match in dat2 are the interior rows to keep
dat4 <- dat3[is.na(dat3$newcoldat2), 1:4]
dat4
#   Unique StepNo Data1 Data2
#2       A      1     5     6
#6       A      2    45    25
#7       B      2     5     6
#9       B      2     7     8
#12      B      3     3     4
ddply(dat4, .(Unique,StepNo), numcolwise(mean))
#  Unique StepNo Data1 Data2
#1      A      1     5     6
#2      A      2    45    25
#3      B      2     6     7
#4      B      3     3     4
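For the general case of dropping N rows from each end, one possible sketch in base R is shown here. It is not part of the original reply. It selects rows by their position within each group rather than by matching values, so repeated data rows within a group do not matter. The names n, g, pos, size and trimmed are illustrative assumptions, and aggregate() stands in for the ddply() call only so that the block runs without plyr.

```r
dat1 <- read.table(text = "Unique,StepNo,Data1,Data2
A,1,4,5
A,1,5,6
A,1,7,8
A,2,9,10
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7
B,3,1,2
B,3,3,4
B,3,4,5
", sep = ",", header = TRUE, stringsAsFactors = FALSE)

n <- 1  # rows to drop at each end of every group (assumed parameter name)

# One factor level per Unique/StepNo combination that actually occurs.
g   <- interaction(dat1$Unique, dat1$StepNo, drop = TRUE)
idx <- seq_len(nrow(dat1))

pos  <- ave(idx, g, FUN = seq_along)  # position of each row within its group
size <- ave(idx, g, FUN = length)     # number of rows in that group

# Keep only rows that are more than n away from both ends of their group.
keep    <- pos > n & pos <= size - n
trimmed <- dat1[keep, ]

result <- aggregate(cbind(Data1, Data2) ~ Unique + StepNo,
                    data = trimmed, FUN = mean)
result
# Means per group (A/1, A/2, B/2, B/3): Data1 = 5, 45, 6, 3; Data2 = 6, 25, 7, 4
```

The trimmed data frame can equally be passed to ddply(trimmed, .(Unique,StepNo), numcolwise(mean)), as in the original script. Groups with 2*n or fewer rows contribute no rows and therefore drop out of the summary.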
A.K.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
arun
2012-Nov-04  17:37 UTC
[R] Excluding fixed number of rows from calculation while summarizing using ddply() function.
Hi,
One more way:
dat1 <- read.table(text="
Unique,StepNo,Data1,Data2
A,1,4,5
A,1,5,6
A,1,7,8
A,2,9,10
A,2,45,25
A,2,10,11
B,2,34,12
B,2,5,6
B,2,7,8
B,2,6,7
B,3,1,2
B,3,3,4
B,3,4,5
", sep=",", header=TRUE, stringsAsFactors=FALSE)
# Drop rows that are the first or the last occurrence of their Unique/StepNo pair
dat2 <- dat1[!(!duplicated(dat1[,1:2]) | !duplicated(dat1[,1:2], fromLast=TRUE)),]
library(plyr)
ddply(dat2, .(Unique,StepNo), numcolwise(mean))
#  Unique StepNo Data1 Data2
#1      A      1     5     6
#2      A      2    45    25
#3      B      2     6     7
#4      B      3     3     4
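To make the duplicated() expression easier to follow, here is a small sketch that splits it into named steps. The vector key and the names is_first, is_last and keep are illustrative assumptions. The key stands in for the Unique and StepNo columns of the example data pasted together, in file order.

```r
# Group keys in file order, mirroring the Unique/StepNo pairs of the example data.
key <- c("A1", "A1", "A1",
         "A2", "A2", "A2",
         "B2", "B2", "B2", "B2",
         "B3", "B3", "B3")

is_first <- !duplicated(key)                   # TRUE on the first row of each group
is_last  <- !duplicated(key, fromLast = TRUE)  # TRUE on the last row of each group
keep     <- !(is_first | is_last)              # TRUE on interior rows only

which(keep)
# 2 5 8 9 12
```

A group with a single row is both first and last, so it is dropped entirely. Because duplicated() only distinguishes the first (or last) occurrence from the rest, this mask covers N=1 only; larger N needs the row's position within its group.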
A.K.