Lopez, Dan
2013-Nov-11 20:50 UTC
[R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Hi R Experts,
How do I mark rows in dataframe based on a condition that's based off
another row in the same dataframe?
I want to mark any combination of FY,ID, TT=='HC' rows that have a
FY,ID,TT=='TER' row with a 1. In my example below this is rows 4, 7 and
11.
My data looks something like this:
FY ID TT
1 FY09 1 HC
2 FY10 1 HC
3 FY11 1 HC
4 FY12 1 HC
5 FY12 1 TER
6 FY09 2 HC
7 FY10 2 HC
8 FY10 2 TER
9 FY11 2 HC
10 FY12 2 HC
11 FY13 2 HC
12 FY13 2 TER
I know for this specific example I can use:
HTDF$EXCL3<-1*duplicated(HTDF[,1:2],fromLast=T)
However my actual data set is NOT sorted by FY, ID and TT. TT is a binary factor
variable. I want to know if there is another way of doing the same thing without
sorting the data.
I tried the last line of code below but it gave me unexpected results. It marks
the first three rows with 0 and everything else with 1. Based on the warning
messages looks like it has something to do with longer object length is not a
multiple of shorter object length. But I am now stumped.
#REPRODUCIBLE EXAMPLE
FY<-factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","FY11","FY12","FY13","FY13"))
ID<-c(rep(1,5),rep(2,7))
TT<-factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER"))
HTDF<-data.frame(FY,ID,TT)
#Summarize data and get max TT. TT is a binary factor variable
library(sqldf)
HTDF.MAX<-sqldf('SELECT ID,FY,Max(TT) "MAXTT" FROM HTDF GROUP
BY ID,FY')
# Initiate new variable and assign 0 or 1
HTDF$EXCL<-0
# THIS IS WHERE I AM GETTING UNEXPECTE RESULTS
HTDF$EXCL<-ifelse(HTDF$FY==HTDF.MAX$FY&HTDF$ID==HTDF.MAX$ID&HTDF$TT==HTDF.MAX$MAXTT,0,1)
Dan Lopez
Workforce Analyst
LLNL
HRIM - Workforce Analytics & Metrics
[[alternative HTML version deleted]]
William Dunlap
2013-Nov-11 21:18 UTC
[R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
If you have an algorithm that only works on sorted data, it is easy to
write a function that sorts [a copy of] the data, applies the algorithm,
then puts the result back in the order of the original data. E.g.,
f <- function (data) {
ord <- with(data, order(TT, ID, FY)) # data[ord,] will be sorted in your
required order
data$EXCL3 <- 1 * duplicated(data[ord, 1:2], fromLast = TRUE)[order(ord)]
# [order(ord)] puts it back in original order
data
}
E.g.,> i <- c(12, 5, 10, 6, 4, 2, 1, 3, 7, 11, 9, 8)
> scrambled <- HTDF[i,]
> f(scrambled)
FY ID TT EXCL3
12 FY13 2 TER 0
5 FY12 1 TER 0
10 FY12 2 HC 0
6 FY09 2 HC 0
4 FY12 1 HC 1
2 FY10 1 HC 0
1 FY09 1 HC 0
3 FY11 1 HC 0
7 FY10 2 HC 1
11 FY13 2 HC 1
9 FY11 2 HC 0
8 FY10 2 TER 0
Or is your dataset so large that this sorting and unsorting would take too
long or too much space?
(There are faster ways of doing this than duplicated(), but the details depend
on some details like whether or not there may be more than 2 FY/ID duplicates.]
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at
r-project.org] On Behalf
> Of Lopez, Dan
> Sent: Monday, November 11, 2013 12:50 PM
> To: R help (r-help at r-project.org)
> Subject: [R] How do I derive a logical variable in a dataframe based on
another row in the
> same dataframe?
>
> Hi R Experts,
>
> How do I mark rows in dataframe based on a condition that's based off
another row in
> the same dataframe?
>
> I want to mark any combination of FY,ID, TT=='HC' rows that have a
FY,ID,TT=='TER' row
> with a 1. In my example below this is rows 4, 7 and 11.
> My data looks something like this:
> FY ID TT
> 1 FY09 1 HC
> 2 FY10 1 HC
> 3 FY11 1 HC
> 4 FY12 1 HC
> 5 FY12 1 TER
> 6 FY09 2 HC
> 7 FY10 2 HC
> 8 FY10 2 TER
> 9 FY11 2 HC
> 10 FY12 2 HC
> 11 FY13 2 HC
> 12 FY13 2 TER
>
> I know for this specific example I can use:
> HTDF$EXCL3<-1*duplicated(HTDF[,1:2],fromLast=T)
>
> However my actual data set is NOT sorted by FY, ID and TT. TT is a binary
factor variable.
> I want to know if there is another way of doing the same thing without
sorting the data.
> I tried the last line of code below but it gave me unexpected results. It
marks the first
> three rows with 0 and everything else with 1. Based on the warning
messages looks like
> it has something to do with longer object length is not a multiple of
shorter object
> length. But I am now stumped.
>
> #REPRODUCIBLE EXAMPLE
> FY<-
>
factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","FY11","FY12","FY13
> ","FY13"))
> ID<-c(rep(1,5),rep(2,7))
>
TT<-factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER"))
> HTDF<-data.frame(FY,ID,TT)
>
> #Summarize data and get max TT. TT is a binary factor variable
> library(sqldf)
> HTDF.MAX<-sqldf('SELECT ID,FY,Max(TT) "MAXTT" FROM HTDF
GROUP BY ID,FY')
>
> # Initiate new variable and assign 0 or 1
> HTDF$EXCL<-0
>
> # THIS IS WHERE I AM GETTING UNEXPECTE RESULTS
> HTDF$EXCL<-
>
ifelse(HTDF$FY==HTDF.MAX$FY&HTDF$ID==HTDF.MAX$ID&HTDF$TT==HTDF.MAX$MAX
> TT,0,1)
>
>
> Dan Lopez
> Workforce Analyst
> LLNL
> HRIM - Workforce Analytics & Metrics
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
arun
2013-Nov-11 22:25 UTC
[R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
Hi,
You may try:
fun1 <- function(dat){
dat$EXCL3 <- 0
dat$EXCL3[dat$TT=="HC"] <-
1*as.character(interaction(dat[,1:2]))[dat$TT=="HC"] %in%
as.character(interaction(dat[,1:2]))[dat$TT=="TER"]
dat
}
fun1(HTDF)
set.seed(14)
?indx <- sample(1:nrow(HTDF),12)
?HTDF1 <- HTDF[indx,]
fun1(HTDF1)
A.K.
On Monday, November 11, 2013 4:49 PM, "Lopez, Dan" <lopez235 at
llnl.gov> wrote:
Hi R Experts,
How do I mark rows in dataframe based on a condition that's based off
another row in the same dataframe?
I want to mark any combination of FY,ID, TT=='HC' rows that have a
FY,ID,TT=='TER' row with a 1.? In my example below this is rows 4, 7 and
11.
My data looks something like this:
? ? FY ID? TT
1? FY09? 1? HC
2? FY10? 1? HC
3? FY11? 1? HC
4? FY12? 1? HC
5? FY12? 1 TER
6? FY09? 2? HC
7? FY10? 2? HC
8? FY10? 2 TER
9? FY11? 2? HC
10 FY12? 2? HC
11 FY13? 2? HC
12 FY13? 2 TER
I know for this specific example I can use:
HTDF$EXCL3<-1*duplicated(HTDF[,1:2],fromLast=T)
However my actual data set is NOT sorted by FY, ID and TT. TT is a binary factor
variable. I want to know if there is another way of doing the same thing without
sorting the data.
I tried the last line of code below but it gave me unexpected results. It marks
the first three rows with 0 and everything else with 1.? Based on the warning
messages looks like it has something to do with longer object length is not a
multiple of shorter object length. But I am now stumped.
#REPRODUCIBLE EXAMPLE
FY<-factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","FY11","FY12","FY13","FY13"))
ID<-c(rep(1,5),rep(2,7))
TT<-factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER"))
HTDF<-data.frame(FY,ID,TT)
#Summarize data and get max TT. TT is a binary factor variable
library(sqldf)
HTDF.MAX<-sqldf('SELECT ID,FY,Max(TT) "MAXTT" FROM HTDF GROUP
BY ID,FY')
# Initiate new variable and assign 0 or 1
HTDF$EXCL<-0
# THIS IS WHERE I AM GETTING UNEXPECTE RESULTS
HTDF$EXCL<-ifelse(HTDF$FY==HTDF.MAX$FY&HTDF$ID==HTDF.MAX$ID&HTDF$TT==HTDF.MAX$MAXTT,0,1)
Dan Lopez
Workforce Analyst
LLNL
HRIM - Workforce Analytics & Metrics
??? [[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Gabor Grothendieck
2013-Nov-11 23:01 UTC
[R] How do I derive a logical variable in a dataframe based on another row in the same dataframe?
On Mon, Nov 11, 2013 at 3:50 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:> Hi R Experts, > > How do I mark rows in dataframe based on a condition that's based off another row in the same dataframe? > > I want to mark any combination of FY,ID, TT=='HC' rows that have a FY,ID,TT=='TER' row with a 1. In my example below this is rows 4, 7 and 11. > My data looks something like this: > FY ID TT > 1 FY09 1 HC > 2 FY10 1 HC > 3 FY11 1 HC > 4 FY12 1 HC > 5 FY12 1 TER > 6 FY09 2 HC > 7 FY10 2 HC > 8 FY10 2 TER > 9 FY11 2 HC > 10 FY12 2 HC > 11 FY13 2 HC > 12 FY13 2 TER > > I know for this specific example I can use: > HTDF$EXCL3<-1*duplicated(HTDF[,1:2],fromLast=T) > > However my actual data set is NOT sorted by FY, ID and TT. TT is a binary factor variable. I want to know if there is another way of doing the same thing without sorting the data. > I tried the last line of code below but it gave me unexpected results. It marks the first three rows with 0 and everything else with 1. Based on the warning messages looks like it has something to do with longer object length is not a multiple of shorter object length. But I am now stumped. > > #REPRODUCIBLE EXAMPLE > FY<-factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","FY11","FY12","FY13","FY13")) > ID<-c(rep(1,5),rep(2,7)) > TT<-factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER")) > HTDF<-data.frame(FY,ID,TT) > > #Summarize data and get max TT. TT is a binary factor variable > library(sqldf) > HTDF.MAX<-sqldf('SELECT ID,FY,Max(TT) "MAXTT" FROM HTDF GROUP BY ID,FY') > > # Initiate new variable and assign 0 or 1 > HTDF$EXCL<-0 > > # THIS IS WHERE I AM GETTING UNEXPECTE RESULTS > HTDF$EXCL<-ifelse(HTDF$FY==HTDF.MAX$FY&HTDF$ID==HTDF.MAX$ID&HTDF$TT==HTDF.MAX$MAXTT,0,1)For each FY, ID group ave applies f to TT == 'TER' returning a logical vector that is TRUE for each HC if TER is in the group ad otherwise FALSE. Finally we add 0 to convert from TRUE/FALSE to 1/0. The rows of HTDF need not be in any specific order and their oreder will be preserved.> f <- function(x) any(x) & !x > transform(HTDF, EXCL = ave(TT == 'TER', FY, ID, FUN = f) + 0)FY ID TT EXCL 1 FY09 1 HC 0 2 FY10 1 HC 0 3 FY11 1 HC 0 4 FY12 1 HC 1 5 FY12 1 TER 0 6 FY09 2 HC 0 7 FY10 2 HC 1 8 FY10 2 TER 0 9 FY11 2 HC 0 10 FY12 2 HC 0 11 FY13 2 HC 1 12 FY13 2 TER 0 -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com