vanessa van der vaart
2013-Jul-25 19:05 UTC
[R] Duplicated function with conditional statement
Hi everybody,
I have a question about the R function duplicated(). I have spent days trying to
figure this out, but I can't find any solution yet. I hope somebody can help
me.
This is my data:
subj=c(1,1,1,2,2,3,3,3,4,4)
response=c('sample','sample','buy','sample','buy','sample','sample','buy','sample','buy')
product=c(1,2,3,2,2,3,2,1,1,4)
tt=data.frame(subj, response, product)
The data look like this:
   subj response product
1     1   sample       1
2     1   sample       2
3     1      buy       3
4     2   sample       2
5     2      buy       2
6     3   sample       3
7     3   sample       2
8     3      buy       1
9     4   sample       1
10    4      buy       4
I want to create a new column based on the values in the response and product
columns. If the value in product is duplicated, then the value in the new column
is 1; otherwise it is 0.
But I want to add a conditional statement: a value in the product column
will only be considered duplicated if the value in the response column is
'buy'.
For illustration, the table should look like this:
   subj response product newcolumn
1     1   sample       1         0
2     1   sample       2         0
3     1      buy       3         0
4     2   sample       2         0
5     2      buy       2         0
6     3   sample       3         1
7     3   sample       2         1
8     3      buy       1         0
9     4   sample       1         1
10    4      buy       4         0
Can somebody help me?
Any help will be appreciated.
I am new to this mailing list, so forgive me in advance if I did not ask
the question appropriately.
On 25.07.2013 21:05, vanessa van der vaart wrote:
> I want to create a new column based on the values in the response and product
> columns. If the value in product is duplicated, then the value in the new column
> is 1; otherwise it is 0.

According to your description:

tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response=="buy")

which is different from what you show us below, from which I cannot derive any systematic rule.

Uwe Ligges
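Running Uwe's literal reading on the original `tt` makes the mismatch concrete. It flags "buy" rows whose product already occurred in any earlier row, which gives rows 5 and 8, whereas the table in the question flags rows 6, 7 and 9. A minimal check, assuming the `response` vector is repaired onto one line (`wanted` is just the desired column typed in from the question):

```r
# Data from the original question (response string repaired onto one line)
subj     <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
response <- c("sample", "sample", "buy", "sample", "buy",
              "sample", "sample", "buy", "sample", "buy")
product  <- c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
tt <- data.frame(subj, response, product)

# Uwe's reading: a "buy" row whose product already occurred in any earlier row
tt$newcolumn <- as.integer(duplicated(tt$product) & tt$response == "buy")

# Desired column, copied from the table in the question
wanted <- c(0, 0, 0, 0, 0, 1, 1, 0, 1, 0)

which(tt$newcolumn == 1)  # rows 5 and 8
which(wanted == 1)        # rows 6, 7 and 9
```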
Dear Vanessa,
Glad to know that it works.
Sorry, I misunderstood your question initially because there were no duplicates
for "product" among the rows with response=="buy" in your initial dataset
(tt).
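This point about the original `tt` is easy to confirm: among its "buy" rows no product repeats, so a rule that only compares "buy" rows with each other would never return 1 on that data. A short check, reusing the vectors from the question:

```r
product  <- c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4)
response <- c("sample", "sample", "buy", "sample", "buy",
              "sample", "sample", "buy", "sample", "buy")

bought <- product[response == "buy"]  # products on "buy" rows, in row order
print(bought)                         # 3 2 1 4
anyDuplicated(bought)                 # 0: no product is bought twice
```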
Regarding the code, here is briefly what I did:
1. Found the rows with response=="buy" and created the new column filled with 0's (in fun1()):
   indx <- which(dat[,colName]=="buy")
   dat[,newColumn] <- 0
2. Looped over these `indx` using lapply()
3. Checked some conditions:
 a. if(i==length(indx))  # i.e. i is the last element of indx, the last row with response=="buy"
    seq(indx[i], nrow(dat))  # the sequence from the last indx to the last row of the data frame
    #for example:
    indx <- which(tt1$response=="buy")
    indx
    # [1]  3  5  8 10 11 13 14 18 19 20 22
    nrow(tt1)
    #[1] 22
    seq(indx[length(indx)], nrow(tt1))
    #[1] 22
    #this changes depending on the two values, e.g. if the last row with response=="buy" were row 20:
    seq(20,22)
    #[1] 20 21 22
 b. The second condition covers consecutive "buy" rows:
    else if((indx[i+1]-indx[i])==1)
    indx
    # [1]  3  5  8 10 11 13 14 18 19 20 22
    indx[5]-indx[4] # or
    indx[7]-indx[6] # or
    indx[9]-indx[8] # etc.
    In that case, I only want the indx[i] value in the loop.
 c. In all other cases, x1 is the rows strictly between two consecutive indx values, e.g. for indx[1] and indx[2]:
    seq(indx[1]+1, indx[1+1]-1)
    #[1] 4
4. x2 <- dat[unique(c(indx[i:1],x1)),] ### this was a bug in the function, which troubled me.
   It should be:
   x2 <- dat[unique(c(indx[1:i],x1)),] # this is what I was looking for.
   The bug caused a problem, which I worked around with x4New (step 6).
   x2 gives all the rows with response=="buy" from the 1st one in indx up to indx[i], plus the rows x1 from step 3.
   For indx[1], x1 is row 4, because indx[2] is 5. Likewise, for indx[2] it is
   seq(indx[2]+1, indx[2+1]-1)
   #[1] 6 7
5. Split `x2` into x3 and x4, which contain the rows with response=="sample" and response=="buy", respectively.
6. x4New <- # reorders x4 by row number. It was needed because of my earlier mistake (indx[i:1]), and I kept it as an additional check.
7. x5 <- # the rows of x4New whose product is duplicated, i.e. a repeated "buy" of the same product
8. x6 <- # the rows of x3 whose product occurs in x4. A condition is used here because x3 has 0 rows for some list elements; I guess this happens when there are consecutive "buy" rows.
9. sort(as.numeric(c(x5,x6))) # concatenated and sorted these
10. unique(unlist(.... # unlisted the list and kept only the unique elements
11. dat[unique(unlist(....,newColumn] <- 1 # set newColumn to 1 for the rows that fit the condition.
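Steps 1 to 3 can be tried in isolation. The sketch below hard-codes `indx` and `nrow(tt1)` from the printed output above rather than building `tt1`; `window_for()` is a hypothetical helper name, but its three branches mirror the `if`/`else if`/`else` that computes `x1` inside `fun1()`.

```r
# "buy" row indices and row count of tt1, as printed in the explanation
indx <- c(3, 5, 8, 10, 11, 13, 14, 18, 19, 20, 22)
n    <- 22

# Hypothetical helper: the rows fun1() attaches to the i-th "buy" row (x1)
window_for <- function(i) {
  if (i == length(indx)) {
    seq(indx[i], n)                      # case a: last "buy" row to the end
  } else if (indx[i + 1] - indx[i] == 1) {
    indx[i]                              # case b: next row is also "buy"
  } else {
    seq(indx[i] + 1, indx[i + 1] - 1)    # case c: rows strictly between two "buy" rows
  }
}

x1_list <- lapply(seq_along(indx), window_for)
x1_list[[1]]   # 4
x1_list[[2]]   # 6 7
x1_list[[4]]   # 10 (rows 10 and 11 are consecutive "buy" rows)
x1_list[[11]]  # 22
```

Writing the branches with `} else if (...) {` on one line keeps the `if`/`else` chain unambiguous to the R parser wherever the code is pasted.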
Hope it helps.
Regards,
A.K.
________________________________
From: vanessa van der vaart <vanessa.vaart at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Saturday, July 27, 2013 11:07 PM
Subject: Re: [R] Duplicated function with conditional statement
Dear Arun,
Thank you very much. The code really works.
I was wondering if you could explain how the code works.
I am really interested in R, and I really want to master it.
I would really appreciate it, but if you think this is too much to ask,
please just ignore it.
Thank you very much in advance,
Best regards,
Vanessa
On Sun, Jul 28, 2013 at 4:02 AM, vanessa van der vaart <vanessa.vaart at gmail.com> wrote:
>Dear Arun,
>
>Thank you. It's perfect! Wow! Thank you very much. And David, thank you too; it's such a help. I am so sorry, it must have been confusing at the beginning.
>Really, I don't know how to thank you.
>
>
>Well, do you mind if I ask how you became such an expert? What kind of book or training did you have? And how long have you been working with R?
>I am really interested in R.
>
>
>
>On Sun, Jul 28, 2013 at 2:40 AM, arun <smartpink111 at yahoo.com> wrote:
>
>If you wanted to wrap it in a function:
>>
>>
>>
>>fun1 <- function(dat, colName, newColumn){
>>    indx <- which(dat[,colName]=="buy")  # rows with response=="buy"
>>    dat[,newColumn] <- 0
>>    dat[unique(unlist(lapply(seq_along(indx), function(i){
>>        # x1: rows attached to the i-th "buy" row
>>        x1 <- if(i==length(indx)){
>>            seq(indx[i], nrow(dat))           # last "buy" row to the end
>>        } else if((indx[i+1]-indx[i])==1){
>>            indx[i]                           # consecutive "buy" rows
>>        } else {
>>            seq(indx[i]+1, indx[i+1]-1)       # rows between two "buy" rows
>>        }
>>        x2 <- dat[unique(c(indx[1:i],x1)),]   # "buy" rows up to indx[i], plus x1
>>        x3 <- subset(x2, response=="sample")
>>        x4 <- subset(x2, response=="buy")
>>        x4New <- x4[order(as.numeric(row.names(x4))),]
>>        x5 <- row.names(x4New)[duplicated(x4New$product)]
>>        x6 <- if(nrow(x3)!=0) {
>>            row.names(x3)[x3$product %in% x4$product]
>>        }
>>        sort(as.numeric(c(x5,x6)))
>>    }))), newColumn] <- 1
>>    dat
>>}
>>
>>
>>fun1(tt1,"response","newCol")
>>#   subj response product newCol
>>#1     1   sample       1      0
>>#2     1   sample       2      0
>>#3     1      buy       3      0
>>#4     2   sample       2      0
>>#5     2      buy       2      0
>>#6     3   sample       3      1
>>#7     3   sample       2      1
>>#8     3      buy       1      0
>>#9     4   sample       1      1
>>#10    4      buy       4      0
>>#11    5      buy       4      1
>>#12    5   sample       2      1
>>#13    5      buy       2      1
>>#14    6      buy       4      1
>>#15    6   sample       5      0
>>#16    6   sample       5      0
>>#17    7   sample       4      1
>>#18    7      buy       3      1
>>#19    7      buy       4      1
>>#20    8      buy       5      0
>>#21    8   sample       4      1
>>#22    8      buy       2      1
>>
>>A.K.
>>
>>
>>
>>----- Original Message -----
>>From: arun <smartpink111 at yahoo.com>
>>To: vanessa van der vaart <vanessa.vaart at gmail.com>
>>Cc: David Winsemius <dwinsemius at comcast.net>; R help <r-help at r-project.org>
>>
>>Sent: Saturday, July 27, 2013 9:11 PM
>>Subject: Re: [R] Duplicated function with conditional statement
>>
>>Hi,
>>Maybe this is what you wanted.
>>#using tt1
>>indx <- which(tt1$response=="buy")
>>tt1$newcolumn <- 0
>>tt1[unique(unlist(lapply(seq_along(indx), function(i){
>>    x1 <- if(i==length(indx)) seq(indx[i], nrow(tt1)) else
>>          if((indx[i+1]-indx[i])==1) indx[i] else seq(indx[i]+1, indx[i+1]-1)
>>    x2 <- tt1[unique(c(indx[1:i],x1)),]
>>    x3 <- subset(x2, response=="sample")
>>    x4 <- subset(x2, response=="buy")
>>    x5 <- row.names(x4)[duplicated(x4$product)]
>>    x6 <- if(nrow(x3)!=0) row.names(x3)[x3$product %in% x4$product]
>>    sort(c(x5,x6))
>>}))), "newcolumn"] <- 1
>>
>>
>>tt1
>>   subj response product newcolumn
>>1     1   sample       1         0
>>2     1   sample       2         0
>>3     1      buy       3         0
>>4     2   sample       2         0
>>5     2      buy       2         0
>>6     3   sample       3         1
>>7     3   sample       2         1
>>8     3      buy       1         0
>>9     4   sample       1         1
>>10    4      buy       4         0
>>11    5      buy       4         1
>>12    5   sample       2         1
>>13    5      buy       2         1
>>14    6      buy       4         1
>>15    6   sample       5         0
>>16    6   sample       5         0
>>17    7   sample       4         1
>>18    7      buy       3         1
>>19    7      buy       4         1
>>20    8      buy       5         0
>>21    8   sample       4         1
>>22    8      buy       2         1
>>A.K.
>>
>>
>>
>>
>>
>>________________________________
>>From: vanessa van der vaart <vanessa.vaart at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Cc: David Winsemius <dwinsemius at comcast.net>; R help <r-help at r-project.org>
>>Sent: Saturday, July 27, 2013 6:55 PM
>>Subject: Re: [R] Duplicated function with conditional statement
>>
>>
>>
>>Dear all,
>>Thank you all for your help. It's been such a help, but it's not exactly what I am looking for. Apparently I haven't explained the condition very clearly. I hope this works.
>>
>>If the value in column product duplicates one in an earlier row (this applies to both response=="buy" and response=="sample" rows), and it is duplicated from a row whose value in column 'response' is "buy", then the value = 1; otherwise it is 0.
>>So, if the value is duplicated but only from an earlier row where response=="sample", then it is not considered duplicated, and the new column is 0.
>>
>>Thank you very much in advance,
>>I really appreciate it.
>>
>
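The clarified rule in the last message (a row gets 1 when its product already appeared on an earlier "buy" row, whatever the current row's response) can also be written as a single ordered pass that remembers which products have been bought so far. This is a swapped-in technique, not the `fun1()` approach from the thread; `flag_after_buy()` is a hypothetical name, and `tt1` is rebuilt from the 22-row output printed above, with the printed `newcolumn` kept as `printed` for comparison.

```r
# tt1 rebuilt from the 22-row output printed earlier in the thread
tt1 <- data.frame(
  subj     = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8),
  response = c("sample", "sample", "buy", "sample", "buy", "sample", "sample",
               "buy", "sample", "buy", "buy", "sample", "buy", "buy", "sample",
               "sample", "sample", "buy", "buy", "buy", "sample", "buy"),
  product  = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 5, 4, 3, 4, 5, 4, 2),
  stringsAsFactors = FALSE
)
printed <- c(0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1)

# Hypothetical helper: 1 if the product was bought on an earlier row, else 0
flag_after_buy <- function(response, product) {
  bought <- product[0]                 # products seen on earlier "buy" rows
  out <- integer(length(product))
  for (i in seq_along(product)) {
    out[i] <- as.integer(product[i] %in% bought)               # earlier buys only
    if (response[i] == "buy") bought <- c(bought, product[i])  # then record this row
  }
  out
}

tt1$newcolumn <- flag_after_buy(tt1$response, tt1$product)
all(tt1$newcolumn == printed)  # TRUE
```

Because each row is checked against purchases made strictly before it, repeats among "sample" rows are ignored automatically and no window bookkeeping (`x1`, consecutive "buy" rows) is needed.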