Hi All,

I'm having trouble selecting rows to delete, and I can't seem to get past it.

Below is some sample data. I am trying to dedup the data on each user and, simultaneously, on the timestamp (at the side I have marked the rows I expect to be removed).

I've looked at the lag function but can't seem to make it work. My logic ran along the lines of an ifelse statement to flag the rows, then removing them afterwards, but it doesn't seem to work. Any help appreciated.

Let's call the data test:

    test$lag <- ifelse(test$user_id==lag(test$user_id) & test$timestamp==lag(test$timestamp),1,0)

Can anyone help on this?

Mike

      Source_type           timestamp      user_id
75381           0 07-07-2008-21:03:55 848307909687
75379           1 07-07-2008-19:52:55 848307838407
75380           2 07-07-2008-19:54:14 848307838407
75378           1 07-07-2008-15:24:01 848285633277
75374           1 07-07-2008-13:39:17 848273633667
75377           2 07-07-2008-13:39:55 848273633667
75376           2 07-07-2008-13:39:55 848273633667  Remove
75375           2 07-07-2008-13:56:05 848273633667
75373           1 07-07-2008-17:11:00 848272661427
75371           1 07-07-2008-13:19:00 848270431847
75372           2 07-07-2008-13:19:14 848270431847
75369           1 07-07-2008-12:49:16 848269676907  Remove
75370           2 07-07-2008-12:49:16 848269676907
75366           1 07-07-2008-13:29:15 848263484847
75368           2 07-07-2008-13:29:44 848263484847

Thanks in advance
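A possible reason the lag() attempt appears to do nothing: in base R, stats::lag() on a plain vector only changes the time-series attribute and leaves the values where they are, so each element ends up compared with itself. The sketch below shows that and one way to shift by hand. The vector v and the names prev and same_as_prev are made up for illustration and are not from the post.

```r
# A short numeric vector standing in for a column such as test$user_id
# (illustrative values, not taken from the post).
v <- c(10, 20, 20, 30)

# stats::lag() only adjusts the time-series attribute; the values are unchanged,
# so comparing a vector with its "lag" compares each element with itself.
unchanged <- all(as.vector(stats::lag(v)) == v)
print(unchanged)  # TRUE

# A manual shift by one position: element i now holds the value of element i - 1.
prev <- c(NA, head(v, -1))

# TRUE where an element equals its predecessor; the first element has none.
same_as_prev <- !is.na(prev) & v == prev
print(same_as_prev)  # FALSE FALSE TRUE FALSE
```

The same shift applied to both user_id and timestamp would give a flag column along the lines the post was aiming for, assuming duplicates always sit on adjacent rows.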
Michael

    test[!duplicated(paste(test$timestamp, test$user_id)),]

should remove the second (and subsequent) occurrences of duplicates. Your example suggests you don't always want to keep the first occurrence, but the rule which determines which occurrence you want to keep is not obvious to me.

HTH ....

Peter Alspach

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Pearmain
> Sent: Wednesday, 24 September 2008 8:44 a.m.
> To: r-help at r-project.org
> Subject: [R] Contional
>
> I'm having trouble selecting rows to delete, that i can't seem to overcome.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
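As a sketch of the duplicated() idea, the block below applies it to the two key columns directly instead of pasting them together, and shows the fromLast argument for keeping the last occurrence rather than the first. The data frame is a six-row subset of the sample data; the names key, keep_first and keep_last are made up for illustration. Neither variant reproduces both "Remove" marks in the post, which fits the remark that the intended rule is unclear.

```r
# Six rows taken from the sample data in the post (a subset, to keep this short).
test <- data.frame(
  Source_type = c(2, 2, 2, 1, 2, 1),
  timestamp = c("07-07-2008-13:39:55", "07-07-2008-13:39:55",
                "07-07-2008-13:56:05", "07-07-2008-12:49:16",
                "07-07-2008-12:49:16", "07-07-2008-13:29:15"),
  user_id = c("848273633667", "848273633667", "848273633667",
              "848269676907", "848269676907", "848263484847"),
  row.names = c("75377", "75376", "75375", "75369", "75370", "75366"),
  stringsAsFactors = FALSE
)

# duplicated() has a data frame method, so the key columns can be passed as is.
key <- test[, c("timestamp", "user_id")]

# Keep the first row of each (timestamp, user_id) pair.
keep_first <- test[!duplicated(key), ]
print(rownames(keep_first))  # "75377" "75375" "75369" "75366"

# Keep the last row of each pair instead.
keep_last <- test[!duplicated(key, fromLast = TRUE), ]
print(rownames(keep_last))   # "75376" "75375" "75370" "75366"
```

If the row to keep depends on something else, such as Source_type, one option would be to order the data frame by that column first and then keep the first occurrence; that is an assumption about the intent, not something stated in the thread.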
Is this what you want? TRUE marks the ones to be removed:

    > mark <- (head(x$timestamp, -1) == tail(x$timestamp, -1)) &
    +     (head(x$user_id, -1) == tail(x$user_id, -1))
    > x$flag <- c(FALSE, mark)
    > x
          Source_type           timestamp      user_id  flag
    75381           0 07-07-2008-21:03:55 848307909687 FALSE
    75379           1 07-07-2008-19:52:55 848307838407 FALSE
    75380           2 07-07-2008-19:54:14 848307838407 FALSE
    75378           1 07-07-2008-15:24:01 848285633277 FALSE
    75374           1 07-07-2008-13:39:17 848273633667 FALSE
    75377           2 07-07-2008-13:39:55 848273633667 FALSE
    75376           2 07-07-2008-13:39:55 848273633667  TRUE
    75375           2 07-07-2008-13:56:05 848273633667 FALSE
    75373           1 07-07-2008-17:11:00 848272661427 FALSE
    75371           1 07-07-2008-13:19:00 848270431847 FALSE
    75372           2 07-07-2008-13:19:14 848270431847 FALSE
    75369           1 07-07-2008-12:49:16 848269676907 FALSE
    75370           2 07-07-2008-12:49:16 848269676907  TRUE
    75366           1 07-07-2008-13:29:15 848263484847 FALSE
    75368           2 07-07-2008-13:29:44 848263484847 FALSE

On Tue, Sep 23, 2008 at 4:44 PM, Michael Pearmain <mpearmain at google.com> wrote:
> I'm having trouble selecting rows to delete, that i can't seem to overcome.
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
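One caveat with the neighbour comparison is that it only catches duplicates sitting on adjacent rows. The sketch below wraps the comparison in a small helper, shows a case where unsorted data hides a duplicate, and then sorts by the key columns before flagging and dropping rows. The helper name adjacent_dup and the reordered five-row data frame are made up for illustration; the row order here is deliberately not the order in the post.

```r
# Five rows from the sample data, reordered on purpose so that the two
# 13:39:55 rows for user 848273633667 are no longer adjacent.
x <- data.frame(
  Source_type = c(2, 2, 2, 1, 2),
  timestamp = c("07-07-2008-13:39:55", "07-07-2008-13:56:05",
                "07-07-2008-13:39:55", "07-07-2008-12:49:16",
                "07-07-2008-12:49:16"),
  user_id = c("848273633667", "848273633667", "848273633667",
              "848269676907", "848269676907"),
  row.names = c("75377", "75375", "75376", "75369", "75370"),
  stringsAsFactors = FALSE
)

# TRUE where a row has the same timestamp and user_id as the row just above it.
# Hypothetical helper; assumes the data frame has at least one row.
adjacent_dup <- function(d) {
  n <- nrow(d)
  same <- d$timestamp[-1] == d$timestamp[-n] & d$user_id[-1] == d$user_id[-n]
  c(FALSE, same)
}

# Unsorted: only the adjacent pair is caught.
print(rownames(x)[adjacent_dup(x)])  # "75370"

# Sorting by the key columns brings equal keys together. The timestamps are
# compared as plain strings here, which is enough for grouping equal values.
x_sorted <- x[order(x$user_id, x$timestamp), ]
x_sorted$flag <- adjacent_dup(x_sorted)

# Drop the flagged rows.
deduped <- x_sorted[!x_sorted$flag, ]
print(rownames(deduped))  # "75369" "75377" "75375"
```

Sorting changes the row order of the result; if the original order matters, the duplicated() approach avoids the sort.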