thr3ads.net - R help - [R] Removing rows with earlier dates [Dec 2010]

If this information is useful, please help other people find it:
Share via:

Ali Salekfard

2010-Dec-24 13:45 UTC

[R] Removing rows with earlier dates

Hi all,

I'm new to the list but have benfited from it quite extensively. Straight to
my rather strange question:

I have a data frame that contains mapping rules in this way:

ACCOUNT, RULE COLUMNS, Effective Date


The dataframe comes from a database that stores all dates. What I would like
to do is to create a data frame with only the most recent rule for each
account. In traditional programming languages I would loop through each
account find the most recent rule(s) and fill up my updated data frame.

Does anyone have any better idea to use R's magic (Its syntax is still
magical to me) for this problem?

By the way the list of rules is quite extensive (144643 lines to be
precise), and there are usually 1-3 most recent rules (rows) for each
account.

Thanks.

	[[alternative HTML version deleted]]

David Winsemius

2010-Dec-24 16:04 UTC

head link

[R] Removing rows with earlier dates

On Dec 24, 2010, at 8:45 AM, Ali Salekfard wrote:
> Hi all,
>
> I'm new to the list but have benfited from it quite extensively.  
> Straight to
> my rather strange question:
>
> I have a data frame that contains mapping rules in this way:
>
> ACCOUNT, RULE COLUMNS, Effective Date
>
>
> The dataframe comes from a database that stores all dates. What I  
> would like
> to do is to create a data frame with only the most recent rule for  
> each
> account. In traditional programming languages I would loop through  
> each
> account find the most recent rule(s) and fill up my updated data  
> frame.
>
> Does anyone have any better idea to use R's magic (Its syntax is still
> magical to me) for this problem?
It's going to remain magic until you start thinking about what is  
needed. In this case the need is for a good understanding of the  
structure of the data object and the str function is the usual way to  
examine such AND to then communicate with the list. Read the Posting  
Guide again and the references it cites, please.
>
Here would have been my first attempt, assuming a dataframe named dfrm:
#make sure the most recent is on top
dfrm <- dfrm[ order(dfrm["Effective Date"], decreasing=TRUE), ]
# then pull the first record within ACCOUNT
tapply(dfrm, dfrm$ACCOUNT , FUN= "[", 1 , )

> By the way  the list of rules is quite extensive (144643 lines to be
> precise), and there are usually 1-3 most recent rules (rows) for each
> account.
That is a bit different than the initial problem statement in which  
you asked for the "only the most recent" within each account. How are
we supposed to get 3 _most_ recent rules? I think you are expecting us  
to read your mind regarding how you are thinking about this problem  
and pull all the records with the maximum date within an account.

Perhaps this effort to create a logical vector would be in the right  
direction:

dfrm[ ave(dfrm["Effective Date"], dfrm[ , "ACCOUNT"],
function(x) x ==
max(x), ]

It should pull all records for which the Effective Date is equal to  
the maximum within ACCOUNT. It is going to depend on whether  
"Effective Date" of of a class that can be properly compared with  
max(). Both Date and character representations of dates in standard y- 
m-d form would qualify. Other date formats might not:
 > max("01-02-2011", "02-01-2010")
[1] "02-01-2010"




-- 
David Winsemius, MD
West Hartford, CT

Joshua Wiley

2010-Dec-24 17:40 UTC

head link

[R] Removing rows with earlier dates

Hi,

On Fri, Dec 24, 2010 at 5:45 AM, Ali Salekfard <salekfard at
googlemail.com> wrote:
[snip]> I have a data frame that contains mapping rules in this way:
>
> ACCOUNT, RULE COLUMNS, Effective Date
>
>
> The dataframe comes from a database that stores all dates. What I would
like
> to do is to create a data frame with only the most recent rule for each
Assuming that "Effective Date" is a date class (you can use
str(yourdataframe) you tell):

with(YourDataFrame, tapply(`Effective Date`, `RULE COLUMNS`, which.max))

this will return the location of the maximum `Effective Date`

If you want the actual value, rather than the location:

with(YourDataFrame, tapply(`Effective Date`, `RULE COLUMNS`,
  function(x) x[which.max(x)]))

should do the trick (assuming Effective Date is a date class, not
character).  By the way, you will find your life more convenient if
you avoid object or column names with spaces or special characters.

Cheers,

Josh
> account. In traditional programming languages I would loop through each
> account find the most recent rule(s) and fill up my updated data frame.
>
> Does anyone have any better idea to use R's magic (Its syntax is still
> magical to me) for this problem?
>
> By the way the list of rules is quite extensive (144643 lines to be
> precise), and there are usually 1-3 most recent rules (rows) for each
> account.
>
> Thanks.
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

William Dunlap

2010-Dec-24 18:44 UTC

head link

[R] Removing rows with earlier dates

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ali Salekfard
> Sent: Friday, December 24, 2010 5:46 AM
> To: r-help at r-project.org
> Subject: [R] Removing rows with earlier dates
> 
> Hi all,
> 
> I'm new to the list but have benfited from it quite 
> extensively. Straight to
> my rather strange question:
> 
> I have a data frame that contains mapping rules in this way:
> 
> ACCOUNT, RULE COLUMNS, Effective Date
> 
> 
> The dataframe comes from a database that stores all dates. 
> What I would like
> to do is to create a data frame with only the most recent 
> rule for each
> account. In traditional programming languages I would loop 
> through each
> account find the most recent rule(s) and fill up my updated 
> data frame.
Do you mean you would loop through the accounts and for
each account loop through all rules for that account looking
for the rule with the latest date?
> 
> Does anyone have any better idea to use R's magic (Its syntax is still
> magical to me) for this problem?
I like to think of this sort of a problem as a one involving
"runs" (sequences of identical data points).  The following
function identifies which data points are the last in a run:

  isLastInRun <- function (x, ...) 
  {
      retval <- c(x[-1] != x[-length(x)], TRUE)
      for (y in list(...)) {
          stopifnot(length(x) == length(y))
          retval <- retval | c(x[-1] != x[-length(x)], TRUE)
      }
      retval
  }

E.g.,
  > isLastInRun(rep(1:2,each=5), rep(1:3,c(3,3,4)))
   [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE

If your data is sorted by ACCOUNT with ties broken by
`Effective Date` then you can get what I think you want with
    f0 <- function(data) {
        data[isLastInRun(data[,"ACCOUNT"]), ]
    }
If it is not sorted then sort it first and then do the above with
    f1 <- function(data) {
        data <- data[order(data[,"ACCOUNT"], data[,"Effective
Date"]), ]
        f0(data)
    }
If it is important that the output records be in the same order
as the input records then you can use
    f2 <- function(data) {
        o <- order(data[, "ACCOUNT"], data[, "Effective
Date"])
        tmp <- logical(length(o))
        tmp[o] <- isLastInRun(data[o, "ACCOUNT"])
        data[tmp,]
    }
(It may be faster to sort the output of f1 rather that
sorting the input, as f2 does, but f2's method is a bit
simpler to write.)

This approach is typically faster than tapply when there
are a lot of small groups and runs less risk of corrupting
the data when using unusual classes in the columns of your
data.frame.

Typical usage is

   > d <- data.frame(check.names=FALSE,
   +       "Effective Date"=c(1004,1008,1004,1007,1005),
   +       ACCOUNT=         c(  11,  12,  11,  12,  11),
   +       Rule=            c( "A", "B", "C",
"D", "E"))
   > f1(d)
     Effective Date ACCOUNT Rule
   5           1005      11    E
   2           1008      12    B
   > f2(d)
     Effective Date ACCOUNT Rule
   2           1008      12    B
   5           1005      11    E

Did I interpret your question correctly?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
> 
> By the way the list of rules is quite extensive (144643 lines to be
> precise), and there are usually 1-3 most recent rules (rows) for each
> account.
> 
> Thanks.
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Greg Snow

2010-Dec-25 04:40 UTC

head link

[R] Removing rows with earlier dates

Whenever a task calls for breaking a data object into pieces, operate on the
pieces, then put it back together, then think about using the plyr package.

Sent from my iPod

On Dec 24, 2010, at 6:58 AM, "Ali Salekfard" <salekfard at
googlemail.com> wrote:
> Hi all,
> 
> I'm new to the list but have benfited from it quite extensively.
Straight to
> my rather strange question:
> 
> I have a data frame that contains mapping rules in this way:
> 
> ACCOUNT, RULE COLUMNS, Effective Date
> 
> 
> The dataframe comes from a database that stores all dates. What I would
like
> to do is to create a data frame with only the most recent rule for each
> account. In traditional programming languages I would loop through each
> account find the most recent rule(s) and fill up my updated data frame.
> 
> Does anyone have any better idea to use R's magic (Its syntax is still
> magical to me) for this problem?
> 
> By the way the list of rules is quite extensive (144643 lines to be
> precise), and there are usually 1-3 most recent rules (rows) for each
> account.
> 
> Thanks.
> 
>    [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Reasonably Related Threads

Search for more reasonably related threads

R help - Dec 2010 - Removing rows with earlier dates

[R] Removing rows with earlier dates

[R] Removing rows with earlier dates

[R] Removing rows with earlier dates

[R] Removing rows with earlier dates

[R] Removing rows with earlier dates

Reasonably Related Threads