thr3ads.net - R help - [R] LOCF - Last Observation Carried Forward [Nov 2003]


Karl Knoblick

2003-Nov-14 14:07 UTC

[R] LOCF - Last Observation Carried Forward

Hi!
 
Is there a possibility in R to carry out LOCF (Last Observation Carried Forward)
analysis or to create a new data frame (array, matrix) with LOCF? Or some
helpful functions, packages?
 
Karl




Simon Fear

2003-Nov-14 16:28 UTC


I use this:

#
# change NAs to preceding values (initial NAs remain NAs)
# e.g.  NA  1 NA  2 NA NA  4 NA  3
# to    NA  1  1  2  2  2  4  4  3
"locf" <- function(x) {
  assign("stored.value", x[1], envir=.GlobalEnv)
  sapply(x, function(x) {
    if(is.na(x))
      stored.value
    else {
      assign("stored.value", x, envir=.GlobalEnv)
      x
    }})
}

That sets up "LOCF" within a vector, or could be applied to each
column of your data frame (apply with MARGIN = 2 works column by column):

df[ , <<numeric bits only>>] <- apply(df[ , <<numeric bits only>>], 2, locf)

I've got a feeling there would be a much neater way to code this
than the above (I wrote it first in Splus, hence the funny scoping control).
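One possibly neater variant, as a minimal sketch: the same carry-forward written as a plain loop, so the remembered value lives in a local variable instead of .GlobalEnv. The name locf_loop is hypothetical and not from the post above.

```r
# Sketch: LOCF with a plain loop; no state is written to .GlobalEnv.
# locf_loop is a hypothetical name used only for this illustration.
locf_loop <- function(x) {
  last <- x[1]                       # may be NA, so leading NAs stay NA
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      x[i] <- last                   # carry the last observed value forward
    } else {
      last <- x[i]                   # remember the newest observation
    }
  }
  x
}

locf_loop(c(NA, 1, NA, 2, NA, NA, 4, NA, 3))
# NA  1  1  2  2  2  4  4  3
```

Like the original, this leaves initial NAs as NA and can be passed to apply() in the same way.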

When there's only one postbaseline timepoint, I tend to use this instead:

"set.nas.previous" <- function(x,previous) {
  if (is.numeric(x)) 
    x <- ifelse(is.na(x),previous,x)
  x
}

which is used like

df$locf <- set.nas.previous(df$postbaseline, df$baseline)

It's a bit of a dodgy function name really because there is no actual
test that "previous" is really previous.
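A small worked example of the call above, assuming a made-up data frame with one baseline and one postbaseline column (the values are purely illustrative):

```r
# Same function as in the post, repeated so the example runs on its own.
set.nas.previous <- function(x, previous) {
  if (is.numeric(x))
    x <- ifelse(is.na(x), previous, x)
  x
}

# Hypothetical data: one baseline and one postbaseline value per subject.
df <- data.frame(baseline     = c(10, 12, 9),
                 postbaseline = c(11, NA, 8))

df$locf <- set.nas.previous(df$postbaseline, df$baseline)
df$locf
# 11 12  8
```

Note that, because of the is.numeric() check, a column that is entirely NA (which R stores as logical unless told otherwise) would be returned unchanged.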

HTH

Simon

PS use `subset` instead of `[`. Do what I say, not what I do.
Simon Fear 
Senior Statistician 
Syne qua non Ltd 
Tel: +44 (0) 1379 644449 
Fax: +44 (0) 1379 644445 
email: Simon.Fear at synequanon.com 
web: http://www.synequanon.com 
  

Marc Schwartz

2003-Nov-14 17:00 UTC


karlknoblich at yahoo.de wrote:
> Hi!
>
> Is there a possibility in R to carry out LOCF (Last Observation Carried
> Forward) analysis or to create a new data frame (array, matrix) with
> LOCF? Or some helpful functions, packages?
>
> Karl

As I understand the methodology and potential issues regarding the
imputation of data for the missing observations, I have a couple of
thoughts:

1. The missing observation data can be imputed where missing using
standard R data management functions. The complexity or lack of it will
likely depend upon your exact data structure. 

For example, if the missing values are all NA's, you can use
vector/matrix indexing to replace them based upon various conditions. If
the subsetting logic is more complex, you can use the replace()
function, which enables you to specify a complex boolean construct. See
?replace for more information.

If your data (x) is sequenced left to right in a time series vector, you
can identify the position of the last known observation for example:
> x <- c(23, 25, 24, NA, 25, NA, NA)
> max(which(!is.na(x)))
[1] 5

and fill to the right, repeating the last known data:
> LOCF <- max(which(!is.na(x)))
> x[LOCF:length(x)] <- x[LOCF]
> x
[1] 23 25 24 NA 25 25 25
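The fill-to-the-right step above only replaces the trailing NAs (the NA in position 4 is left alone), and max(which(...)) returns -Inf with a warning when nothing was observed. A minimal sketch of the same idea wrapped in a function and applied to each row of a matrix follows; it assumes one row per subject and one column per visit, and fill_trailing and visits are hypothetical names and data.

```r
# Sketch: repeat the last observed value over the trailing NAs only.
# fill_trailing is a hypothetical helper, not part of base R.
fill_trailing <- function(x) {
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)    # nothing observed, nothing to carry
  last <- max(obs)
  x[last:length(x)] <- x[last]
  x
}

# Made-up data: rows are subjects, columns are visits in time order.
visits <- rbind(c(23, 25, 24, NA, 25, NA, NA),
                c(30, NA, NA, NA, NA, NA, NA),
                c(NA, NA, NA, NA, NA, NA, NA))

# apply() over rows returns the results as columns, hence the t().
filled <- t(apply(visits, 1, fill_trailing))
filled[1, ]
# 23 25 24 NA 25 25 25
```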

A quick search on Google raises some known issues with the methodology
depending upon the nature of the missing data and what sort of
assumptions you are willing to make or live with. 

For more complex imputation, there are a variety of missing data
imputation functions available for R, for example in Frank Harrell's
Design and Hmisc packages on CRAN.

2. Another alternative to consider, depending upon how much missing data
you are dealing with and its etiology, would be an unbalanced mixed
effects approach using the model functions in package 'nlme'.  I might
defer to others here, but something to consider.

HTH,

Marc Schwartz

Tony Plate

2003-Nov-14 17:20 UTC


Here's a function that does the essential computation (written to work in 
both S-plus and R).

This looks like one of those tricky problems that do not vectorize 
easily.  It would be simple to write a C-program to compute this very 
efficiently.  But are there any more efficient solutions than ones like the 
below (that are written without resort to C)?

most.recent <- function(x) {
    # return a vector of indices of the most recent TRUE value
    if (!is.logical(x))
        stop("x must be logical")
    x[is.na(x)] <- FALSE
    # x is a logical vector
    r <- rle(x)
    ends <- cumsum(r$lengths)
    starts <- ends - r$lengths + 1
    # one column per run: start index, run length, value (0/1), end of previous run
    spec <- as.list(as.data.frame(rbind(start=starts, len=r$lengths,
                                        value=as.numeric(r$values),
                                        prev.end=c(NA, ends[-length(ends)]))))
    names(spec) <- NULL
    # TRUE runs map to their own indices; FALSE runs repeat the previous run's end
    unlist(lapply(spec, function(s)
        if (s[3]) seq(s[1], len=s[2]) else rep(s[4], len=s[2])),
        use.names=F)
}

 > x <- c(F,T,T,F,F,F,T,F)
 > most.recent(x)
[1] NA  2  3  3  3  3  7  7

And using it to do the fill-forward:

 > x <- c(NA,2,3,NA,4,NA,5,NA,NA,NA,6,7,8,NA)
 > x[most.recent(!is.na(x))]
  [1] NA  2  3  3  4  4  5  5  5  5  6  7  8  8
 >

Some timings:

 > x <- sample(c(T,F),1e4,rep=T)
 > system.time(most.recent(x))
[1] 0.33 0.01 0.47   NA   NA
 > x <- sample(c(T,F),1e5,rep=T)
 > system.time(most.recent(x))
[1] 4.27 0.06 6.44   NA   NA
 > x <- sample(c(T,F),1e6,rep=T)
 > system.time(most.recent(x))
[1] 47.27  0.17 47.97    NA    NA
 >

-- Tony Plate

PS. Actually, I just found a solution that I had lying around that is about 
70 times as fast on random test data like the above.
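The faster solution mentioned in the PS is not shown in this thread. As one possible answer to the question above, here is a sketch of a fully vectorized approach based on cumsum(); it is not necessarily the solution Tony had in mind, no timings were measured for it, and most_recent_idx is a hypothetical name.

```r
# Sketch: index of the most recent TRUE, without looping over runs.
# most_recent_idx is a hypothetical name used only for this illustration.
most_recent_idx <- function(x) {
  if (!is.logical(x))
    stop("x must be logical")
  x[is.na(x)] <- FALSE
  n_seen <- cumsum(x)                # number of TRUEs seen up to each position
  # Position 1 of the lookup is NA, so n_seen == 0 (no TRUE yet) maps to NA.
  c(NA, which(x))[n_seen + 1]
}

most_recent_idx(c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE))
# NA  2  3  3  3  3  7  7

x <- c(NA, 2, 3, NA, 4, NA, 5, NA, NA, NA, 6, 7, 8, NA)
x[most_recent_idx(!is.na(x))]
# NA  2  3  3  4  4  5  5  5  5  6  7  8  8
```

The running count of TRUE values says which of the TRUE positions is the most recent one, so a single indexing step replaces the per-run lapply().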


Tony Plate   tplate at acm.org

Gabor Grothendieck

2003-Nov-15 03:21 UTC


From: Tony Plate <tplate at acm.org>:
> PS. Actually, I just found a solution that I had lying around that is about
> 70 times as fast on random test data like the above.
I was waiting for you to post this but didn't see it so I thought 
I would post mine.  This one is 13x as fast and only requires 
a single line of code.  
> set.seed(111)
> x <- sample(c(T,F),10000,rep=T)
> system.time(z1 <- most.recent(x))
[1] 0.92 0.02 1.68   NA   NA
> system.time(z2 <- as.numeric(as.vector(cut(seq(x),c(which(x),Inf),lab=which(x),right=F))))
[1] 0.07 0.00 0.12   NA   NA
> all.equal(z1,z2)
[1] TRUE
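To see why the one-liner works, here is the same cut() call unpacked on the small example vector used earlier in the thread. This is only a walk-through sketch; the intermediate names pos and bins are introduced here for illustration.

```r
x <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

pos <- which(x)                      # positions of the TRUE values: 2 3 7

# Each TRUE position opens a half-open interval [pos[i], pos[i+1]) that is
# labelled with pos[i]; the last interval runs to Inf. Positions before the
# first TRUE fall outside every interval and become NA.
bins <- cut(seq_along(x), breaks = c(pos, Inf), labels = pos, right = FALSE)

# The factor labels are character strings, so convert them back to numbers.
as.numeric(as.character(bins))
# NA  2  3  3  3  3  7  7
```

One caveat: if x contains no TRUE at all, breaks is just Inf and cut() stops with an error, so that case would need a separate check.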
