Hi!

Is there a possibility in R to carry out LOCF (Last Observation Carried Forward) analysis, or to create a new data frame (array, matrix) with LOCF? Or are there some helpful functions or packages?

Karl
I use this:

    #
    # change NAs to preceding values (initial NAs remain NAs)
    # e.g. NA 1 NA 2 NA NA 4 NA 3
    # to   NA 1  1 2  2  2 4  4 3
    "locf" <- function(x) {
      assign("stored.value", x[1], envir=.GlobalEnv)
      sapply(x, function(x) {
        if(is.na(x)) stored.value else {
          assign("stored.value", x, envir=.GlobalEnv)
          x
      }})
    }

That sets up "LOCF" within a vector, or it could be applied down the rows of each column of your data frame (the 2 in apply() selects columns):

    df[ , <<numeric bits only>>] <- apply(df[ , <<numeric bits only>>], 2, locf)

I've got a feeling there would be a much neater way to code this than the above (I wrote it first in Splus, hence the funny scoping control).

When there's only one postbaseline timepoint, I tend to use this instead:

    "set.nas.previous" <- function(x,previous) {
      if (is.numeric(x)) x <- ifelse(is.na(x),previous,x)
      x
    }

which is used like

    df$locf <- set.nas.previous(df$postbaseline, df$baseline)

It's a bit of a dodgy function name really, because there is no actual test that "previous" is really previous.

HTH
Simon

PS use `subset` instead of `[`. Do what I say, not what I do.

Simon Fear
Senior Statistician
Syne qua non Ltd
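As a possible tidier variant of the locf function above, here is a minimal sketch that keeps the "stored value" in a local variable instead of the global environment. The function name locf_local and the example data frame (columns id, visit, score) are made up for illustration and are not from the original post. The sketch also uses ave() to run the fill separately within each subject, on the assumption that the data are in long format with one row per visit, so that values are not carried from one subject into the next.

```r
# Loop-based LOCF with no global state (locf_local is a hypothetical name).
# Leading NAs stay NA, as in the original locf().
locf_local <- function(x) {
  last <- x[1]
  for (i in seq_along(x)) {
    if (is.na(x[i])) x[i] <- last else last <- x[i]
  }
  x
}

locf_local(c(NA, 1, NA, 2, NA, NA, 4, NA, 3))

# Hypothetical long-format data: one row per subject and visit.
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  visit = c(0, 1, 2, 0, 1, 2),
  score = c(10, NA, 12, NA, 7, NA)
)

# ave() applies the function within each level of id, so the last
# value of subject 1 is not carried into subject 2.
df$score_locf <- ave(df$score, df$id, FUN = locf_local)
df
```

The rows are assumed to be sorted by visit within each subject already; if they are not, sort first, since LOCF depends on the order.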
As I understand the methodology and the potential issues regarding the imputation of data for the missing observations, I have a couple of thoughts:

1. The missing observation data can be imputed using standard R data management functions. The complexity, or lack of it, will likely depend upon your exact data structure. For example, if the missing values are all NA's, you can use vector/matrix indexing to replace them based upon various conditions. If the subsetting logic is more complex, you can use the replace() function, which enables you to specify a complex boolean construct. See ?replace for more information.

If your data (x) is sequenced left to right in a time series vector, you can identify the position of the last known observation, for example:

    > x <- c(23, 25, 24, NA, 25, NA, NA)
    > max(which(!is.na(x)))
    [1] 5

and fill to the right, repeating the last known data:

    > LOCF <- max(which(!is.na(x)))
    > x[LOCF:length(x)] <- x[LOCF]
    > x
    [1] 23 25 24 NA 25 25 25

Note that this fills only the trailing missing values; the NA in position 4 is left untouched.

A quick search on Google raises some known issues with the methodology, depending upon the nature of the missing data and what sort of assumptions you are willing to make or live with.

For more complex imputation, there are a variety of missing data imputation functions available for R, for example in Frank Harrell's Design and Hmisc packages on CRAN.

2. Another alternative to consider, depending upon how much missing data you are dealing with and its etiology, would be an unbalanced mixed effects approach using the model functions in package 'nlme'. I might defer to others here, but it is something to consider.

HTH,

Marc Schwartz
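To illustrate the replace() suggestion, here is a small sketch that performs the same trailing fill as the indexing example, but returns a new vector and leaves x unchanged. The variable names last and filled are only illustrative. It assumes x contains at least one non-missing value; otherwise which() returns an empty vector and max() gives -Inf with a warning.

```r
x <- c(23, 25, 24, NA, 25, NA, NA)

# Position of the last observed (non-NA) value.
# Assumes at least one non-NA value exists in x.
last <- max(which(!is.na(x)))

# Replace only the NAs that come after the last observation.
# The logical condition could be made as complex as needed.
filled <- replace(x, is.na(x) & seq_along(x) > last, x[last])
filled
```

As with the indexing version, the NA in position 4 is not touched, because it comes before the last observed value.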
Here's a function that does the essential computation (written to work in both S-plus and R).

This looks like one of those tricky problems that do not vectorize easily. It would be simple to write a C program to compute this very efficiently. But are there any more efficient solutions than ones like the below (that are written without resort to C)?

    most.recent <- function(x) {
        # return a vector of indices of the most recent TRUE value
        if (!is.logical(x))
            stop("x must be logical")
        x[is.na(x)] <- FALSE
        # x is a logical vector
        r <- rle(x)
        ends <- cumsum(r$lengths)
        starts <- ends - r$lengths + 1
        spec <- as.list(as.data.frame(rbind(start=starts, len=r$lengths,
            value=as.numeric(r$values), prev.end=c(NA, ends[-length(ends)]))))
        names(spec) <- NULL
        unlist(lapply(spec, function(s) if (s[3]) seq(s[1], len=s[2]) else
            rep(s[4], len=s[2])), use.names=F)
    }

    > x <- c(F,T,T,F,F,F,T,F)
    > most.recent(x)
    [1] NA  2  3  3  3  3  7  7

And using it to do the fill-forward:

    > x <- c(NA,2,3,NA,4,NA,5,NA,NA,NA,6,7,8,NA)
    > x[most.recent(!is.na(x))]
     [1] NA  2  3  3  4  4  5  5  5  5  6  7  8  8

Some timings:

    > x <- sample(c(T,F),1e4,rep=T)
    > system.time(most.recent(x))
    [1] 0.33 0.01 0.47   NA   NA
    > x <- sample(c(T,F),1e5,rep=T)
    > system.time(most.recent(x))
    [1] 4.27 0.06 6.44   NA   NA
    > x <- sample(c(T,F),1e6,rep=T)
    > system.time(most.recent(x))
    [1] 47.27  0.17 47.97    NA    NA

-- Tony Plate

PS. Actually, I just found a solution that I had lying around that is about 70 times as fast on random test data like the above.

Tony Plate
tplate at acm.org
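On the question of a faster solution without C: one possible vectorized approach, sketched here, uses cummax() on the positions of the TRUE values. This is not the faster solution mentioned in the PS (which was not posted), and the name most_recent_cummax is made up for illustration. It uses seq_along(), which is available in current R but not in S-plus.

```r
# Index of the most recent TRUE value, via a running maximum.
# most_recent_cummax is a hypothetical name, not from the thread.
most_recent_cummax <- function(x) {
  if (!is.logical(x)) stop("x must be logical")
  x[is.na(x)] <- FALSE
  # Position where x is TRUE, 0 where it is FALSE.
  pos <- seq_along(x) * x
  # The running maximum is the last TRUE position seen so far.
  idx <- cummax(pos)
  # 0 means no TRUE has been seen yet.
  idx[idx == 0L] <- NA
  idx
}

most_recent_cummax(c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE))

# Fill-forward, used in the same way as most.recent():
x <- c(NA, 2, 3, NA, 4, NA, 5, NA, NA, NA, 6, 7, 8, NA)
x[most_recent_cummax(!is.na(x))]
```

Because every step is a single vectorized call, this should scale roughly linearly with the length of x, but no timings are claimed here.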
From: Tony Plate <tplate at acm.org>:
> PS. Actually, I just found a solution that I had lying around that is about
> 70 times as fast on random test data like the above.

I was waiting for you to post this but didn't see it, so I thought I would post mine. This one is 13x as fast and requires only a single line of code.

    > set.seed(111)
    > x <- sample(c(T,F),10000,rep=T)
    > system.time(z1 <- most.recent(x))
    [1] 0.92 0.02 1.68   NA   NA
    > system.time(z2 <- as.numeric(as.vector(cut(seq(x),c(which(x),Inf),lab=which(x),right=F))))
    [1] 0.07 0.00 0.12   NA   NA
    > all.equal(z1,z2)
    [1] TRUE
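The cut() one-liner treats the positions of the TRUE values as break points and labels each interval by its left end, so every position is mapped to the most recent TRUE position. A similar idea can be sketched with findInterval(), which returns the interval number directly and so avoids the round trip through factor labels. The name most_recent_fi is made up for illustration, and no speed comparison with the cut() version is claimed.

```r
# Same interval idea as the cut() one-liner, using findInterval().
# most_recent_fi is a hypothetical name, not from the thread.
most_recent_fi <- function(x) {
  x[is.na(x)] <- FALSE
  pos <- which(x)                       # positions of TRUE, already sorted
  i <- findInterval(seq_along(x), pos)  # 0 before the first TRUE
  i[i == 0L] <- NA                      # a 0 index would be dropped silently
  pos[i]
}

most_recent_fi(c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE))
```

Setting the zeros to NA matters: indexing with 0 drops the element, which would silently shorten the result.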