Dear R users,

I have the following problem. My dataset (dat) is as follows:

a <- c(1,2,3)
id <- rep(a, c(3,2,3))
stat <- c(1,1,0,1,0,1,1,1)
g <- c(0,0,0,0,0,0,1,0)
stop <- c(1,2,4,2,4,1,1.5,3)
dat <- data.frame(id,stat,g,stop)

I want to create a new dataset (dat2) with missing values such that, when either g == 1 or stat == 0, the remaining rows for that individual subject (from that row onward) are set to NA, and a new variable d states the exact time, taken from the stop variable, at which this happened. By this I mean a dat2 that looks like:

id <- rep(a, c(3,2,3))
sta2 <- c(1,1,NA,1,NA,1,NA,NA)
g2 <- c(0,0,NA,0,NA,0,NA,NA)
stop2 <- c(1,2,NA,2,NA,1,NA,NA)
d <- c(4,4,NA,4,NA,1.5,NA,NA)

dat2 <- data.frame(id=id, stat2=sta2, g2=g2, stop2=stop2, d=d)

Thank you very much!

John
David Winsemius
2012-Sep-17  19:31 UTC
[R] Creating missingness in repeated measurement data
On Sep 17, 2012, at 11:32 AM, john james wrote:

> Dear R users,
>
> I have the following problems. My dataset (dat) is as follows:
>
> a <- c(1,2,3)
> id <- rep(a, c(3,2,3))
> stat <- c(1,1,0,1,0,1,1,1)
> g <- c(0,0,0,0,0,0,1,0)
> stop <- c(1,2,4,2,4,1,1.5,3)
> dat <- data.frame(id,stat,g,stop)
>
> I want to create a new dataset (dat2) with missing values
> such that when either g == 1 or stat == 0, the remaining rows for an
> individual subject are set to NA, using a new variable d (that states
> the exact time this happened from the stop variable). By this I mean
> dat2 that looks like,
>
> id <- rep(a, c(3,2,3))
> sta2<- c(1,1,NA,1,NA,1,NA,NA)
> g2<- c(0,0,NA,0,NA,0,NA,NA)
> stop2 <- c(1,2,NA,2,NA,1,NA,NA)
> d <- c(4,4,NA,4,NA,1.5,NA,NA)
>
> dat2 <- data.frame(id=id, stat2=sta2, g2=g2,stop2=stop2,d=d)

suppressidx <- ave(dat$stat==0 | dat$g==1, dat$id, FUN=cumsum)
suppress <- function(col) { ifelse( suppressidx, NA, col)}
cbind(dat[1], sapply( dat[-1], function(x) suppress(x) ) )

  id stat  g stop
1  1    1  0    1
2  1    1  0    2
3  1   NA NA   NA
4  2    1  0    2
5  2   NA NA   NA
6  3    1  0    1
7  3   NA NA   NA
8  3   NA NA   NA

-- 
David Winsemius, MD
Alameda, CA, USA
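The output above masks stat, g and stop but does not contain the requested d column. The following is a minimal sketch of one way to add it with the same ave()/cumsum idea. It is not part of the original reply; the names event, masked, first_time and dat_d are hypothetical, and it assumes that d should be NA for any subject with no event.

```r
# Example data from the original post
dat <- data.frame(
  id   = rep(c(1, 2, 3), c(3, 2, 3)),
  stat = c(1, 1, 0, 1, 0, 1, 1, 1),
  g    = c(0, 0, 0, 0, 0, 0, 1, 0),
  stop = c(1, 2, 4, 2, 4, 1, 1.5, 3)
)

# TRUE on rows where the triggering condition holds
event <- dat$stat == 0 | dat$g == 1

# Running count of events within each id; positive from the first event onward
masked <- ave(as.integer(event), dat$id, FUN = cumsum) > 0

# Stop time of the first event within each id
# (assumption: NA when a subject has no event at all)
first_time <- ave(ifelse(event, dat$stop, NA_real_), dat$id,
                  FUN = function(x) x[!is.na(x)][1])

dat_d <- dat
dat_d[masked, c("stat", "g", "stop")] <- NA   # blank out rows from the event onward
dat_d$d <- ifelse(masked, NA, first_time)     # d is kept only on unmasked rows
dat_d
```

Keeping the event indicator as a separate vector means the same mask can be reused for both the NA assignment and the d column.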
Hello,
Maybe there are simpler ways of doing it, but try the following.
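# process each subject (id) separately
sp <- lapply(split(dat, dat$id), function(.s){
     # first row where stat == 0 or g == 1
     i <- min(which(.s$stat == 0), which(.s$g == 1))
     # d holds the stop time of that row
     .s$d <- .s$stop[i]
     # set every column except id to NA from row i onward
     .s[-1][row(.s[-1]) >= i] <- NA
     .s
})
# recombine the per-subject pieces and renumber the rows
dat3 <- do.call(rbind, sp)
rownames(dat3) <- seq_len(nrow(dat3))
all.equal(dat2, dat3)  # only names are different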
Hope this helps,
Rui Barradas
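
In the split/lapply code above, a subject that never has stat == 0 or g == 1 makes min() run on an empty set, which returns Inf with a warning. Below is a minimal sketch of one way to guard against that case. It is not from the original reply: the helper mask_after_event, the result dat4 and the extra subject with id 4 are hypothetical, and it assumes such a subject should keep all its rows with d set to NA.

```r
# Example data from the original post plus a hypothetical subject (id 4)
# that never meets either condition
dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  stat = c(1, 1, 0, 1, 0, 1, 1, 1, 1, 1),
  g    = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  stop = c(1, 2, 4, 2, 4, 1, 1.5, 3, 1, 2)
)

# Hypothetical helper: mask one subject's rows from the first event onward
mask_after_event <- function(s) {
  hit <- which(s$stat == 0 | s$g == 1)
  s$d <- NA_real_                       # assumption: no event means d is NA
  if (length(hit) > 0) {
    i <- hit[1]                         # first row with an event
    s$d <- s$stop[i]                    # record the event time on every row
    cols <- setdiff(names(s), "id")     # every column except id, including d
    s[i:nrow(s), cols] <- NA            # blank out row i and all later rows
  }
  s
}

dat4 <- do.call(rbind, lapply(split(dat, dat$id), mask_after_event))
rownames(dat4) <- NULL
dat4
```

For the three subjects in the original data this gives the same values as dat2; only the rows for the added subject differ in that they are left untouched.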
On 17-09-2012 19:32, john james wrote:
> Dear R users,
>   
> I have the following problems. My dataset (dat) is as follows:
>
> a <- c(1,2,3)
> id <- rep(a, c(3,2,3))
> stat <- c(1,1,0,1,0,1,1,1)
> g <- c(0,0,0,0,0,0,1,0)
> stop <- c(1,2,4,2,4,1,1.5,3)
> dat <- data.frame(id,stat,g,stop)
>   
> I want to create a new dataset (dat2) with missing values
> such that when either g == 1 or stat == 0, the remaining rows for an
> individual subject are set to NA, using a new variable d (that states
> the exact time this
> happened from the stop variable). By this I mean dat2 that looks like,
>   
> id <- rep(a, c(3,2,3))
> sta2<- c(1,1,NA,1,NA,1,NA,NA)
> g2<- c(0,0,NA,0,NA,0,NA,NA)
> stop2 <- c(1,2,NA,2,NA,1,NA,NA)
> d <- c(4,4,NA,4,NA,1.5,NA,NA)
>   
> dat2 <- data.frame(id=id, stat2=sta2, g2=g2,stop2=stop2,d=d)
>   
> Thank you very much!
>   
> John