I have a dataset of the form below, consisting of one unique ID per row, followed by a series of visit dates. At each visit there are values for 3 dichotomous variables. Of the 8 possible combinations of the three variables, 4 are "abnormal" and the remaining 4 are "normal". Everyone starts out abnormal, and then either continues to be abnormal at subsequent visits or resolves to a normal pattern at a later visit. (I ignore reversion back to abnormal: once they are normal, they are normal.)

I have to end up with 4 new columns indicating
1) the date of the last completed visit (regardless of intervening NAs),
2) whether an ID resolved or stayed abnormal,
3) if resolved, what the resolution pattern was, and
4) what the date of resolution was.
NAs always come in groups of 4 (i.e. no visit date and no value for the 3 variables) and are ignored.

Eventually I have to determine mean time to resolution, mean follow-up time, etc., and I think I can do that, but the first part is a bit beyond my coding skill. Suggestions appreciated.

tC <- textConnection("
ID V1Date V1a V1b V1c V2date V2a V2b V2c V3date V3a V3b V3c
001 4/5/12 Yes Yes No 6/18/12 Yes No Yes NA NA NA NA
002 1/22/12 No No Yes 7/5/12 Yes No Yes NA NA NA NA
003 4/5/12 Yes No No 9/4/12 Yes No Yes 11/1/12 Yes No Yes
004 8/18/12 Yes Yes Yes 9/22/12 Yes No Yes NA NA NA NA
005 9/6/12 Yes No No NA NA NA NA 12/4/12 Yes No Yes
")
data1 <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

I had some difficulty getting the data read in using the code you included in your email, although I'm not sure why. I'm pasting in the code that worked for me, below.

I think that the calculations that you want to make would be easier if you rearranged your data first. I used your example data to do just that. Once the data are rearranged, it is very easy to look at information on the last visit from each ID (see code, below). This includes much of the information you describe in your query: 1) date of last completed visit, 2) whether an ID resolved, and 3) what the final pattern was.

Jean

tC <- textConnection("ID V1Date V1a V1b V1c V2date V2a V2b V2c V3date V3a V3b V3c
001 4/5/12 Yes Yes No 6/18/12 Yes No Yes NA NA NA NA
002 1/22/12 No No Yes 7/5/12 Yes No Yes NA NA NA NA
003 4/5/12 Yes No No 9/4/12 Yes No Yes 11/1/12 Yes No Yes
004 8/18/12 Yes Yes Yes 9/22/12 Yes No Yes NA NA NA NA
005 9/6/12 Yes No No NA NA NA NA 12/4/12 Yes No Yes")
data1 <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

# rearrange the data
data2 <- data.frame(
    id = rep(data1$ID, 3),
    visit = rep(1:3, rep(dim(data1)[1], 3)),
    date = as.Date(c(data1$V1Date, data1$V2date, data1$V3date), "%m/%d/%y"),
    dva = c(data1$V1a, data1$V2a, data1$V3a),
    dvb = c(data1$V1a, data1$V2a, data1$V3a),
    dvc = c(data1$V1a, data1$V2a, data1$V3a))

# define a new variable that is a combination of the three dichotomous variables
data2$abc <- paste0(substring(data2$dva, 1, 1), substring(data2$dvb, 1, 1),
    substring(data2$dvb, 1, 1))

# define a new variable that indicates whether the combination is "normal"
data2$normal <- data2$abc %in% c("YYN", "YNY", "YYN", "NNY")

# eliminate rows without visit information
data3 <- data2[!is.na(data2$date), ]

# split the data into lists according to id
list4 <- split(data3, data3$id)

# show the last visit from each id
do.call(rbind, lapply(list4, function(df) df[dim(df)[1], ]))

On Fri, Dec 14, 2012 at 10:37 AM, marcel curlin <marcelcurlin@gmail.com> wrote:
> I have to end up with 4 new columns indicating 1) date of last
> completed visit (regardless of intervening "NAs", 2) whether an ID
> resolved or stayed abnormal, 3) if resolved, what the resolution
> pattern was and 4) what the date of resolution was. NAs always come in
> groups of 4 (ie no visit date, and no value for the 3 variables) and
> are ignored.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
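
Jean's code reports the last visit for each ID; the resolution pattern and resolution date (items 3 and 4 of the original request) depend on the first normal visit instead. The following is only a minimal base R sketch of that step, assuming the data are already in long format with one row per completed visit. The thread never states which four patterns are normal, so normal_patterns is a placeholder holding just "YNY", the only pattern seen at follow-up visits in the example. The names long, summarise_id and the output columns are illustrative, not taken from the thread.

```r
# Long-format version of the example data: one row per completed visit.
# abc is the Yes/No pattern of the three variables (first letters pasted).
long <- data.frame(
  id   = c("001", "001", "002", "002", "003", "003", "003",
           "004", "004", "005", "005"),
  date = as.Date(c("2012-04-05", "2012-06-18", "2012-01-22", "2012-07-05",
                   "2012-04-05", "2012-09-04", "2012-11-01",
                   "2012-08-18", "2012-09-22", "2012-09-06", "2012-12-04")),
  abc  = c("YYN", "YNY", "NNY", "YNY", "YNN", "YNY", "YNY",
           "YYY", "YNY", "YNN", "YNY"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: placeholder set of normal patterns; replace with the real four.
normal_patterns <- c("YNY")

# Summarise one ID: last visit, and the first visit showing a normal pattern.
summarise_id <- function(df) {
  df  <- df[order(df$date), ]
  hit <- which(df$abc %in% normal_patterns)
  resolved <- length(hit) > 0
  data.frame(
    id                 = df$id[1],
    last_visit         = max(df$date),
    resolved           = resolved,
    resolution_pattern = if (resolved) df$abc[hit[1]] else NA_character_,
    resolution_date    = if (resolved) df$date[hit[1]] else as.Date(NA),
    stringsAsFactors   = FALSE
  )
}

# One summary row per ID.
result <- do.call(rbind, lapply(split(long, long$id), summarise_id))
print(result)
```

Taking the first matching visit rather than the last reflects the rule that reversion is ignored: once an ID has shown a normal pattern, later visits no longer affect the resolution date.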

Hi Jean,

Just to check whether this is a typo or not: should dvb and dvc use the 'b' and 'c' columns?

data2 <- data.frame(
    id = rep(data1$ID, 3),
    visit = rep(1:3, rep(dim(data1)[1], 3)),
    date = as.Date(c(data1$V1Date, data1$V2date, data1$V3date), "%m/%d/%y"),
    dva = c(data1$V1a, data1$V2a, data1$V3a),
    dvb = c(data1$V1a, data1$V2a, data1$V3a), # 'b'?
    dvc = c(data1$V1a, data1$V2a, data1$V3a)) # 'c'?

A.K.

----- Original Message -----
From: "Adams, Jean" <jvadams at usgs.gov>
To: marcel curlin <marcelcurlin at gmail.com>
Cc: r-help at r-project.org
Sent: Monday, December 17, 2012 5:29 PM
Subject: Re: [R] Manipulation of longitudinal data by row
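
If the repeated V1a, V2a, V3a columns that A.K. asks about are indeed typos, the rearrangement might look like the sketch below. It rests on a few assumptions that are not in the posted code: the data are read with read.table(text = ..., colClasses = "character") so that IDs such as "001" and the Yes/No values stay as text, and the pattern string takes its third letter from dvc (the posted code used dvb twice).

```r
# Example data from the thread, kept as character so "001" is not read as 1.
txt <- "ID V1Date V1a V1b V1c V2date V2a V2b V2c V3date V3a V3b V3c
001 4/5/12 Yes Yes No 6/18/12 Yes No Yes NA NA NA NA
002 1/22/12 No No Yes 7/5/12 Yes No Yes NA NA NA NA
003 4/5/12 Yes No No 9/4/12 Yes No Yes 11/1/12 Yes No Yes
004 8/18/12 Yes Yes Yes 9/22/12 Yes No Yes NA NA NA NA
005 9/6/12 Yes No No NA NA NA NA 12/4/12 Yes No Yes"
data1 <- read.table(text = txt, header = TRUE, colClasses = "character")

# Rearrange to one row per ID and visit, using the b and c columns
# for dvb and dvc (the assumed fix for the suspected typo).
data2 <- data.frame(
  id    = rep(data1$ID, 3),
  visit = rep(1:3, each = nrow(data1)),
  date  = as.Date(c(data1$V1Date, data1$V2date, data1$V3date), "%m/%d/%y"),
  dva   = c(data1$V1a, data1$V2a, data1$V3a),
  dvb   = c(data1$V1b, data1$V2b, data1$V3b),
  dvc   = c(data1$V1c, data1$V2c, data1$V3c),
  stringsAsFactors = FALSE
)

# Combine the first letters of the three variables into one pattern string.
data2$abc <- paste0(substring(data2$dva, 1, 1),
                    substring(data2$dvb, 1, 1),
                    substring(data2$dvc, 1, 1))

# Drop rows for visits that did not take place.
data3 <- data2[!is.na(data2$date), ]
print(data3)
```

With all three variables feeding the pattern, abc can take any of the 8 combinations, which is what the normal/abnormal classification needs.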