Hi,

Could someone help with coding this in R?

I need to select one row per patient i in clinic j. The data is organized similar to that shown below.

Two columns (patient i, clinic j) together identify each unique patient. There are two outcome columns. Some patients have multiple rows, with each row representing one visit, coded in the column Visit. Some patients have just one row, indicating data from a single visit.

I need to select one row per patient i in clinic j using the following algorithm:

If patient has outcome recorded at visit 2, then outcome = outcome columns at visit 2
If patient does not have visit 2, then outcome = outcome at visit 5
If patient does not have visit 2 and visit 5, then outcome = outcome at visit 4
If patient does not have visits 2, 5, and 4, then outcome = outcome at visit 3
If patient does not have visits 2, 5, 4, and 3, then outcome = outcome at visit 1
If patient does not have any of the visits, outcome = missing

Patient    Clinic    Visit    Outcome_left  Outcome_right
patient 1  clinic 1  visit 2  22            21
patient 1  clinic 3  visit 1  21            21
patient 1  clinic 3  visit 2  21            22
patient 1  clinic 3  visit 3  20            22
patient 3  clinic 5  visit 1  24            21
patient 3  clinic 5  visit 3  21            22
patient 3  clinic 5  visit 4  22            23
patient 3  clinic 5  visit 5  22            22

I need to select just the first row for patient 1/clinic 1; the second row (visit 2) for patient 1/clinic 3; and the fourth row (visit 5) for patient 3/clinic 5.

How does one code for that?

Thank you,
SV
Vedula, Satyanarayana <svedula <at> jhsph.edu> writes:

> I need to select one row per patient i in clinic j. The data is organized
> similar to that shown below.
> ...
> If patient has outcome recorded at visit 2, then outcome = outcome
> columns at visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at
> visit ... other rules

I prefer to use a table-driven approach here, because one can easily get lost in all these ifs, and medical research requires well-defined documentation of the outcome you choose.

So I first convert the data to the wide format; you might alternatively use the function cast in package reshape for this, but I can never get the syntax right. I also prefer to do most of this preparatory work at the database level, e.g. with PIVOT.

Create a translation table that maps each of the 2^5 possible combinations of recorded visits to the column you select, and you can be sure you have forgotten no combination.

Dieter

outc = data.frame(
  patclin = as.factor(paste(c(1,1,1,1,3,3,3,3), c(1,3,3,3,5,5,5,5), sep=".")),
  vis = as.factor(c(2,1,2,3,1,3,4,5)),
  outcom = c(22,21,21,20,24,21,22,22))

# long -> wide: one row per patient/clinic, one outcom.<visit> column per visit
outw = reshape(outc, v.names="outcom", idvar="patclin", timevar="vis",
  direction="wide")
# sort columns by name so that outcom.1 .. outcom.5 come first, then patclin
outw = outw[,order(names(outw))]

# I am sure there is a more elegant way to do this
# I prefer to do this type of work on the database level
# code = presence pattern of visits 1..5, e.g. "01000" means only visit 2
outw$code = as.factor(
  apply(sapply(outw[,1:5], function(x){as.integer(!is.na(x))}), 1, paste,
    collapse=""))

# Note: the values here are not exactly what you requested,
# use your logic to select columns here
usevisit = data.frame(code=levels(outw$code), visit=c(2,3,4))
outw = merge(usevisit, outw)
outw
# you get a documented table of the columns you selected and
# can use visit to select the column
#    code visit outcom.1 outcom.2 outcom.3 outcom.4 outcom.5 patclin
# 1 01000     2       NA       22       NA       NA       NA     1.1
# 2 10111     3       24       NA       21       22       22     3.5
# 3 11100     4       21       21       20       NA       NA     1.3
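As a rough sketch of the translation-table idea above, the full table of 2^5 presence patterns could be generated rather than typed by hand. The names visit_priority, pick_visit and translation are made up for this illustration and are not part of the code above; the priority order 2, 5, 4, 3, 1 is taken from the original question.

```r
# Illustrative sketch only: visit_priority, pick_visit and translation are
# hypothetical names, not part of the original post.
visit_priority <- c(2, 5, 4, 3, 1)   # preference order from the question

# All 2^5 presence/absence patterns for visits 1..5 (1 = visit recorded)
patterns <- expand.grid(v1 = 0:1, v2 = 0:1, v3 = 0:1, v4 = 0:1, v5 = 0:1)

# For one pattern, return the first visit in priority order that is present,
# or NA when no visit was recorded at all
pick_visit <- function(present) {
  hit <- visit_priority[present[visit_priority] == 1]
  if (length(hit) > 0) hit[1] else NA
}

translation <- data.frame(
  code  = apply(patterns, 1, paste, collapse = ""),  # e.g. "01000"
  visit = apply(patterns, 1, pick_visit),
  stringsAsFactors = FALSE
)
```

The code strings use the same layout as the code column built above (one digit per visit, visit 1 first), so a table like this could presumably stand in for usevisit in the merge step.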
On Wed, Mar 4, 2009 at 12:09 AM, Vedula, Satyanarayana <svedula at jhsph.edu> wrote:
> Hi,
>
> Could someone help with coding this in R?
>
> I need to select one row per patient i in clinic j. The data is organized similar to that shown below.
>
> Two columns - patient i in column j identify each unique patient. There are two columns on outcome. Some patients have multiple rows with each row representing one visit, coded for in the column, visit. Some patients have just one row indicating data from a single visit.
>
> I need to select one row per patient i in clinic j using the following algorithm:
>
> If patient has outcome recorded at visit 2, then outcome = outcome columns at visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at visit 4
> If patient does not have visits 2, 5, and 4, then outcome = outcome at visit 3
> If patient does not have visits 2, 5, 4, and 3, then outcome = outcome at visit 1
> If patient does not have any of the visits, outcome = missing
>
> Patient    Clinic    Visit    Outcome_left  Outcome_right
> patient 1  clinic 1  visit 2  22            21
> patient 1  clinic 3  visit 1  21            21
> patient 1  clinic 3  visit 2  21            22
> patient 1  clinic 3  visit 3  20            22
> patient 3  clinic 5  visit 1  24            21
> patient 3  clinic 5  visit 3  21            22
> patient 3  clinic 5  visit 4  22            23
> patient 3  clinic 5  visit 5  22            22
>
> I need to select just the first row for patient 1/clinic 1; the second row (visit 2) for patient 1/clinic 3; and the fourth row (visit 5) for patient 3/clinic 5.

I'd approach this problem in the following way:

library(plyr)

df <- read.csv(text = "
Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22
", header = TRUE)

# With a single patient it's pretty easy to find the preferred visit.
# preferred_visit lists the visits in order of preference; match() gives the
# row of each one (NA if absent), and the first non-NA entry is the best row.
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

one <- subset(df, Patient == "patient 3" & Clinic == "clinic 5")
best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
one[best_visit, ]

# We then turn this into a function
find_best_visit <- function(one) {
  best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
  one[best_visit, ]
}

# Then apply it to every combination of patient and clinic with plyr
ddply(df, .(Patient, Clinic), find_best_visit)

# You can learn more about plyr at http://had.co.nz/plyr

Hadley

--
http://had.co.nz/
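For readers without plyr installed, the same split/apply/combine step can be sketched in base R with split(), lapply() and rbind(). This is only an illustrative alternative under the same preference order, not the method from the post above; the names groups and best are made up for the example.

```r
# Base R sketch of the per-group selection; 'groups' and 'best' are
# hypothetical names used only in this illustration.
df <- read.csv(text = "Patient,Clinic,Visit,Outcome_left,Outcome_right
patient 1,clinic 1,visit 2,22,21
patient 1,clinic 3,visit 1,21,21
patient 1,clinic 3,visit 2,21,22
patient 1,clinic 3,visit 3,20,22
patient 3,clinic 5,visit 1,24,21
patient 3,clinic 5,visit 3,21,22
patient 3,clinic 5,visit 4,22,23
patient 3,clinic 5,visit 5,22,22", header = TRUE)

# Visits in order of preference, as stated in the question
preferred_visit <- paste("visit", c(2, 5, 4, 3, 1))

# Row of the most preferred visit that is present in one patient/clinic group
find_best_visit <- function(one) {
  best_visit <- na.omit(match(preferred_visit, one$Visit))[1]
  one[best_visit, ]
}

# Split into patient/clinic groups (dropping combinations that never occur),
# pick one row per group, and stack the results back into a data frame
groups <- split(df, list(df$Patient, df$Clinic), drop = TRUE)
best <- do.call(rbind, lapply(groups, find_best_visit))
rownames(best) <- NULL
best
```

On the example data this should return three rows, one per patient/clinic pair, which can be compared against the rows the question asks for.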