Hello,

I didn't give enough information when I sent a query before, so I'm trying again with a more detailed explanation:

In this data set, each patient has a different number of measured variables (they represent tumors, so some people had 2 tumors, some had 5, etc.). The problem I have is that often in later cycles for a patient, tumors that were originally measured are now missing (or a "new" tumor showed up). We assume there are many different reasons why a tumor would be measured in one cycle and not another, and so I want to subset OUT the "problem" patients to better study these patterns.

An example:

Patient Cycle  V1   V2   V3   V4   V5
A       1      0.4  0.1  0.5  1.5  NA
A       2      0.3  0.2  0.5  1.6  NA
A       3      0.3  NA   0.6  1.7  NA
A       4      0.4  NA   0.4  1.8  NA
A       5      0.5  0.2  0.5  1.5  NA

I want to keep patient A; they have 4 measured tumors, but tumor 2 is missing data for cycles 3 and 4.

B       1      0.4  NA   NA   NA   NA
B       2      0.4  NA   NA   NA   NA

I do not want to keep patient B; they have 1 tumor that is measured consistently in both cycles.

C       1      0.9  0.9  0.9  NA   NA
C       3      0.3  0.5  0.6  NA   NA
C       4      NA   NA   NA   NA   NA
C       5      0.4  NA   NA   NA   NA

I do want to keep patient C; all their data is missing for cycle 4, and cycle 5 only measured one tumor.

D       1      0.2  0.5  NA   NA   NA
D       2      0.5  0.7  NA   NA   NA
D       4      0.6  0.4  NA   NA   NA
D       5      0.5  0.5  NA   NA   NA

I do not want patient D; their two tumors were measured in each cycle.

E       1      0.1  NA   NA   NA   NA
E       2      0.5  0.3  NA   NA   NA
E       3      0.4  0.3  NA   NA   NA

I DO want patient E; they only had one tumor register in cycle 1, but cycles 2 and 3 had two tumors.

Thanks for any help!

Hello,

Try the following. The data is your example of patients A through E, but from the output of dput().

dat <- structure(list(Patient = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), Cycle = c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 1L, 3L, 4L, 5L, 1L, 2L, 4L, 5L, 1L, 2L, 3L),
    V1 = c(0.4, 0.3, 0.3, 0.4, 0.5, 0.4, 0.4, 0.9, 0.3, NA, 0.4,
    0.2, 0.5, 0.6, 0.5, 0.1, 0.5, 0.4), V2 = c(0.1, 0.2, NA,
    NA, 0.2, NA, NA, 0.9, 0.5, NA, NA, 0.5, 0.7, 0.4, 0.5, NA,
    0.3, 0.3), V3 = c(0.5, 0.5, 0.6, 0.4, 0.5, NA, NA, 0.9, 0.6,
    NA, NA, NA, NA, NA, NA, NA, NA, NA), V4 = c(1.5, 1.6, 1.7,
    1.8, 1.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA), V5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA)), .Names = c("Patient", "Cycle",
"V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
-18L))

dat

# names of the tumor columns: V followed by a single digit 1-9
nms <- names(dat)[grep("^V[1-9]$", names(dat))]
# one data frame per patient
dd <- split(dat, dat$Patient)
# TRUE if a column has both missing and non-missing values
fun <- function(x) any(is.na(x)) && any(!is.na(x))
# TRUE for a patient if at least one tumor column is partly missing
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, nms], fun)))

dd[ix]                  # list of the "problem" patients
do.call(rbind, dd[ix])  # the same, as a single data frame

I'm assuming that the variable names are as posted: V followed by one single digit 1-9. To keep the patients with complete cases instead, just negate the index 'ix'; it's a logical index.
Note also that dput() is the best way of posting a data example.

Hope this helps,

Rui Barradas

On 19-07-2012 15:15, Lib Gray wrote:
> I want to subset OUT the "problem" patients to better study these patterns.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
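
The selection rule in the reply above (flag a patient when any tumor column is measured in some cycles but missing in others) can also be sketched as a small helper. This is only an illustration under stated assumptions: `toy`, `partly_missing` and `flag_patients` are hypothetical names that do not appear in the thread, and the toy data is made up rather than taken from the real data set.

```r
# Small stand-in data (hypothetical): P1 has a gap in V2,
# P2 is measured consistently (V1 always present, V2 never present).
toy <- data.frame(
  Patient = c("P1", "P1", "P1", "P2", "P2"),
  Cycle   = c(1, 2, 3, 1, 2),
  V1      = c(0.4, 0.3, 0.5, 0.2, 0.6),
  V2      = c(0.1, NA, 0.2, NA, NA)
)

# TRUE when a column is measured in some cycles but missing in others.
partly_missing <- function(x) anyNA(x) && !all(is.na(x))

# Hypothetical helper: one logical per patient, TRUE when at least
# one of the columns in 'cols' is partly missing for that patient.
flag_patients <- function(d, cols) {
  vapply(split(d[cols], d$Patient),
         function(g) any(vapply(g, partly_missing, logical(1))),
         logical(1))
}

ix <- flag_patients(toy, c("V1", "V2"))

# Subset the original rows by patient name, in either direction.
problem    <- toy[toy$Patient %in% names(ix)[ix], ]
consistent <- toy[toy$Patient %in% names(ix)[!ix], ]
```

Subsetting with `%in%` keeps the rows in their original order and with their original row names, which may be convenient if the result is merged back with other tables later.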

Hello,

I guess so, and I can save you some typing.

# builds "L11", "L12", "L21", ..., "L82"
vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste, collapse=""))

Then use 'vars' in place of 'nms' and see the result.

Rui Barradas

On 20-07-2012 00:00, Lib Gray wrote:
> The variables are actually L11, L12, L21, L22, ... , L81, L82. Would just
> creating a vector c(L11,... ,L82) be fine? (I'm about to try it, but I
> wanted to check to see if that was going to be a big issue).
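
As a hedged aside on the column-name question: the names have to be character strings (quoted), since an unquoted `c(L11, ..., L82)` would make R look for objects called `L11` and so on. The sketch below shows two other ways to get the same 16 names. The data frame `toy` is a hypothetical stand-in, because the real data was not posted, and `by_pattern` is a name introduced only for this illustration.

```r
# Build "L11", "L12", "L21", ..., "L82" directly;
# paste0() recycles 1:2 against each repeated first digit.
vars <- paste0("L", rep(1:8, each = 2), 1:2)

# Hypothetical stand-in data frame with those column names (all NA).
toy <- data.frame(Patient = "P1", Cycle = 1:2,
                  matrix(NA_real_, nrow = 2, ncol = 16,
                         dimnames = list(NULL, vars)))

# Alternative: pick the columns by pattern from the data itself,
# in the same spirit as the grep() on "^V[1-9]$" earlier in the thread.
by_pattern <- grep("^L[1-8][12]$", names(toy), value = TRUE)
```

Either vector can then be used wherever the earlier code used `nms`.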