thr3ads.net - R help - [R] Subsetting problem data, 2 [Jul 2012]

Lib Gray

2012-Jul-19 14:15 UTC

[R] Subsetting problem data, 2

Hello,

I didn't give enough information when I sent a query before, so I'm
trying again with a more detailed explanation:

In this data set, each patient has a different number of measured variables
(they represent tumors, so some people had 2 tumors, some had 5, etc.). The
problem I have is that often in later cycles for a patient, tumors that
were originally measured are now missing (or a "new" tumor showed up). We
assume there are many different reasons why a tumor would be measured
in one cycle and not another, and so I want to subset OUT the "problem"
patients to better study these patterns.

An example:

Patient  Cycle  V1  V2  V3  V4  V5
A  1  0.4  0.1  0.5  1.5  NA
A  2  0.3  0.2  0.5  1.6  NA
A  3  0.3  NA  0.6  1.7  NA
A  4  0.4  NA  0.4  1.8  NA
A  5  0.5  0.2  0.5  1.5  NA

I want to keep patient A; they have 4 measured tumors, but tumor 2 is
missing data for cycles 3 and 4

B  1  0.4  NA  NA  NA  NA
B  2  0.4  NA  NA  NA  NA

I do not want to keep patient B; they have 1 tumor that is measured
consistently in both cycles

C  1  0.9  0.9  0.9  NA  NA
C  3  0.3  0.5  0.6  NA  NA
C  4  NA  NA  NA  NA  NA
C  5  0.4  NA  NA  NA  NA

I do want to keep patient C; all their data is missing for cycle 4 and
cycle 5 only measured one tumor

D  1  0.2  0.5  NA  NA  NA
D  2  0.5  0.7  NA  NA  NA
D  4  0.6  0.4  NA  NA  NA
D  5  0.5  0.5  NA  NA  NA

I do not want patient D; their two tumors were measured each cycle

E  1  0.1  NA  NA  NA  NA
E  2  0.5  0.3  NA  NA  NA
E  3  0.4  0.3  NA  NA  NA

I DO want patient E; they only had one tumor register in Cycle 1, but
cycles 2 and 3 had two tumors.


Thanks for any help!


Rui Barradas

2012-Jul-19 20:27 UTC

Hello,

Try the following. The data is your example of Patient A through E, but 
from the output of dput().

dat <- structure(list(Patient = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), Cycle = c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 1L, 3L, 4L, 5L, 1L, 2L, 4L, 5L, 1L, 2L, 3L),
     V1 = c(0.4, 0.3, 0.3, 0.4, 0.5, 0.4, 0.4, 0.9, 0.3, NA, 0.4,
     0.2, 0.5, 0.6, 0.5, 0.1, 0.5, 0.4), V2 = c(0.1, 0.2, NA,
     NA, 0.2, NA, NA, 0.9, 0.5, NA, NA, 0.5, 0.7, 0.4, 0.5, NA,
     0.3, 0.3), V3 = c(0.5, 0.5, 0.6, 0.4, 0.5, NA, NA, 0.9, 0.6,
     NA, NA, NA, NA, NA, NA, NA, NA, NA), V4 = c(1.5, 1.6, 1.7,
     1.8, 1.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA), V5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
     NA, NA, NA, NA, NA, NA)), .Names = c("Patient", "Cycle",
"V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
-18L))

dat

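# Measurement columns: V followed by a single digit 1-9
nms <- names(dat)[grep("^V[1-9]$", names(dat))]
# One data frame per patient
dd <- split(dat, dat$Patient)
# TRUE if a column has both missing and non-missing values
fun <- function(x) any(is.na(x)) && any(!is.na(x))
# TRUE for patients with at least one partly measured tumor
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, nms], fun)))

dd[ix]                  # "problem" patients, as a list
do.call(rbind, dd[ix])  # the same, as one data frame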


I'm assuming that the variable names are as posted: V followed by one
single digit 1-9. To keep the patients with complete cases instead, just
negate the index 'ix'; it's a logical index.
Note also that dput() is the best way of posting a data example.
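
To make the negation concrete, here is a minimal sketch using the five example patients. It rebuilds the data with read.table() rather than dput() purely for brevity, and the names 'problem' and 'consistent' are illustrative, not part of the code above.

```r
# Example data from the original post (patients A through E)
dat <- read.table(text = "
Patient Cycle V1 V2 V3 V4 V5
A 1 0.4 0.1 0.5 1.5 NA
A 2 0.3 0.2 0.5 1.6 NA
A 3 0.3 NA 0.6 1.7 NA
A 4 0.4 NA 0.4 1.8 NA
A 5 0.5 0.2 0.5 1.5 NA
B 1 0.4 NA NA NA NA
B 2 0.4 NA NA NA NA
C 1 0.9 0.9 0.9 NA NA
C 3 0.3 0.5 0.6 NA NA
C 4 NA NA NA NA NA
C 5 0.4 NA NA NA NA
D 1 0.2 0.5 NA NA NA
D 2 0.5 0.7 NA NA NA
D 4 0.6 0.4 NA NA NA
D 5 0.5 0.5 NA NA NA
E 1 0.1 NA NA NA NA
E 2 0.5 0.3 NA NA NA
E 3 0.4 0.3 NA NA NA
", header = TRUE)

nms <- names(dat)[grep("^V[1-9]$", names(dat))]
dd <- split(dat, dat$Patient)
fun <- function(x) any(is.na(x)) && any(!is.na(x))
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, nms], fun)))

# 'problem' and 'consistent' are illustrative names
problem    <- do.call(rbind, dd[ix])   # patients A, C, E
consistent <- do.call(rbind, dd[!ix])  # patients B, D

names(ix)[ix]   # "A" "C" "E"
names(ix)[!ix]  # "B" "D"
```

Because 'ix' is named by patient, names(ix)[ix] is a quick way to check which patients were flagged before binding the rows back together.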

Hope this helps,

Rui Barradas


Rui Barradas

2012-Jul-19 23:17 UTC

Hello,

I guess so, and I can save you some typing.

vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste, collapse=""))


Then use it in place of 'nms' and see the result.
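
A rough sketch of what that substitution might look like. The data frame 'toy' and its values are hypothetical, made up only to have columns named L11 through L82; the real data will differ.

```r
# Build the 16 column names L11, L12, L21, ..., L81, L82
vars <- sort(apply(expand.grid("L", 1:8, 1:2), 1, paste, collapse = ""))

# Hypothetical toy data: two patients, two cycles each
toy <- data.frame(Patient = c("P1", "P1", "P2", "P2"),
                  Cycle   = c(1, 2, 1, 2))
toy[vars] <- NA_real_                # start with every tumor unmeasured
toy$L11 <- c(0.4, 0.3, 0.2, 0.5)     # measured in every cycle for both
toy$L12 <- c(0.1, NA, NA, NA)        # P1 loses this tumor in cycle 2

# Same steps as before, with 'vars' in place of 'nms'
dd <- split(toy, toy$Patient)
fun <- function(x) any(is.na(x)) && any(!is.na(x))
ix <- sapply(dd, function(x) Reduce(`|`, lapply(x[, vars], fun)))

ix                      # P1 TRUE, P2 FALSE
do.call(rbind, dd[ix])  # rows for the flagged patient only
```

If the real data lacks some of the 16 columns, x[, vars] will fail with an "undefined columns" error, so it may be worth checking all(vars %in% names(toy)) first.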

Rui Barradas

On 20-07-2012 00:00, Lib Gray wrote:
> The variables are actually L11, L12, L21, L22, ... , L81, L82. Would just
> creating a vector c(L11,... ,L82) be fine? (I'm about to try it, but I
> wanted to check to see if that was going to be a big issue).