Hi,
I figured out a workaround to my problem, but if anyone has any advice on how
to express a function in tapply to achieve the same outcome, that would be
awesome and I'd learn something about functions!
The workaround was:
tapply(data$Variation.Type %in% c(2, 3), data$Patient, sum)
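
For reference, the same counts can probably be obtained by passing an
anonymous function as the third argument to tapply. This is only a minimal
sketch: it assumes the data frame is called data with columns Patient and
Variation.Type, as in the workaround, and the sample values are copied from
the example in the quoted message. The name counts is just illustrative.

```r
# Example data reconstructed from the quoted message (assumed column names).
data <- data.frame(
  Patient = c("A", "A", "A", "B", "C", "D", "D", "E", "E", "F"),
  Variation.Type = c(1, 2, 3, 1, 2, 2, 3, 2, 4, 4)
)

# tapply splits Variation.Type by Patient and calls the function once per
# patient; x is that patient's vector of variation types.
counts <- tapply(data$Variation.Type, data$Patient,
                 function(x) sum(x %in% c(2, 3)))
print(counts)

# Optional: turn the named result into a data.frame.
result <- data.frame(Patient = names(counts), Count = as.vector(counts))
print(result)
```

The workaround and this version should agree: the workaround tests membership
first and then sums the TRUE/FALSE values per patient, while the anonymous
function does the membership test inside each patient's group.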
Thanks.
Min-Han
On Fri, Mar 26, 2010 at 12:40 PM, Min-Han Tan
<minhan.science@gmail.com> wrote:
> Dear R-help members,
>
> Apologies for the trouble.
>
> I have a question:
>
> Essentially, I have a dataset which stores genetic variations for
> individual patients. Each individual patient can have more than one
> variation, and each new record corresponds to a new variation (thus, both
> individual patients and variations are non-unique).
>
> So the dataset looks something like this (letters = patients, numbers =
> variation type).
> Patient, Variation Type
> A, 1
> A, 2
> A, 3
> B, 1
> C, 2
> D, 2
> D, 3
> E, 2
> E, 4
> F, 4
>
> My final desired output is a data.frame or a vector containing patients,
> each with a count of how many of their variations fall in a desired subset.
> For example, if I were only interested in variation types 2 and 3, my output
> would look like this.
>
> A, 2
> B, 0
> C, 1
> D, 2
> E, 1
> F, 0
>
> I am trying to figure out how to use tapply to do this.
>
> It would be something like tapply (Variation Type, Patient, ??? )
>
> I am not sure about the function syntax of ??? to subselect only 2,3, and
> have been looking at the r-help.
>
> Sorry! Essentially, I am trying to avoid awkward loops in this whole
> process.
>
> Thanks very much for your advice!
>
> Min-Han
>