Moritz Kebschull
2013-Jun-30 18:47 UTC
[R] Gene expression clustering using several dependent samples
Dear list,

I am looking at a dataset of Affymetrix arrays from disease-affected tissue samples that I am trying to cluster.

The problem is that we have two or more biopsies per study subject, and I am not sure how best to account for their dependency. Unlike cancer samples, these biopsies differ to some extent in disease severity, i.e. they are not perfect replicates, but they share certain similarities because they come from the same person.

I first tried simply clustering all available biopsies with ConsensusClusterPlus. However, this produced clusters of biopsies according to their disease severity, often with different samples from the same patient assigned to different clusters, and that is not what I want. I am trying to identify classes of subjects, not of biopsies.

For the differential expression analyses, we dealt with this issue by adding the patient as a random effect to the model. Could I do something similar with model-based clustering, perhaps also adding a variable for disease severity?

As an alternative, I have explored aggregating all available samples per subject into one expression profile and clustering the patients using these aggregates. I am, however, not convinced that this is right, since this approach creates 'artificial' data.

Does anyone have an idea?

Many thanks,
Moritz
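A minimal sketch of the aggregation route described in the question, written in Python/NumPy purely for illustration (in R, base lm() and kmeans() cover the same steps). Everything here is an assumption, not the poster's pipeline: the expression matrix is genes x biopsies, each biopsy has a numeric severity score, and a per-gene linear fit on that score is an acceptable way to remove severity before averaging biopsies per subject. That regression is a simple fixed-effect adjustment, not the patient random-effect model mentioned in the question, and plain k-means stands in for ConsensusClusterPlus or model-based clustering. The helper names and the toy data are hypothetical; the data are made up to reproduce the symptom described: clustering biopsies splits every subject by severity, while clustering severity-adjusted subject averages groups subjects by class.

```python
import numpy as np


def remove_covariate(expr, covariate):
    """Regress a per-biopsy covariate (e.g. a severity score) out of every gene.

    expr is genes x biopsies. Each gene gets its own least-squares slope on the
    centred covariate; slope * covariate is subtracted, so gene means are kept
    and only the covariate-driven part is removed. (Hypothetical helper.)
    """
    expr = np.asarray(expr, dtype=float)
    xc = np.asarray(covariate, dtype=float)
    xc = xc - xc.mean()
    design = np.column_stack([np.ones_like(xc), xc])  # intercept + centred covariate
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)  # shape (2, n_genes)
    return expr - np.outer(coef[1], xc)


def aggregate_by_subject(expr, subjects):
    """Average each subject's biopsy columns into one profile (genes x subjects)."""
    expr = np.asarray(expr, dtype=float)
    subjects = np.asarray(subjects)
    ids = np.unique(subjects)  # sorted subject IDs
    profiles = np.column_stack([expr[:, subjects == s].mean(axis=1) for s in ids])
    return [str(s) for s in ids], profiles


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on the rows of `points` with farthest-point initialisation."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Next centre: the row farthest from every centre chosen so far.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def split_subjects(subjects, labels):
    """Subjects whose biopsies ended up in more than one cluster."""
    subjects, labels = np.asarray(subjects), np.asarray(labels)
    return [str(s) for s in np.unique(subjects)
            if len(np.unique(labels[subjects == s])) > 1]


# Hypothetical toy data: 4 genes, 6 subjects (s1-s3 in class A, s4-s6 in class B),
# two biopsies each, one mild (severity -10) and one severe (+10).
class_profile = {"A": [5, 5, 0, 0], "B": [0, 0, 5, 5]}
severity_signature = np.array([1, -1, 1, -1])  # made-up gene response to severity
subjects, severity, columns = [], [], []
for i, cls in enumerate("AAABBB", start=1):
    for sev in (-10, 10):
        subjects.append(f"s{i}")
        severity.append(sev)
        columns.append(np.array(class_profile[cls]) + sev * severity_signature)
expr = np.column_stack(columns)  # genes x biopsies

# Biopsy-level clustering groups by severity and splits every subject.
biopsy_labels = kmeans(expr.T, k=2)
print("split subjects:", split_subjects(subjects, biopsy_labels))

# Subject-level clustering after removing severity and averaging biopsies.
ids, profiles = aggregate_by_subject(remove_covariate(expr, severity), subjects)
subject_labels = kmeans(profiles.T, k=2)
print(dict(zip(ids, subject_labels.tolist())))
```

In the toy data every subject has one mild and one severe biopsy, so averaging alone would already cancel the severity term; the regression step only changes the subject profiles when subjects differ in their average severity. Averaging also discards within-subject variation, which is the trade-off the question is worried about.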
James C. Whanger
2013-Jul-01 13:37 UTC
[R] Gene expression clustering using several dependent samples
Hello Moritz,

You may want to take a look at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3052263/

DeSantis et al. (2009) identified different genomic profiles using a Bayesian latent class methodology.

Best,
James

--
James C. Whanger