David Romano
2013-Feb-08 19:14 UTC
[R] question about reproducibility/consistency of principal component and lda directions in R
Hi everyone,

I'm not exactly sure how to ask this question most clearly, but I hope that giving the context in which it occurs for me will help: I'm trying to compare the brain images of two patient populations; each image is composed of voxels (the 3D analogue of pixels), and I have two images per patient, one reflecting grey matter concentration at each voxel, and the other reflecting white matter concentration at each voxel.

I determined the groups by means of an analysis that involved information from both types of images, and what I set out to do was to get a rough idea of where in the brain the two groups showed the most striking differences.

My first attempt was to replace -- on a voxel by voxel basis -- the bivariate grey/white data by a combined univariate measure, namely the first principal component score. From these principal component scores I calculated Cohen's d to obtain a rough estimate of the effect size at each voxel, and the resulting brain images show very nice separation into meaningful brain regions, some corresponding to negative effect sizes and some to positive ones.

What puzzles me about how nice the separation into brain regions is, is that the meaning of positive and negative is determined by the choice of the first principal component direction at each voxel, but this choice is -- in principle (no pun intended -- sorry!) -- arbitrary. (Meaning whether an eigenvector or its negative is chosen as the direction is in principle arbitrary.)

So here are my questions: Does the algorithm used in R produce the same principal component directions if applied to the same data repeatedly? And if so, should the directions chosen by the algorithm change continuously with the data? For example, if one data set were obtained by applying a small amount of noise to another, should the resulting directions be close to each other (as opposed to close negative of each other)? (Assuming the data is far from being "singular" in some vague sense I'm not sure how to make precise.)

My second attempt was to do the same, but with the first lda scores, so I have the same questions about lda directions, too.

Any light you could shed on these questions would be very welcome!

Thanks in advance,
David Romano
Uwe Ligges
2013-Feb-09 19:43 UTC
[R] question about reproducibility/consistency of principal component and lda directions in R
On 08.02.2013 20:14, David Romano wrote:
> So here are my questions: Does the algorithm used in R produce the same
> principal component directions if applied to the same data repeatedly?

Yes, but the result may change if you run it on another machine (it depends on the compiler, and hence also on 32-bit vs. 64-bit and the OS).

> And if so, should the directions chosen by the algorithm change
> continuously with the data? For example, if one data set were obtained by
> applying a small amount of noise to another, should the resulting
> directions be close to each other (as opposed to close negative of each
> other)? (Assuming the data is far from being "singular" in some vague
> sense I'm not sure how to make precise.)

Noise means the sign can change again. Of course, you can define a convention yourself, e.g. fix the direction of the very first value and change signs otherwise.

> My second attempt was to do the same, but with the first lda scores, so I
> have the same questions about lda directions, too.

Same for lda.

Best,
Uwe Ligges
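
A minimal sketch of the sign convention suggested in the reply, written in R. The helper name `fix_signs` and the simulated `grey`/`white` data are hypothetical and only for illustration; `prcomp()` itself does not apply any such convention. The sketch flips each direction so that its first entry is non-negative, which is only a stable choice when that first entry is not close to zero.

```r
# Hypothetical helper (not part of base R): flip each column of a loadings
# matrix so that its first entry is non-negative.
fix_signs <- function(rotation) {
  s <- sign(rotation[1, ])
  s[s == 0] <- 1                     # leave a column alone if its first entry is exactly 0
  sweep(rotation, 2, s, `*`)         # multiply column j by s[j]
}

# Simulated stand-in for the grey/white values at one voxel (assumption).
set.seed(1)
grey  <- rnorm(50)
white <- 0.8 * grey + rnorm(50, sd = 0.3)
x <- cbind(grey = grey, white = white)

pc  <- prcomp(x)
rot <- fix_signs(pc$rotation)

# Scores recomputed with the sign-fixed directions.
scores <- scale(x, center = pc$center, scale = FALSE) %*% rot

# Same data in the same session: repeated calls give the same directions.
same_again <- identical(prcomp(x)$rotation, pc$rotation)

# Slightly perturbed data: with the convention applied, the first directions
# can be compared directly through their inner product.
x_noisy   <- x + matrix(rnorm(length(x), sd = 0.01), ncol = 2)
rot_noisy <- fix_signs(prcomp(x_noisy)$rotation)
agreement <- sum(rot[, 1] * rot_noisy[, 1])
```

With a convention like this applied at every voxel, the sign of an effect size computed from the scores no longer depends on which of the two equivalent eigenvectors the routine happened to return. The same kind of flip could presumably be applied to the columns of the `scaling` matrix returned by `MASS::lda()`, though that is an assumption not checked here.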