Gareth Campbell
2008-Sep-21 03:22 UTC
[R] Variable Selection for data reduction and discriminant analysis
Hello all,

I'm dealing with geochemical analyses of some rocks.

If I use the full composition (31 elements or variables), I can get reasonable separation of my 6 sources. When I then go on to do LDA with the 6 groups, I get excellent separation.

I feel I should reduce the variables to those that provide the most discrimination between the groups, as this is important information for me. I struggle to interpret the PCA plot in a way that helps me (due to the large number of elements), so I'm trying to do some sort of stepwise variable selection.

I would love to hear from someone (possibly a geochemist or similar) who does this regularly, to determine the best course of action in R.

Thanks very much

--
Gareth Campbell
PhD Candidate
The University of Auckland
Mark Difford
2008-Sep-21 09:04 UTC
[R] Variable Selection for data reduction and discriminant analysis
Hi Gareth,

>> If I use the full composition (31 elements or variables), I can get
>> reasonable separation of my 6 sources.

A word of advice: you need to be exceptionally careful when analyzing compositional data. Taking compositions puts your data values into a constrained, bounded space (generally called a simplex), so most standard statistical procedures (i.e. anything that uses a Euclidean metric, and most do) deliver erroneous results. Pearson wrote a paper on this long ago, but it has generally been ignored (except by Aitchison and the Spanish school of mathematical statisticians). The problem is comparatively well known to geologists, who work with compositional data much of the time.

R has a very good package for analysing this data type: see the compositions package (a new release seems imminent). You will be able to get most of the main references from it. (The authors of the package also have a newly released article in one of the Elsevier journals; unfortunately my bibliographic references are elsewhere, so I cannot give details.) You could start by Wiki'ing your way to "compositional data".

HTH, Mark.
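To make the compositional-data point concrete, here is a minimal sketch of one common remedy: a centred log-ratio (clr) transform applied before LDA. It is an illustration under stated assumptions, not the method recommended in the reply. The data are simulated, and `clr_transform`, `source_id`, `centres` and the element names are hypothetical. It uses only base R and MASS; the compositions package offers its own log-ratio tools, which may well be preferable in practice.

```r
library(MASS)  # provides lda(); MASS ships with standard R installations

set.seed(1)

# Hypothetical stand-in data: 3 sources, 20 samples each, 5 "elements".
n_per_group <- 20
source_id <- factor(rep(c("A", "B", "C"), each = n_per_group))
centres <- rbind(A = c(5, 3, 1, 0.5, 0.5),
                 B = c(3, 5, 1, 0.5, 0.5),
                 C = c(3, 3, 3, 0.5, 0.5))
raw <- t(sapply(as.character(source_id), function(g) {
  rlnorm(5, meanlog = log(centres[g, ]), sdlog = 0.2)
}))
dimnames(raw) <- list(NULL, paste0("El", 1:5))

# Close each sample to a constant sum of 1, i.e. make it a composition.
comp <- raw / rowSums(raw)

# Centred log-ratio: log of each part minus the mean log of its row.
clr_transform <- function(x) {
  logx <- log(x)
  logx - rowMeans(logx)  # the vector recycles down columns, so row i loses its own mean
}
z <- clr_transform(comp)

# clr rows sum to zero, so the full set of columns is collinear.
# Dropping one column leaves a full-rank set of coordinates for LDA.
z_lda <- z[, -ncol(z)]
fit <- lda(z_lda, grouping = source_id)
pred <- predict(fit, z_lda)$class

# Resubstitution confusion table (optimistic; use cross-validation in practice).
table(observed = source_id, predicted = pred)
```

The clr transform requires strictly positive values, so zeros or below-detection-limit entries would need separate handling before this step.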
Katharine Mullen
2008-Sep-21 16:43 UTC
[R] Variable Selection for data reduction and discriminant analysis
There are some pointers to packages for variable selection in the task view for Chemometrics and Computational Physics at http://cran.r-project.org/web/views/ChemPhys.html
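For readers who want to see what stepwise selection for LDA can look like before picking a package, the following is a rough hand-rolled sketch: greedy forward selection scored by the leave-one-out accuracy that `lda(..., CV = TRUE)` from MASS returns. It is not taken from any package in the task view. The simulated data, the helper `loo_accuracy`, and the 1-percentage-point stopping threshold are all assumptions chosen for illustration.

```r
library(MASS)  # provides lda(); MASS ships with standard R installations

set.seed(42)

# Hypothetical data: 3 sources, 30 samples each, 8 variables,
# of which only V1 and V2 carry any group signal.
n_per_group <- 30
grp <- factor(rep(c("S1", "S2", "S3"), each = n_per_group))
n <- length(grp)
X <- matrix(rnorm(n * 8), nrow = n,
            dimnames = list(NULL, paste0("V", 1:8)))
X[, "V1"] <- X[, "V1"] + 3 * (grp == "S2")
X[, "V2"] <- X[, "V2"] + 3 * (grp == "S3")

# Leave-one-out classification accuracy of LDA on a subset of columns.
loo_accuracy <- function(vars) {
  cv <- lda(X[, vars, drop = FALSE], grouping = grp, CV = TRUE)
  mean(cv$class == grp)
}

# Greedy forward selection: add the variable that most improves accuracy,
# and stop when the gain is at most 0.01 (an arbitrary threshold).
selected <- character(0)
remaining <- colnames(X)
best_acc <- 0
while (length(remaining) > 0) {
  acc <- sapply(remaining, function(v) loo_accuracy(c(selected, v)))
  if (max(acc) <= best_acc + 0.01) break
  best_var <- names(acc)[which.max(acc)]
  selected <- c(selected, best_var)
  remaining <- setdiff(remaining, best_var)
  best_acc <- max(acc)
}

selected  # variables kept, in the order they were added
best_acc  # leave-one-out accuracy of the final subset
```

Because the same cross-validated accuracy drives the selection, the final figure is optimistic; an outer validation loop or a held-out set gives a fairer estimate. For compositional data such as element concentrations, a log-ratio transform would normally come before this step. The klaR package provides ready-made alternatives (`stepclass` and `greedy.wilks`) that are worth checking before relying on a hand-written loop.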