mauede at alice.it
2010-May-17 13:09 UTC
[R] looking for Variable selections models, techniques, methods
We still have a long way to go with the data we were given by some drug discovery scientists. The problem is to select the few variables (Collective Variables), from a set of variables sampled during a Molecular Dynamics simulation, that exhibit a consistent and coherent relationship with the given minimum-work curve throughout the time it takes the molecule to migrate from the initial configuration to the final configuration.

I have already tried and ruled out a simple correlation. Someone here has suggested looking for correlation between the variables and the work curve in a time window, for example 20 time steps wide (every step is equal to 50 fs). But this is meaningless (in my view) because it would dig out a transient relationship, whereas what we need is a relationship that holds consistently over the whole configurational transformation period.

I made some progress with techniques for Dimensionality Reduction. The problem is that such techniques do not select variables. For instance, if I can reduce the dimensionality, say from 100 to 8, I am still not likely to be able to find the 8 independent variables which carry most of the information. Very likely the basis of the 8-D embedding space will be obtained as functions (most probably non-linear) of the original 100 variables, or at any rate of a big subset of them.

Bottom line: Dimensionality Reduction does not directly achieve the goal of the problem, which is to cut down the number of variables sampled during MD simulations, leaving out the ones that are unimportant for the chemical-physical reaction in question.

I would greatly appreciate suggestions and advice concerning techniques, methods, and models to perform Variable Selection other than simple linear regression.

Thank you very much.
Best regards,
Maura Edelweiss M.
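
To make the distinction between a transient and a lasting relationship concrete, here is a minimal sketch in Python (standard library only). It is not a method taken from the message: the function names, the 0.8 threshold, and the synthetic data are all assumptions made for illustration. It computes the correlation with the work curve in each consecutive 20-step window, then reports the fraction of windows in which the correlation is strong and of the same sign, so that a variable related to the work curve in only one window scores low.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def windowed_correlations(x, work, width=20):
    """Correlation of x with the work curve in each consecutive window of `width` steps."""
    return [pearson(x[s:s + width], work[s:s + width])
            for s in range(0, len(x) - width + 1, width)]

def fraction_consistent(x, work, width=20, threshold=0.8):
    """Fraction of windows with a strong correlation of one and the same sign.

    The threshold of 0.8 is an arbitrary illustrative choice.
    """
    rs = windowed_correlations(x, work, width)
    n_pos = sum(1 for r in rs if r >= threshold)
    n_neg = sum(1 for r in rs if r <= -threshold)
    return max(n_pos, n_neg) / len(rs)

# Synthetic stand-ins for real MD data (assumption: 200 steps, linear work curve).
n = 200
work = [t / (n - 1) for t in range(n)]
# Tracks the work curve over the whole run, with a small wiggle on top.
steady = [3.0 * w + 0.01 * math.sin(t) for t, w in enumerate(work)]
# Tracks the work curve only during steps 40..59, unrelated oscillation elsewhere.
transient = [w if 40 <= t < 60 else math.sin(t) for t, w in enumerate(work)]

print("steady   :", fraction_consistent(steady, work))
print("transient:", fraction_consistent(transient, work))
```

On this synthetic data the steady variable should come out at 1.0 and the transient one at 0.1, since only one of its ten windows tracks the work curve. A score like this only ranks variables one at a time; it does not account for redundancy between variables, so it is a screening aid rather than a full variable selection method.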