Vishal Thapar
2011-Aug-03 14:06 UTC
[R] Combining multiple dependent variables for machine learning
Hi,

I apologize for posting this here; I am also trying to post it on machine learning mailing lists.

I have a set of 18K sequences (each 22 nt long) and their counts at 4 different stages. The difference in counts from one stage to the next represents how well the sequence performed in that transition. The total counts remain about the same in each stage, so if one sequence loses some counts in a stage, another sequence gains those counts in that stage.

I am trying to build a predictor that combines these 4 stages. I have already tried building an SVM using just the counts in the final stage, but it's not that great (0.3 correlation with the test set). The problem I am facing now is how to combine these 4 stages into 1 dependent variable, or something like that. The 4 stages are the dependent variables and the sequence is my independent variable. The aim is to use the count information in each stage to determine how well the sequence performs across all 4 stages.

I appreciate any suggestions for this problem.

Sincerely,
Vishal
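To make the setup in this post concrete, here is a minimal sketch of one possible way to collapse four stage counts into a single response per sequence. It is an illustration under stated assumptions, not the poster's method: the choice of a log2 ratio of relative abundances, the pseudocount, the function names `stage_log_ratios` and `combined_score`, and the toy data are all hypothetical. Python is used only because the thread itself contains no code.

```python
import math

def stage_log_ratios(counts, pseudocount=1.0):
    """Return, per sequence, the log2 ratio of relative abundance for each
    stage-to-stage transition.

    counts: dict mapping sequence -> list of raw counts, one per stage.
    pseudocount: assumed smoothing constant to avoid log(0).
    """
    n_stages = len(next(iter(counts.values())))
    n_seqs = len(counts)
    # Total counts per stage, used to turn raw counts into relative abundances.
    totals = [sum(c[s] for c in counts.values()) for s in range(n_stages)]
    ratios = {}
    for seq, c in counts.items():
        freqs = [
            (c[s] + pseudocount) / (totals[s] + pseudocount * n_seqs)
            for s in range(n_stages)
        ]
        ratios[seq] = [
            math.log2(freqs[s + 1] / freqs[s]) for s in range(n_stages - 1)
        ]
    return ratios

def combined_score(ratios):
    """Sum the per-transition log ratios into one value per sequence.

    The sum telescopes to log2(final abundance / initial abundance), so it
    summarizes the net change but discards the path taken through the
    intermediate stages.
    """
    return {seq: sum(r) for seq, r in ratios.items()}

# Hypothetical toy data: three 22-nt sequences, four stages, equal stage totals.
toy_counts = {
    "ACGTACGTACGTACGTACGTAC": [100, 120, 150, 200],
    "TTGCATTGCATTGCATTGCATT": [100, 100, 100, 100],
    "GGGAAACCCTTTGGGAAACCCT": [200, 180, 150, 100],
}

per_transition = stage_log_ratios(toy_counts)
scores = combined_score(per_transition)
for seq, score in scores.items():
    print(seq, round(score, 3))
```

A positive score means the sequence gained relative abundance between the first and last stage, and a negative score means it lost. Keeping the three per-transition ratios as separate responses would be an alternative if the intermediate stages matter.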
Sarah Goslee
2011-Aug-03 14:16 UTC
[R] Combining multiple dependent variables for machine learning
Hi,

On Wed, Aug 3, 2011 at 10:06 AM, Vishal Thapar <vishalthapar at gmail.com> wrote:
> Hi,
>
> I apologize for posting this here, I am also trying to post this on machine
> learning emailing lists.
>
> I have a set (18K) of sequences (22 nt long) and I have their counts at 4
> different stages. The difference in counts from one stage to the next
> represents how well the sequence performed in the transition. The total
> counts remain about the same in each stage. So if a 1 sequence loses some
> counts in 1 stage, another sequence gains those counts in that stage. I am
> trying to build a predictor that combines these 4 stages. I have already
> tried to build an SVM using just the counts in the final stage but its not
> that great (0.3 correlation with test set). The problem I am facing now is
> how to combine these 4 stages into 1 dependent variable or something like
> that. The 4 stages are the dependent variables and the sequence is my
> independent variable. The aim is to use the count information in each stage
> to select how well the sequence performs across all 4 stages.
>
> I appreciate any suggestions for this problem.

Suggestions? Yes. Read the posting guide and follow it.

It isn't clear that this is even an R question, since you don't tell us anything about the packages or functions you are using, or about your data. There aren't any actual questions in your message, and your problem statement is exceedingly vague.

You might find more help on the Bioconductor list, if in fact you are using R for your problem.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org