Dear list,

I performed a multivariate analysis on freshwater invertebrate data and obtained the coordinates of my samples on the axes defining the first factorial plane (F1 and F2). I would like to know whether the positions of the samples on this factorial plane can be linked to levels of impairment ('low' vs 'significant') for several water quality pressure categories, and which pressure categories are the most important for explaining my data.

I first used random forests (package randomForest) to regress the F1 and F2 coordinates independently against my pressure levels. These models explained around 13% of the variability on the first axis and 1.5% on the second axis.

I then heard about multi-response models and tried to model the bivariate response F1+F2 from the same set of pressure levels. This time the model explained around 37% of the variability, which was great. However, I don't understand what exactly is modelled in such a multi-response regression with random forests: when I used the predict() function on my data, I obtained only one value for each sample. What does this prediction correspond to? F1, F2, or some combination of both?
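One way to see what the formula `(F1+F2) ~ .` actually hands to the model is to build the model frame with base R's `model.frame()` and `model.response()`, which R's formula interfaces generally rely on (the assumption here is that randomForest's formula method behaves the same way). The sketch below uses only the first four rows and two pressure columns of the data extract, with ID as row names, so it is an illustration rather than a reproduction of the analysis. It checks whether the left-hand side arrives as a single numeric column equal to F1 + F2. It also prints which columns `.` expanded to, which is worth inspecting to confirm that F1 and F2 themselves are not being used as predictors.

```r
# Illustration only: four rows copied from the data extract in this post,
# ID used as row names, only two of the pressure columns kept.
data <- data.frame(
  F1  = c(-0.181720936, -0.013823243, -0.062171083, 0.165199490),
  F2  = c(-0.031683254, -0.044562244, 0.095592402, -0.006247771),
  WQ1 = factor(c("Impaired", "Good", "Good", "Impaired")),
  WQ2 = factor(c("Impaired", "Good", "Impaired", "Good")),
  row.names = c("423007", "423432", "382886", "349067")
)

# Build the model frame from the same formula that was passed to randomForest()
mf <- model.frame((F1 + F2) ~ ., data = data)

# Extract the response: check whether it is one vector or a two-column matrix
y <- model.response(mf)
print(y)

# Compare it with the element-wise sum of the two axis coordinates
print(all.equal(unname(y), data$F1 + data$F2))

# List the predictor columns the dot was expanded to; check whether
# F1 or F2 appear here as predictors of their own sum
print(attr(attr(mf, "terms"), "term.labels"))
```

If the response is indeed the row-wise sum, the single value returned by predict() for each sample is a prediction of F1 + F2, not of either axis separately.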
Any advice and links to helpful literature would be appreciated.

Thanks,
Cédric

___________________________________________________________________

Here is a small extract of my input data:

    ID                F1            F2  WQ1       WQ2       WQ3       WQ4
    423007  -0.181720936  -0.031683254  Impaired  Impaired  Impaired  Impaired
    423432  -0.013823243  -0.044562244  Good      Good      Impaired  Good
    382886  -0.062171083   0.095592402  Good      Impaired  Good      Impaired
    349067   0.165199490  -0.006247771  Impaired  Good      Impaired  Good
    350787  -0.086522253  -0.001156491  Good      Good      Impaired  Good
    423700  -0.094519496   0.058552236  Good      Good      Impaired  Good
    1473    -0.030547960   0.041201208  Good      Good      Impaired  Good
    422893  -0.381074618  -0.108488149  Good      Good      Good      Good
    424323  -0.200710868   0.008960769  Good      Impaired  Impaired  Impaired
    351117  -0.026336697  -0.011788642  Good      Good      Impaired  Good
    423356  -0.095307898   0.032821813  Good      Good      Impaired  Good
    52       0.181933163  -0.070008234  Good      Good      Good      Good
    529      0.201013553  -0.039925550  Good      Good      Good      Good
    123      0.049202307  -0.255373209  Good      Good      Good      Good
    424332  -0.201756587  -0.007161893  Good      Good      Impaired  Good
    423925   0.182053115  -0.163286598  Good      Good      Good      Good
    422967   0.009489423   0.078132841  Good      Good      Impaired  Good
    423899   0.042904501   0.022193773  Good      Good      Good      Good
    350912   0.031308796   0.066608196  Good      Good      Good      Good
    422988  -0.049664431   0.063449869  Good      Good      Impaired  Good

This is the formula I used for my model:

    mod=randomForest((F1+F2)~., data=data, ntree = 500, mtry = sqrt(ncol(data)-1))

The model summary:

    Call:
     randomForest(formula = (F1 + F2) ~ ., data = data, ntree = 500, mtry = sqrt(ncol(data) - 1))
                   Type of random forest: regression
                         Number of trees: 500
    No. of variables tried at each split: 4

              Mean of squared residuals: 0.01772612
                        % Var explained: 37.98

And finally the predictions:

              prediction
    423007  -0.256445319
    423432  -0.078636802
    382886  -0.088890538
    349067  -0.118654211
    350787  -0.112655013
    423700   0.018815905
    1473    -0.032085983
    422893  -0.303123232
    424323  -0.226793376
    351117   0.008599632
    423356  -0.038947801
    52       0.120712909
    529      0.043381647
    123     -0.087297539
    424332  -0.180140229
    423925   0.078654535
    422967  -0.012138644
    423899   0.078367004
    350912   0.078654535
    422988   0.014915818
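The "% Var explained" line is, according to the randomForest documentation, the pseudo R-squared `rsq = 1 - mse / Var(y)`, where `mse` is the out-of-bag mean of squared residuals. Each percentage is therefore relative to the variance of whichever single response that model was fitted on, so 37.98% for one response and 13% or 1.5% for the separate axis models are not on a common scale. The helper `pct_var_explained` below is hypothetical (it is not a randomForest function) and assumes `Var(y)` uses divisor n, i.e. `var(y) * (n - 1) / n`. Rearranging with the printed summary values gives the variance of the response the forest was fitted on. That value can then be compared with `var(F1 + F2)` on the full data.

```r
# Hypothetical helper (not part of randomForest): percentage of variance
# explained, following the documented pseudo R-squared 1 - mse / Var(y).
# Assumption: Var(y) uses divisor n, i.e. var(y) * (n - 1) / n.
pct_var_explained <- function(y, oob_pred) {
  n <- length(y)
  mse <- mean((y - oob_pred)^2)   # mean of squared residuals
  var_y <- var(y) * (n - 1) / n   # variance of the fitted response
  100 * (1 - mse / var_y)
}

# Toy checks: perfect predictions give 100, always predicting the mean gives 0
y <- c(1, 2, 3, 4)
print(pct_var_explained(y, y))
print(pct_var_explained(y, rep(mean(y), length(y))))

# Rearranged with the printed summary (MSE 0.01772612, 37.98 %): the variance
# of the response the forest was fitted on, under the divisor-n assumption.
# The percentage is rounded, so this is only approximate.
implied_var_y <- 0.01772612 / (1 - 37.98 / 100)
print(implied_var_y)
```

With the full data loaded, `var(data$F1 + data$F2) * (nrow(data) - 1) / nrow(data)` should come out close to `implied_var_y` if the modelled response is the sum of the two coordinates.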