Hi!

I tested two different implementations of the robust MCD estimator: cov.mcd from the MASS package and covMcd from the rrcov package. The tests were done on the hbk dataset included in the rrcov package. Unfortunately I get quite different results, so the question is whether these differences are justified, an error on my side, or a bug.

Here is what I did:

> require(MASS)
> require(rrcov)
> data(hbk)
> mass.mcd <- cov.mcd(hbk, quantile.used=57)
> rrcov.covMcd <- covMcd(hbk, alpha=0.75)

> # output from cov.mcd (MASS)
> mass.mcd$center
         X1          X2          X3           Y
 1.55833333  1.80333333  1.66000000 -0.08666667
> mass.mcd$cov
           X1         X2         X3           Y
X1 1.12484463 0.02217514  0.1537288  0.07615819
X2 0.02217514 1.13897175  0.1814915  0.02029379
X3 0.15372881 0.18149153  1.0434576 -0.12877966
Y  0.07615819 0.02029379 -0.1287797  0.31236158

> # output from covMcd (rrcov)
> rrcov.covMcd$center
         X1          X2          X3           Y
 1.53770492  1.78032787  1.68688525 -0.07377049
> rrcov.covMcd$cov
           X1          X2         X3            Y
X1 1.61921813 0.072595397  0.1678300  0.083905209
X2 0.07259540 1.648137481  0.2013022  0.002657454
X3 0.16782996 0.201302158  1.5306858 -0.150876964
Y  0.08390521 0.002657454 -0.1508770  0.453846286

As you can see, the results are quite different. I tried to start both calls with 75% (= 57 of 75) good data points. I cross-checked the results with the MCD implementation in MATLAB by Verboven and Hubert. Those functions give the same results as cov.mcd (MASS).

If somebody knows why the results do not match, although both functions are implementations of the same estimator, please tell me.

Thanks,
Rainer
The two implementations use different consistency factors as well as different small-sample correction factors.

1. The search parts of both implementations produce the same result - compare rrcov.covMcd$best and mass.mcd$best.

2. The raw MCD covariance matrix is corrected as follows:

   MASS:
   - Rousseeuw and Leroy (1987), p. 259 (eq. 1.26)
   - Marazzi (1993) (or maybe Rousseeuw and van Zomeren (1990), p. 638, eq. A.9)

   rrcov:
   - Croux and Haesbroeck (1999); Pison et al., p. 337
   - Pison et al. (2002), p. 338

3. The reweighted (final) covariance matrix is corrected as follows:

   MASS: no correction
   rrcov: Pison et al. (2002), p. 339

This explains the different covariance matrices.

As far as the location is concerned, in this particular case the raw MCD estimates in MASS identify one additional outlier, observation 53, which is then discarded from the computation of the reweighted estimates. That is why the two center estimates differ. Look at the following plots and judge for yourself whether this is an outlier or not:

covPlot(hbk, mcd=rrcov.covMcd, which="distance", id.n=15)
covPlot(hbk, mcd=mass.mcd, which="distance", id.n=15)

valentin
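For readers who want to see what such a consistency factor looks like, here is a minimal sketch in base R of the factor in the form given by Croux and Haesbroeck (1999): the covariance of the h points inside the MCD ellipsoid underestimates the true covariance at the normal model, and the factor rescales it. The function name mcd_consistency is made up for illustration and is not part of MASS or rrcov. Whether either package evaluates the factor exactly like this, and what small-sample correction is applied on top of it, is not shown here.

```r
# Sketch only: consistency factor for the raw MCD covariance at the
# multivariate normal model, following the form in Croux and Haesbroeck (1999).
# 'mcd_consistency' is a hypothetical helper, not a function from MASS or rrcov.
mcd_consistency <- function(p, alpha) {
  # chi-square cutoff that encloses a fraction alpha of a p-variate normal
  q_alpha <- qchisq(alpha, df = p)
  # the covariance of the normal truncated at that cutoff is Sigma times
  # pchisq(q_alpha, p + 2) / alpha, so the factor is the inverse of that ratio
  alpha / pchisq(q_alpha, df = p + 2)
}

# hbk has p = 4 variables; alpha = 0.75 matches the call in the question
mcd_consistency(p = 4, alpha = 0.75)
```

The factor is larger than 1 for alpha < 1 and tends to 1 as alpha approaches 1, so a package that rescales the raw covariance this way reports a larger matrix than one that uses a different correction.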
Thanks a lot! Indeed, both implementations agree on the 'best' points. Your answer helped me a great deal.

Rainer