Dear All,

I am attempting to use the FANNY fuzzy clustering function in R (Kaufman &
Rousseeuw, 1990), found in the "cluster" package.  I have run into a variety
of difficulties; the two most crucial are enumerated below.

1. Where is the 'm' parameter in FANNY?

In _Finding Groups in Data: An Introduction to Cluster Analysis_ (1990) by
Kaufman & Rousseeuw, the authors discuss the FANNY algorithm.  On pages
189-190, they attempt to demonstrate an equivalence between Fuzzy c-Means
(FCM) and FANNY.  In doing so, they appear to assume that the value of the
'm' parameter in FCM (a measure of the fuzziness) is fixed at m=2.  Although
this is how FCM was originally formulated, it eventually became apparent that
m should be a user-specified parameter rather than a fixed value of m=2.

My question, then, is twofold.  Firstly, am I correct in saying that Kaufman
& Rousseeuw have assumed m=2 in their formulation of Fuzzy c-Means and FANNY?
Secondly, is it possible to modify the FANNY algorithm to allow
user-specification of the m (fuzziness) parameter?

2. What do I do with training data?

Is there any way to use FANNY for assigning cluster membership values to new,
test data?  In Fuzzy c-Means, new data are compared to the cluster centers in
order to assign membership values to the test data.  However, in FANNY these
centers do not exist.  Is there, then, any way to compute the FANNY cluster
membership values of a test data point without affecting the membership
values of the training data?  Perhaps there are enough conditions to use the
objective function as a way of computing the membership values of the test
data?

Aamir M
University of Toronto
>>>>> "Aamir" == Aamir M <intuitionist at gmail.com>
>>>>>     on Mon, 30 May 2005 15:57:29 -0400 writes:

    Aamir> Dear All, I am attempting to use the FANNY fuzzy
    Aamir> clustering function in R (Kaufman & Rousseeuw, 1990),
    Aamir> found in the "cluster" package.  I have run into a
    Aamir> variety of difficulties; the two most crucial
    Aamir> difficulties are enumerated below.

    Aamir> 1. Where is the 'm' parameter in FANNY?  In _Finding
    Aamir> Groups in Data: An Introduction to Cluster Analysis_
    Aamir> (1990) by Kaufman & Rousseeuw, the authors discuss
    Aamir> the FANNY algorithm.  On pages 189-190, they attempt
    Aamir> to demonstrate an equivalence between Fuzzy c-Means
    Aamir> (FCM) and FANNY.

The section is called "*Related* Methods and References", and they (as
most statisticians do) use the word "k-means"...

The first point in that section is to explain the NON-equivalence:
fanny() works with arbitrary dissimilarities d[i,j], whereas all
versions of k-means have to assume a Euclidean measurement space.

K&R show that *when* you have a data matrix X and then use distances
d[i,j] := || x_{.,i} - x_{.,j} ||^2 (i.e. SQUARED Euclidean distances),
then fanny() does the same as fuzzy k-means.  But they even say there
that they would rather use the non-squared distance {as fanny() does by
default}, i.e., exponent 1, not 2, for good robustness reasons.  If
this is your 'm' below, then fanny() already does what you might want
by default.  OTOH, you can always pass squared distances to fanny()..

    Aamir> In doing so they appear to be
    Aamir> assuming that the value of the 'm' parameter in FCM
    Aamir> (a measure of the fuzziness) is fixed at m=2.

There is no 'm' in the book there, but they talk about the exponent
"^ 2" used in some places {but "^ 1" in other places}, notably in 5.2
"Why did we choose FANNY?"
There is no "fuzziness" parameter defined there, so can you be more
specific?
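[The point above about passing squared distances can be tried directly.
A minimal sketch in R, using the built-in iris data (continuous
variables only); the choice of data and of k = 3 is purely for
illustration:]

```r
library(cluster)  # recommended package shipped with R; provides fanny()

d  <- dist(scale(iris[, 1:4]))  # plain Euclidean dissimilarities
f1 <- fanny(d,   k = 3)         # fanny()'s robust default: unsquared distances
f2 <- fanny(d^2, k = 3)         # squared distances: the fuzzy k-means analogue

round(head(f1$membership), 3)   # fuzzy memberships; each row sums to 1
```

[For mixed continuous/categorical data, daisy() would replace dist()
here, as discussed further below.]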
    Aamir> Although this is how FCM was originally
    Aamir> formulated, it eventually became apparent that m
    Aamir> should be a user-specified parameter and not a fixed
    Aamir> value of m=2.  My question, then, is
    Aamir> twofold.  Firstly, am I correct in saying that Kaufman
    Aamir> & Rousseeuw have assumed m=2 in their formulation of
    Aamir> Fuzzy c-Means and FANNY?  Secondly, is it possible to
    Aamir> modify the FANNY algorithm to allow
    Aamir> user-specification of the m (fuzziness) parameter?

Maybe, if you tell me what you are talking about; see the section 5.2
mentioned above.
Is it the exponent 2 in u_{jv}^2?  That one is currently fixed at 2,
and yes, it could be made a parameter, though K & R warn against going
all the way to "1", where their algorithm can happen to converge very
slowly.

    Aamir> 2. What do I do with training data?  Is there any way
    Aamir> to use FANNY for assigning clustering membership
    Aamir> values to new, test data?  In Fuzzy c-Means, new data
    Aamir> is compared to the cluster centers in order to assign
    Aamir> clustering membership values to the test
    Aamir> data.  However, in FANNY these centers do not
    Aamir> exist.

Correct.  Note again that in general such centers don't even make
sense, since the sample space may contain unordered categorical
variables mixed with continuous ones {and you would be advised to use
daisy(), not dist(), to compute dissimilarities for such data}.

    Aamir> Is there, then, any way to compute the FANNY
    Aamir> clustering membership values of a test data point
    Aamir> without affecting the clustering membership values of
    Aamir> the training data?  Perhaps there are enough
    Aamir> conditions to use the objective function as a way of
    Aamir> computing the membership values of the test data?
    Aamir> Aamir M
    Aamir> University of Toronto

That's an interesting proposal, at least the way I choose to understand
you :-)
Yes, why not look at the objective function C {eq.(1), p.182}.
One could think of optimizing it with respect to the new data only, by
keeping all "old data" memberships.  For that to work, one would need
the n dissimilarities d[i', j], where

    i'          : `index for' the new data
    j = 1,..,n  : indices for the training data.

Is this feasible in your situation?

Alternatively, when we *did* assume ``all continuous'' data *and* the
use of simple Euclidean distances, we could easily compute the cluster
centers and determine (by minimization!) memberships for new
observations.

In any case, that needs some assumptions (and code!) currently not part
of fanny().
    Martin> there is no 'm' in the book there, but they talk about the
    Martin> exponent "^ 2" used in some places {but "^ 1" in other places},
    Martin> notably in 5.2 "Why did we choose FANNY?"
    Martin> There is no "fuzziness" parameter defined there, so can you be
    Martin> more specific?
    Martin> Is it the exponent 2 in u_{jv}^2 ?
    Martin> That one is currently fixed at 2, and yes, that could be made a
    Martin> parameter though K & R warn against going all the way to "1"
    Martin> where their algorithm can happen to converge very slowly.

Yes, that is what I am referring to.  If you look at equation (1) in
section 4.1 of K&R (1990), where the FANNY objective function is
defined, you can see that the membership values are all raised to the
power two.  In fact, the choice of raising them to the power 2 is
arbitrary; the value of this exponent should instead be a
user-specified parameter.  This is called the "m parameter" or the
"fuzziness parameter" in Fuzzy k-Means.

Now that you mention it, I see that K&R did in fact comment on this in
section 5.2.  K&R say that setting m=1 will cause slower convergence;
in Fuzzy k-Means, setting m=1 will cause a hard clustering (minimum
fuzziness), and setting m=infinity will cause maximum fuzziness
(i.e. all cluster membership values will be equal to 1/k).  They go on
to say that "exponents equal to 2 seem to be a reasonable choice, as is
confirmed by actual clustering analyses."  I do not know about FANNY,
but in Fuzzy k-Means, studies have shown that values of the exponent
between 1 and 2 can lead to better results than the rather arbitrary
choice of m=2.

    Aamir> Is there, then, any way to compute the FANNY
    Aamir> clustering membership values of a test data point
    Aamir> without affecting the clustering membership values of
    Aamir> the training data?  Perhaps there are enough
    Aamir> conditions to use the objective function as a way of
    Aamir> computing the membership values of the test data?
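[To make the role of the exponent concrete: in fuzzy k-means with known
centers, the closed-form membership rule for a general m shows exactly
the behaviour described above, i.e. nearly hard memberships as m
approaches 1 and memberships approaching 1/k as m grows.
fcm_membership() is a toy illustration, not part of any package:]

```r
## Fuzzy k-means membership of a point x, given fixed cluster centers:
##   u_v = 1 / sum_w (d_v / d_w)^(2/(m-1)),
## where d_v is the distance from x to center v.  Toy helper, assuming
## x does not coincide exactly with a center.
fcm_membership <- function(x, centers, m = 2) {
  d <- sqrt(colSums((t(centers) - x)^2))      # Euclidean distances to centers
  1 / rowSums(outer(d, d, "/")^(2 / (m - 1))) # memberships; they sum to 1
}

centers <- rbind(c(0, 0), c(10, 0))
x <- c(1, 0)                                  # close to the first center
round(fcm_membership(x, centers, m = 2),   3) # sharply in cluster 1
round(fcm_membership(x, centers, m = 100), 3) # near-uniform, roughly 1/k each
```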
    Martin> That's an interesting proposal, at least the way I choose to
    Martin> understand you :-)
    Martin> Yes, why not look at the objective function C {eq.(1), p.182}
    Martin> One could think of optimizing it with respect to new data only,
    Martin> by keeping all "old data" memberships.
    Martin> For that to work, one would need the n dissimilarities
    Martin> d[i', j] where i' : `index for' new data
    Martin>               j = 1,..,n : indices for training data.
    Martin> Is this feasible in your situation?

Yes, this would be feasible, I think.  If I understand it correctly,
this would just involve recomputing the daisy() dissimilarity matrix on
the combined set of both training data and test data.  It seems that
the resulting optimization problem would also be uniquely solvable.

    Martin> Alternatively, when we *did* assume ``all continuous'' data
    Martin> *and* the use of simple Euclidean distances,
    Martin> we could easily compute the cluster centers, determine (by
    Martin> minimization!) memberships for new observations.

The problem of "predicting" fuzzy cluster memberships for new data
appears to be much simpler in Euclidean space; one could just compare
the new data to the cluster centers computed in Fuzzy k-Means.
Unfortunately, the data I'm working with is not all continuous.

    Martin> In any case that needs some assumptions (and code!) currently
    Martin> not part of fanny().

I'll have to work on this.  Though I'm guessing fanny() is written in
FORTRAN, which I cannot (yet) program in.

- Aamir
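[A rough sketch of the scheme discussed above: hold the training
memberships fixed and minimize K&R's objective C (eq. (1), p. 182, with
the exponent fixed at 2 as in fanny()) over the memberships of a single
new point.  predict_fanny() is purely hypothetical code, not part of
the cluster package, and uses a generic optimizer rather than a
closed-form update:]

```r
## Hypothetical helper (NOT in the "cluster" package): given training
## memberships U (n x k), training dissimilarities D (n x n), and the
## dissimilarities d_new (length n) from one new point to the training
## points, minimize the FANNY objective
##   C = sum_v [ sum_{i,j} u[i,v]^2 u[j,v]^2 d[i,j] / (2 sum_j u[j,v]^2) ]
## over the new point's memberships only; all others are held fixed.
predict_fanny <- function(U, D, d_new) {
  k <- ncol(U)
  obj <- function(theta) {                 # softmax keeps u on the simplex
    u  <- exp(theta) / sum(exp(theta))
    U2 <- rbind(U, u)                      # training memberships stay fixed
    D2 <- rbind(cbind(D, d_new), c(d_new, 0))
    sum(apply(U2^2, 2, function(w) sum(outer(w, w) * D2) / (2 * sum(w))))
  }
  theta <- optim(numeric(k), obj)$par      # default Nelder-Mead search
  exp(theta) / sum(exp(theta))             # membership vector; sums to 1
}

## Toy demo: two well-separated 1-d "clusters" at 0 and 10.
x <- c(0, 1, 9, 10)
U <- rbind(c(.9, .1), c(.9, .1), c(.1, .9), c(.1, .9))  # pretend fanny() output
D <- as.matrix(dist(x))
predict_fanny(U, D, d_new = abs(x - 0.5))  # new point at 0.5 leans to cluster 1
```

[For mixed data, D and d_new would come from daisy() on the combined
training and test data instead of dist().]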