Apologies if this is not the right way to ask a question, I'm a first timer posting here. Does anyone have a solution to this?

I'm having trouble figuring out how to use weighting with K Means Clustering. So say if my dataset is:

  Column 1 = x coords
  Column 2 = y coords
  Column 3 = frequency each coordinate occurs

So I'm basically trying to weight the points more heavily if they occur more frequently. I've been trying

  kmeans(a[,1:2], centers=52, weights=a[,3])

It works well before adding in the weights, it also doesn't work with "weights=c(frequency 1, frequency 2, .)" and a few others I've tried.

Maybe I don't know how to search the previous topics or the software help well enough yet, but I haven't come across an example that lays out weighting yet.

Thank you in advance to anyone who has the answer.

Jesse

NOTICE - This communication is intended ONLY for the use of the person or entity named above and may contain information that is confidential or legally privileged. If you are not the intended recipient named above or a person responsible for delivering messages or communications to the intended recipient, YOU ARE HEREBY NOTIFIED that any use, distribution, or copying of this communication or any of the information contained in it is strictly prohibited. If you have received this communication in error, please notify us immediately by telephone and then destroy or delete this communication, or return it to us by mail if requested by us. The City of Calgary thanks you for your attention and co-operation.
Bill.Venables at csiro.au
2008-Feb-05 22:49 UTC
[R] K Means Clustering Weighted by Frequency
kmeans doesn't allow weights. Since your weights are frequencies, though, there is a slightly inelegant way of handling it. You need to unwind the frequencies and let each point enter the calculation separately. (OK, very inelegant!)

  A <- a[rep(1:nrow(a), a[, 3]), 1:2]   ### expanded version
  km <- kmeans(A, centers = 52)

If sum(a[, 3]) is huge, which is often the case when you go to frequencies, you may want to trim things a bit and deal with samples from the lot, but that's another story.

Bill Venables.

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
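
For anyone trying the expansion trick from the reply, here is a minimal, self-contained sketch on made-up data. The toy data frame, the column name `freq`, the choice of 4 centres, and the `nstart`/`iter.max` settings are all assumptions for illustration and are not from the original posts. The second half shows one possible reading of "deal with samples from the lot": drawing rows with probability proportional to frequency instead of expanding everything. Treat it as a rough sketch, not a recommended procedure.

```r
# Toy data, invented for illustration: x/y coordinates plus a whole-number
# frequency column (the expansion only makes sense for integer counts).
set.seed(42)
a <- data.frame(x = runif(100),
                y = runif(100),
                freq = sample(1:5, 100, replace = TRUE))

# Expansion as in the reply: row i is repeated a$freq[i] times.
idx <- rep(seq_len(nrow(a)), a$freq)
A <- a[idx, 1:2]
km <- kmeans(A, centers = 4, nstart = 5, iter.max = 50)

# Map cluster labels back to the distinct points by taking the label of the
# first copy of each row (every row has freq >= 1 here, so each appears).
a$cluster <- unname(km$cluster[!duplicated(idx)])

# Assumed alternative when sum(freq) is very large: sample row indices with
# probability proportional to frequency rather than expanding every row.
n_draw <- 200
s <- sample(nrow(a), size = n_draw, replace = TRUE, prob = a$freq)
km_s <- kmeans(a[s, 1:2], centers = 4, nstart = 5, iter.max = 50)
```

Note that rows with a frequency of zero would simply drop out of the expanded data, and the sampled version only approximates the fully expanded fit.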