So you just want to compute the distance from each point of your new
data to each of the centres and assign the number of the nearest
centre, as in:

tCentre <- t(Centre)  # one centre per column
clust <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))

But since that apply loop runs once per row of NewData, which is slow
for lots of new data, one may want to loop over the (few) centres
instead:

tNewData <- t(NewData)  # one observation per column
# apply() returns one row per observation and one column per centre
clust <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))
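
A minimal sketch that runs both variants above on the toy data from the
question below and checks that they agree. The fixed seed, the names
clust1, clust2, d2 and Result, and the use of ties.method = "first" are
my assumptions for illustration. By default max.col() breaks ties at
random, so "first" is used here to mirror which.min().

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Two fixed centres in two dimensions: (0, 0) and (1, 1)
Centre <- matrix(c(0, 1, 0, 1), nrow = 2)

# 50 points scattered around each centre, as in the question
NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Variant 1: loop over observations, compare each one with all centres
tCentre <- t(Centre)
clust1 <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))

# Variant 2: loop over centres; d2 holds squared Euclidean distances,
# one row per observation and one column per centre
tNewData <- t(NewData)
d2 <- apply(Centre, 1, function(x) colSums((x - tNewData)^2))
clust2 <- max.col(-d2, ties.method = "first")

# Both variants should assign the same centre to every observation
stopifnot(all(clust1 == clust2))

# Attach an ID and the cluster label; the centres themselves are untouched
Result <- cbind(ID = seq_len(nrow(NewData)),
                x = NewData[, 1], y = NewData[, 2],
                Cluster = clust1)
head(Result)
```

Squared distances are sufficient here, because taking the square root
does not change which centre is nearest.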
Best,
Uwe Ligges
On 21.05.2013 13:19, HJ YAN wrote:
> Dear R users
>
>
> I have the matrix of the centres of some clusters, e.g. 20 clusters each
> with 100 dimensions, so this matrix contains 20 rows * 100 columns of
> numeric values.
>
> I have collected new data (each with 100 numeric values) and would like to
> keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data
> fit into this grouping system, e.g. if the data is close to cluster 1 then
> label it 'cluster 1'.
>
> If the above matrix of centres is called 'Centre' (a 20*100 matrix) and my
> new data 'NewData' has 500 observations, then using kmeans() will update the
> centres:
>
> kmeans(NewData, Centre)
>
>
> I wondered if there are other R packages out there that can keep the centres
> fixed and label each observation of my new data? Or do I have to write my
> own function?
>
> To illustrate my task using a simpler example:
>
> I have
>
> Centre<- matrix(c(0,1,0,1), nrow=2)
>
> # the two created centres in a two-dimensional case are
> Centre
>      [,1] [,2]
> [1,]    0    0
> [2,]    1    1
>
> NewData<-rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
> matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
>
> NewData1<-cbind(c(1:100), NewData)
> colnames(NewData1)<-c("ID","x","y")
>
> # my data
> head(NewData1)
>      ID          x          y
> [1,]  1 -0.3974660  0.1541685
> [2,]  2  0.5321347  0.2497867
> [3,]  3  0.2550276  0.1691720
> [4,]  4 -0.1162162  0.6754874
> [5,]  5  0.1570996  0.1175119
> [6,]  6  0.4816195 -0.6836226
>
> ## I'd like to have the outcome as below (whilst keeping the two centres fixed):
>
>       ID          x         y Cluster
>  [1,]  1 -0.3974660 0.1541685       1
>  [2,]  2  0.5321347 0.2497867       1
>  [3,]  3  0.2550276 0.1691720       1
>  [4,]  4 -0.1162162 0.6754874       1
>
> ...
> [55,] 55  1.1570996 1.1175119       2
> [56,] 56  1.4816195 1.6836226       2
>
>
> p.s. I use the Euclidean distance to calculate the distance matrix.
>
>
> Many thanks in advance
>
> HJ