Hello,

I have a binary matrix of 80k sets (each set is a combination of cities) by 885 cities (dimension = 80k x 885). In the matrix, 1 means the city is part of the set and 0 means it is not. Sets are rows and cities are columns (city.test).

I want to do feature reduction to keep only the important sets (most likely 2-10 sets of city combinations) and the associated cities. I chose SVD and am following the steps below, but I am not sure how to go about the next step. Could anyone help with this?

    s <- svd(city.test)
    D <- diag(s$d)
    d2 <- (s$d)^2
    ratio <- cumsum(d2/sum(d2))  # cumulative proportion of total variance from 885 PCs

Looking at the plot, I see that roughly the first 10 to 20 PCs explain most of the variation (please see the attached plot). How do I use this to extract the most relevant sets from my original matrix?

A friend of mine recommended plotting rowSums(abs(s$u*s$d)) and choosing only the highest-magnitude sets. I didn't understand the significance of it. Most probably it reflects that the first PC contributes the most, hence we only care about rowSums(abs(u*d)). Is this correct?

Thanks.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: variance-cities.pdf
Type: application/pdf
Size: 24376 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120607/78e1ffca/attachment.pdf>
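
For concreteness, here is a minimal sketch of one way the steps above could be continued. It is an illustration under stated assumptions, not an established recipe. The toy matrix, the 0.5 threshold, the cutoff of 10, and the names k, scores, set.weight, city.weight, top.sets and top.cities are all made up for the example. Two caveats apply: svd() on an uncentered matrix is not exactly PCA (prcomp() centers the columns first), and s$u * s$d recycles s$d down the rows of s$u rather than scaling column j by d[j], so the sketch uses sweep() to get U %*% D.

```r
# Toy stand-in for city.test (assumption: the real matrix is 80k x 885).
set.seed(1)
city.test <- matrix(rbinom(200 * 30, size = 1, prob = 0.2),
                    nrow = 200, ncol = 30)

s     <- svd(city.test)
d2    <- s$d^2
ratio <- cumsum(d2 / sum(d2))   # cumulative share of the total sum of squares

# Keep the smallest number of components that reaches a chosen threshold.
# The 0.5 here is arbitrary and only for illustration.
k <- which(ratio >= 0.5)[1]

# Scale column j of U by d[j], i.e. the first k columns of U %*% D.
scores <- sweep(s$u[, 1:k, drop = FALSE], 2, s$d[1:k], `*`)

# One possible per-set importance measure: total absolute score on the
# k retained components. Sets with the largest values come first.
set.weight <- rowSums(abs(scores))
top.sets   <- order(set.weight, decreasing = TRUE)[1:10]

# The analogous measure for cities uses the right singular vectors.
loadings    <- sweep(s$v[, 1:k, drop = FALSE], 2, s$d[1:k], `*`)
city.weight <- rowSums(abs(loadings))
top.cities  <- order(city.weight, decreasing = TRUE)[1:10]
```

Restricting the sum to the first k columns is what ties the ranking to the leading components. Summing over all 885 columns, as in the one-liner, weights every component, not only the first.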