Hi,

I have a comma-separated file with element names in the first column, as shown below:

Name_1,0
Name_2,0.8878,0
Name_3,0.6777,0.7643,0
Name_4,0.9844,0.1234,0.1414,0

The original data is a 10000x10000 symmetric matrix (600 MB). To reduce the file size, I have kept only the lower triangle. Is there a (memory-)efficient way to

1) read the file,
2) compute the first and second principal components, and
3) plot the first vs. the second PC?

In the past, I could do this with:

b <- read.csv("distance.csv", sep=",", header=FALSE)  # distance.csv held the complete matrix, so this command worked
my_matrix <- data.matrix(b)
pca2 <- princomp(my_matrix)
plot(pca2$scores[,1], pca2$scores[,2])
text(pca2$scores[,1], pca2$scores[,2], rownames(my_matrix), cex=0.5, pos=1)

This time I don't have the complete file, so I was wondering how to do this.

Any help is much appreciated.

TIA
M

--
View this message in context: http://r.789695.n4.nabble.com/Principal-componet-plot-from-lower-triangular-matrix-file-tp4114840p4114840.html
Sent from the R help mailing list archive at Nabble.com.
R distance objects are stored as a lower triangle, so consider as.dist(). It takes the square matrix as input, which you could reconstruct (or may already have).

I do not know of a biglm()-style alternative to princomp() for large data, but consider using subsets of your data, because that plot, if created, is going to be very crowded.

HTH
Ken Hutchison
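To illustrate the as.dist() suggestion, here is a minimal sketch that reads a ragged lower-triangle file of the kind shown in the question and returns a "dist" object. The helper name read_lower_tri() is hypothetical, and the sketch assumes that row i holds a name followed by exactly i values (diagonal included). It fills a full n x n matrix first, which for n = 10000 needs roughly 800 MB of RAM, so treat it as a starting point rather than a tuned solution.

```r
# Hypothetical helper (not part of any package): read a ragged lower-triangle
# CSV where row i is "name,v_1,...,v_i", and return it as a "dist" object.
read_lower_tri <- function(path) {
  lines <- readLines(path)
  n <- length(lines)
  m <- matrix(0, n, n)
  nms <- character(n)
  for (i in seq_len(n)) {
    fields <- strsplit(lines[i], ",", fixed = TRUE)[[1]]
    nms[i] <- fields[1]                       # first field is the element name
    m[i, seq_len(i)] <- as.numeric(fields[-1])  # remaining i fields fill row i
  }
  dimnames(m) <- list(nms, nms)
  as.dist(m)  # as.dist() keeps only the entries below the diagonal
}

# Usage on the four example rows from the question
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name_1,0",
             "Name_2,0.8878,0",
             "Name_3,0.6777,0.7643,0",
             "Name_4,0.9844,0.1234,0.1414,0"), tmp)
d <- read_lower_tri(tmp)
full <- as.matrix(d)  # symmetric square matrix, if one is needed
```

as.matrix(d) rebuilds the symmetric square matrix with the element names as row and column names, so it can be passed to the same code that worked on the complete file.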
marella
2011-Nov-28 14:21 UTC
[R] Principal componet plot from lower triangular matrix file
Yes, I agree that the plot is going to be crowded. But the idea is to see whether elements of the same type (color-coded, for example) group together or not. I need only the first two principal components (three at most). Since princomp() calculates all components, it takes a very long time!
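One way to avoid computing every component is to iterate on a two-column subspace instead of decomposing the whole matrix. The sketch below is an assumption on my part, not something proposed in the thread: top_pcs() is a hypothetical helper that uses orthogonal (subspace) iteration in base R and never forms the full covariance matrix. It runs a fixed number of iterations and converges quickly only when the leading components are well separated from the rest.

```r
# Hypothetical helper: first k principal components by orthogonal iteration.
# Each pass costs two matrix products with a k-column matrix instead of a
# full eigendecomposition.
top_pcs <- function(x, k = 2, iters = 100) {
  x <- scale(x, center = TRUE, scale = FALSE)  # centre columns, as PCA requires
  p <- ncol(x)
  v <- qr.Q(qr(matrix(rnorm(p * k), p, k)))    # random orthonormal start
  for (i in seq_len(iters)) {
    w <- crossprod(x, x %*% v)  # t(x) %*% (x %*% v), avoids forming t(x) %*% x
    v <- qr.Q(qr(w))            # re-orthonormalise the basis
  }
  # Rayleigh-Ritz step: rotate the basis so its columns are the individual
  # components, ordered by decreasing variance
  s <- svd(x %*% v)
  list(scores = s$u %*% diag(s$d, k), loadings = v %*% s$v)
}

# Usage on simulated data with two dominant directions
set.seed(42)
x <- matrix(rnorm(200 * 10), 200, 10) %*% diag(c(10, 5, rep(1, 8)))
res <- top_pcs(x, k = 2)
ref <- prcomp(x)$x[, 1:2]  # reference scores from a full decomposition
# plot(res$scores[, 1], res$scores[, 2])  # first vs. second PC
```

The signs of the components are arbitrary, so the scores should be compared with those from prcomp() or princomp() in absolute value only.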