I have output from a program which produces a distance matrix that I want to read into a clustering program in R.

The output is a .txt file and is 'almost' lower triangular, in the sense that it contains just the triangle below the diagonal.

So, for example, a 4-by-4 distance matrix appears as

    1
    2 3
    4 5 6

i.e. it looks like the lower triangle of a 3-by-3 matrix.

I thought I might be able to use "diag" to add zeros, but apparently not.

It's a problem because my matrix is actually 1989-by-1989, not 4-by-4.

I would not be at all surprised if the solution is obvious, but I cannot quite see how to read this into R.

Michael Anyadike-Danes
Economic Research Institute of Northern Ireland
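Because the file holds only the entries strictly below the diagonal, an n-by-n matrix contributes k = n(n-1)/2 numbers, so the count returned by scan() tells you n. Below is a minimal sketch of that check. `tri_order` is a hypothetical helper name, not an R function, and the filename in the comment is a placeholder.

```r
# Recover the matrix order n from the number k of strictly-lower-triangle
# entries, using k = n * (n - 1) / 2  =>  n = (1 + sqrt(1 + 8 * k)) / 2.
# 'tri_order' is a hypothetical helper, not part of base R.
tri_order <- function(k) {
  n <- (1 + sqrt(1 + 8 * k)) / 2
  if (abs(n - round(n)) > 1e-8) stop("entry count is not triangular: ", k)
  as.integer(round(n))
}

tri_order(6)                 # the 4-by-4 example: [1] 4
tri_order(1989 * 1988 / 2)   # 1977066 entries: [1] 1989

# With a real file (placeholder name): tri_order(length(scan("distances.txt")))
```

For a 1989-by-1989 matrix the file should therefore contain 1,977,066 numbers. Any other count suggests the file is truncated or does include the diagonal.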
Michael Anyadike-Danes wrote:
> I would not be at all surprised if the solution is obvious but I cannot
> quite see how to read this into R.

You can use scan() to get the entries from the file in row order. Then create a matrix to hold the result and overwrite the elements of its upper triangle with the scanned vector. (R stores matrices in column-major order, so the row-major order of your file corresponds to the upper triangle, not the lower triangle.) I'll leave it to you to work out the symmetrization operation.

    > file.show("/tmp/tri.dat")
    1
    2 3
    4 5 6
    7 8 9 10

    > mm <- array(0, c(4,4))
    > mm[upper.tri(mm, diag = TRUE)] <- scan("/tmp/tri.dat")
    Read 10 items
    > mm
         [,1] [,2] [,3] [,4]
    [1,]    1    2    4    7
    [2,]    0    3    5    8
    [3,]    0    0    6    9
    [4,]    0    0    0   10
    > (res <- mm + t(mm))
         [,1] [,2] [,3] [,4]
    [1,]    2    2    4    7
    [2,]    2    6    5    8
    [3,]    4    5   12    9
    [4,]    7    8    9   20
    > diag(res) <- diag(res)/2
    > res
         [,1] [,2] [,3] [,4]
    [1,]    1    2    4    7
    [2,]    2    3    5    8
    [3,]    4    5    6    9
    [4,]    7    8    9   10

Note that this example file includes the diagonal (10 entries for a 4-by-4 matrix). Your file omits it, so use upper.tri(mm) with its default diag = FALSE. The diagonal then stays zero, and mm + t(mm) is already the full distance matrix, with no halving needed.
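The scan()/upper.tri() approach above uses an example file that includes the diagonal, whereas the question's file does not. Here is a minimal sketch of the same idea for a file holding only the entries below the diagonal, written row by row. `read_lower_tri` is a hypothetical helper name, and the usage writes the question's 4-by-4 example to a temporary file rather than assuming any real path.

```r
# Build a full symmetric matrix from a file containing only the strictly
# lower triangle, row by row (no diagonal).
# 'read_lower_tri' is a hypothetical helper, not part of base R.
read_lower_tri <- function(file) {
  x <- scan(file, quiet = TRUE)            # entries in the file's row order
  n <- (1 + sqrt(1 + 8 * length(x))) / 2   # since length(x) = n(n-1)/2
  stopifnot(n == round(n))
  mm <- matrix(0, n, n)
  # Filling the strict upper triangle column by column reproduces the
  # file's lower triangle read row by row; upper.tri() excludes the
  # diagonal by default.
  mm[upper.tri(mm)] <- x
  mm + t(mm)                               # symmetrize; diagonal stays 0
}

# Usage: the 4-by-4 example from the question, via a temporary file
f <- tempfile(fileext = ".txt")
writeLines(c("1", "2 3", "4 5 6"), f)
res <- read_lower_tri(f)
res
#      [,1] [,2] [,3] [,4]
# [1,]    0    1    2    4
# [2,]    1    0    3    5
# [3,]    2    3    0    6
# [4,]    4    5    6    0
```

Only scan(), matrix(), upper.tri() and t() from base R are used, so the same call should work unchanged on the 1989-by-1989 file.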
    > x <- 1:6  # In real life x <- scan(<filename>).
    > m <- matrix(0,4,4)
    > m[row(m)<col(m)] <- x
    > m <- t(m)
    > m
         [,1] [,2] [,3] [,4]
    [1,]    0    0    0    0
    [2,]    1    0    0    0
    [3,]    2    3    0    0
    [4,]    4    5    6    0

The fiddle with the transposing is needed because R puts data into matrices column by column, not row by row.

cheers,

Rolf Turner
rolf at math.unb.ca
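Since the matrix is headed for a clustering program, the lower-triangular m can go straight to as.dist(), which uses only the lower triangle of a matrix and ignores the rest. The resulting "dist" object can then be passed to hclust(). This is a sketch under two assumptions. The post does not name the clustering routine, so hclust() is assumed. Average linkage is chosen purely for illustration.

```r
# Rebuild the lower-triangular matrix from the example above.
x <- 1:6                       # in real life: x <- scan(<filename>)
m <- matrix(0, 4, 4)
m[row(m) < col(m)] <- x
m <- t(m)

# as.dist() takes the lower triangle only; its 'diag' and 'upper'
# arguments affect printing, not which entries are used.
d <- as.dist(m)

# Assumed clustering step: hierarchical clustering, average linkage.
hc <- hclust(d, method = "average")
```

A "dist" object stores only the n(n-1)/2 values below the diagonal, so for the 1989-by-1989 case it avoids keeping the redundant upper triangle once the matrix has been converted.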