hi:

I have a matrix with dimensions (200 x 20,000). I have another file, a tab-delimited file, where the first-column entries are row names and the second-column entries are column names.

For instance:

> tmat
  Apple Orange Mango Grape Star
A     0      0     0     0    0
O     0      0     0     0    0
M     0      0     0     0    0
G     0      0     0     0    0
S     0      0     0     0    0

> tb   # tab-delim file.
      V1 V2
1  Apple  S
2  Apple  A
3  Apple  O
4 Orange  A
5 Orange  O
6 Orange  S
7  Mango  M
8  Mango  A
9  Mango  S

I have to read each line of 'tb' (the tab-delim file), take the first variable, and check whether it matches any row name of the matrix. Then take the second variable of that line and check whether it matches any column name. If so, put 1; else leave it.

The following is a small piece of code that I felt is a solution. However, since my original matrix and tab-delim file are very, very large, I am not sure whether it is really doing the correct thing. Could anyone please tell me whether I am doing this correctly?

> for(i in 1:length(tb[,1])){
+ r = tb[i,1]
+ c = as.character(tb[i,2])
+ tmat[rownames(tmat)==c,colnames(tmat)==r] <-1
+ }

> tmat
  Apple Orange Mango Grape Star
A     1      1     1     0    0
O     1      1     0     0    0
M     0      0     1     0    0
G     0      0     0     0    0
S     1      1     1     0    0

Thanks.
Srinivas Iyyer <srini_iyyer_bio at yahoo.com> writes:

> > for(i in 1:length(tb[,1])){
> + r = tb[i,1]
> + c = as.character(tb[i,2])
> + tmat[rownames(tmat)==c,colnames(tmat)==r] <-1
> + }

There are much faster ways. Try (untested)

  n1 <- match(tb$V1, rownames(tmat))
  n2 <- match(tb$V2, colnames(tmat))
  m <- unique(cbind(n1,n2)[complete.cases(n1,n2),])
  tmat[m] <- 1

The unique() part may or may not be beneficial.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
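For anyone trying the match() idea on the sample data from the question, here is a small sketch, not the code as posted. It assumes the layout of the printed sample, where V2 holds the row names (A, O, ...) and V1 the column names (Apple, Orange, ...), so the two match() calls are swapped relative to the wording of the question. The names ri, ci, ok and idx are made up for illustration, and drop = FALSE is added so the index stays a two-column matrix even if only one pair survives.

```r
# Sample data from the thread: rows are A, O, M, G, S; columns are fruit names.
tmat <- matrix(0, nrow = 5, ncol = 5,
               dimnames = list(c("A", "O", "M", "G", "S"),
                               c("Apple", "Orange", "Mango", "Grape", "Star")))
tb <- data.frame(
  V1 = rep(c("Apple", "Orange", "Mango"), each = 3),
  V2 = c("S", "A", "O", "A", "O", "S", "M", "A", "S"),
  stringsAsFactors = FALSE
)

# Assumption: in this sample V2 holds row names and V1 holds column names.
ri <- match(tb$V2, rownames(tmat))
ci <- match(tb$V1, colnames(tmat))

# Keep only pairs where both names were found; drop = FALSE keeps a
# two-column matrix even if a single pair survives.
ok  <- !is.na(ri) & !is.na(ci)
idx <- unique(cbind(ri, ci)[ok, , drop = FALSE])

# A two-column numeric matrix indexes (row, column) cells directly.
tmat[idx] <- 1
tmat
```

If the real file follows the wording of the question instead (V1 = row names, V2 = column names), the two match() calls would simply be swapped back.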
try something like:

  mat1 <- matrix(0, 26, 100, dimnames = list(letters, 1:100))
  mat2 <- cbind(sample(letters, 10), sample(100, 10))
  ###########
  mat1[cbind(match(mat2[, 1], rownames(mat1)),
             match(mat2[, 2], colnames(mat1)))] <- 1

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: med.kuleuven.be/biostat
     student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Srinivas Iyyer" <srini_iyyer_bio at yahoo.com>
To: <r-help at stat.math.ethz.ch>
Sent: Thursday, July 06, 2006 2:18 PM
Subject: [R] Comparing two matrices
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> R-project.org/posting-guide.html

Disclaimer: kuleuven.be/cwis/email_disclaimer.htm
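The mat1/mat2 one-liner relies on every name in the lookup table being present in the matrix, since match() returns NA for anything it cannot find. A rough sketch of one way to check for that before assigning follows. The pairs in mat2 are hypothetical and "zz" is deliberately bogus; ri, ci, bad and unmatched are illustrative names only.

```r
mat1 <- matrix(0, 26, 100, dimnames = list(letters, 1:100))

# Hypothetical lookup pairs; "zz" is deliberately not a row name of mat1.
mat2 <- cbind(c("a", "c", "zz"), c("5", "17", "3"))

ri <- match(mat2[, 1], rownames(mat1))
ci <- match(mat2[, 2], colnames(mat1))

# Pairs with an unknown row or column name show up as NA in ri or ci.
bad <- is.na(ri) | is.na(ci)
unmatched <- mat2[bad, , drop = FALSE]
unmatched   # inspect these for typos before going further

# Assign only the pairs that were found.
mat1[cbind(ri, ci)[!bad, , drop = FALSE]] <- 1
sum(mat1)
```

Listing the unmatched pairs first is a cheap way to spot typos in a large file before they are silently skipped.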
On 7/6/2006 8:18 AM, Srinivas Iyyer wrote:

>> for(i in 1:length(tb[,1])){
> + r = tb[i,1]
> + c = as.character(tb[i,2])
> + tmat[rownames(tmat)==c,colnames(tmat)==r] <-1
> + }

I think that works, but it's not as fast as some other ways of doing the same thing. For example,

  table(tb)

will give you a table of the counts of each pair of entries in tb.

  pmin(table(tb), 1)

will set the maximum count to 1. An advantage of this approach is that it will show you if there are any entries in tb that aren't in your tmat (typos, etc.). A disadvantage is that if there are any missing categories (e.g. G, Grape, Star in your sample) they won't show up at all, and you may need some manipulations to get things to look exactly the way you asked.
For example,

> pmin(table(tb), 1)
        V2
V1       A M O S
  Apple  1 0 1 1
  Mango  1 1 0 1
  Orange 1 0 1 1
> pmin(table(tb[,2:1]), 1)
   V1
V2  Apple Mango Orange
  A     1     1      1
  M     0     1      0
  O     1     0      1
  S     1     1      1

Duncan Murdoch
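One possible version of the "manipulations" mentioned above, offered only as a sketch: converting both columns to factors with explicit levels makes table() keep the empty categories (G, Grape, Star) and puts rows and columns in the order of the target matrix. The vectors row_names and col_names are assumptions standing in for rownames(tmat) and colnames(tmat), and the data are the sample from the question.

```r
# Assumed stand-ins for rownames(tmat) and colnames(tmat).
row_names <- c("A", "O", "M", "G", "S")
col_names <- c("Apple", "Orange", "Mango", "Grape", "Star")

tb <- data.frame(
  V1 = rep(c("Apple", "Orange", "Mango"), each = 3),
  V2 = c("S", "A", "O", "A", "O", "S", "M", "A", "S"),
  stringsAsFactors = FALSE
)

# Fixing the factor levels keeps empty categories in the table and
# orders rows and columns like the target matrix.
counts <- table(factor(tb$V2, levels = row_names),
                factor(tb$V1, levels = col_names))

# Cap counts at 1 and drop the table class to get a plain 0/1 matrix.
res <- unclass(pmin(counts, 1))
res
```

Note the trade-off: entries of tb that are not among the given levels become NA and are dropped by table() by default, so this variant no longer reveals typos the way a plain table(tb) does.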