Hi all,

I am trying to fetch the rows of one data frame that match the first two columns of another data frame. Here is an example of what I am trying to do:

> ptable=read.table(file="All.txt",header=T,sep="\t")
> ptable=as.matrix(ptable)
> dim(ptable)
[1] 9275    6
> head(ptable)
     Gene1       Gene2       PCC         PCC3        PCC23       PCC123
[1,] "3813_f_at" "3884_f_at" "0.9956842" "0.9955455" "0.9956513" "0.9956171"
[2,] "3884_f_at" "3813_f_at" "0.9956842" "0.9955455" "0.9956513" "0.9956171"
[3,] "3491_f_at" "3709_f_at" "0.9952116" "0.9951588" "0.9951601" "0.9950864"
[4,] "3709_f_at" "3491_f_at" "0.9952116" "0.9951588" "0.9951601" "0.9950864"
[5,] "3371_f_at" "3594_f_at" "0.9946206" "0.9945342" "0.9946246" "0.9946592"
[6,] "3594_f_at" "3371_f_at" "0.9946206" "0.9945342" "0.9946246" "0.9946592"

> table=read.table(file="All_GPYeast_m.txt",header=T,sep="\t")
> table=as.matrix(table)
> dim(table)
[1] 9275    6
> head(table)
     Gene1       Gene2       PCC         PCC3        PCC23       PCC123
[1,] "3491_f_at" "3709_f_at" "0.9953142" "0.9950756" "0.9954676" "0.9952902"
[2,] "3709_f_at" "3491_f_at" "0.9953142" "0.9950756" "0.9954676" "0.9952902"
[3,] "3813_f_at" "3884_f_at" "0.9951781" "0.9953901" "0.9959256" "0.9958152"
[4,] "3884_f_at" "3813_f_at" "0.9951781" "0.9953901" "0.9959256" "0.9958152"
[5,] "3371_f_at" "3594_f_at" "0.9946130" "0.9938905" "0.9945572" "0.9945285"
[6,] "3594_f_at" "3371_f_at" "0.9946130" "0.9938905" "0.9945572" "0.9945285"

Now I wish to take columns 1 and 2 from 'ptable', pick the corresponding rows from 'table', and store them in a variable. I tried the following and got an error:

> PCC=apply(ptable[,c(1,2)],1,function(x)table[x[1],x[2]])
Error in FUN(newX[, i], ...) : subscript out of bounds

I was expecting something like this:

> head(PCC)
[1,] "3813_f_at" "3884_f_at" "0.9951781" "0.9953901" "0.9959256" "0.9958152"
[2,] "3884_f_at" "3813_f_at" "0.9951781" "0.9953901" "0.9959256" "0.9958152"
---------------------
---------------------

Please help!

Regards,
Amit
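For readers following the thread, here is a minimal sketch of why the apply() call above stops with "subscript out of bounds". It assumes a small toy matrix named tab (a hypothetical name, built from two rows of the head(table) output above). Character subscripts on a matrix are looked up in its row and column names, and a gene ID stored as a cell value is not a row name, so the lookup fails. Comparing cell values finds the row instead.

```r
# Toy character matrix shaped like 'table' in the post (values copied from it).
# 'tab' is a hypothetical name used only for this illustration.
tab <- matrix(c("3491_f_at", "3709_f_at", "0.9953142",
                "3813_f_at", "3884_f_at", "0.9951781"),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("Gene1", "Gene2", "PCC")))

# Character subscripts are matched against dimnames. "3813_f_at" is a cell
# value, not a row name, so this lookup raises an error.
res <- try(tab["3813_f_at", "3884_f_at"], silent = TRUE)
failed <- inherits(res, "try-error")

# Comparing the cell values of both key columns selects the matching row.
hit <- tab[tab[, "Gene1"] == "3813_f_at" & tab[, "Gene2"] == "3884_f_at", ,
           drop = FALSE]
```

Doing this comparison row by row over 9275 pairs would work but is slow; a join on both key columns does the same lookup in one call.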
Have you tried using merge? E.g., something like

   PCC <- merge(ptable[, c("Gene1", "Gene2")], table, suffixes=c("",""))

By the way, why do you convert the output of read.table to a matrix? Since you have both character and numeric data columns, I think it would make more sense to leave the dataset as a data.frame (which read.table produces).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Amit
> Sent: Sunday, January 24, 2010 10:48 AM
> To: r-help at r-project.org
> Subject: [R] fetching columns from another file
>
> [...]
Hi:

The use of %in% may be more what you want. Consider the following (data faked because your data set could not be conveniently read into R...and Bill Dunlap is right, why would you want these to be matrices? But I digress...):

# Generate gene names
gene1 <- paste(c(3884, 3491, 3709, 3371, 3594), "_f_at", sep = '')
gene2 <- paste(c(3813, 3709, 3491, 3594, 3371), "_f_at", sep = '')

# Provide some 'data' that has some verisimilitude to yours...
xyz <- matrix(round(0.995 + runif(20, -0.001, 0.001), 7), nrow = 5)
ds1 <- data.frame(gene1 = gene1, gene2 = gene2, xyz)
names(ds1)[3:6] <- c('PCC', 'PCC3', 'PCC23', 'PCC123')
ds1
#       gene1     gene2       PCC      PCC3     PCC23    PCC123
# 1 3884_f_at 3813_f_at 0.9946656 0.9952888 0.9952047 0.9942946
# 2 3491_f_at 3709_f_at 0.9944245 0.9940811 0.9945579 0.9947879
# 3 3709_f_at 3491_f_at 0.9947104 0.9953734 0.9952654 0.9958892
# 4 3371_f_at 3594_f_at 0.9957142 0.9953967 0.9952465 0.9940917
# 5 3594_f_at 3371_f_at 0.9959341 0.9955436 0.9951310 0.9959632
# your results may vary...

# Do it again for the second data set...
gene1 <- paste(c(3491, 3709, 3813, 3884, 3371, 3594), '_f_at', sep = '')
gene2 <- paste(c(3709, 3491, 3884, 3813, 3594, 3371), '_f_at', sep = '')
uvw <- matrix(round(0.995 + runif(24, -0.001, 0.001), 7), nrow = 6)
ds2 <- data.frame(gene1 = gene1, gene2 = gene2, uvw)
names(ds2)[3:6] <- c('PCC', 'PCC3', 'PCC23', 'PCC123')
ds2
#       gene1     gene2       PCC      PCC3     PCC23    PCC123
# 1 3491_f_at 3709_f_at 0.9954095 0.9955172 0.9957099 0.9953413
# 2 3709_f_at 3491_f_at 0.9943942 0.9940585 0.9949269 0.9941129
# 3 3813_f_at 3884_f_at 0.9956359 0.9948163 0.9949249 0.9954146
# 4 3884_f_at 3813_f_at 0.9941586 0.9952573 0.9946768 0.9957190
# 5 3371_f_at 3594_f_at 0.9950113 0.9950989 0.9941181 0.9942739
# 6 3594_f_at 3371_f_at 0.9956282 0.9942188 0.9948331 0.9957369

# Pick out the genes you want to match in the second data frame:
mygenes <- ds1[1:2, 1:2]
mygenes
#       gene1     gene2
# 1 3884_f_at 3813_f_at
# 2 3491_f_at 3709_f_at

# These should match to rows 4 and 1 of ds2.
# Do the select (all columns are kept so the gene names appear in the result)...
genesub <- with(ds2, ds2[gene1 %in% mygenes$gene1 & gene2 %in% mygenes$gene2, ])

# Answer:
> genesub
      gene1     gene2       PCC      PCC3     PCC23    PCC123
1 3491_f_at 3709_f_at 0.9954095 0.9955172 0.9957099 0.9953413
4 3884_f_at 3813_f_at 0.9941586 0.9952573 0.9946768 0.9957190

HTH,
Dennis

On Sun, Jan 24, 2010 at 10:47 AM, Amit wrote:
> [...]
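One caveat on the %in% approach, sketched below under stated assumptions: the two %in% tests are evaluated independently, so a row can pass when its gene1 appears in one requested pair and its gene2 in a different one. The third row of ds2 here is a made-up mixed pair with a made-up PCC value, added only to show the effect; the helper key() is likewise hypothetical. Building one key per row restricts the match to whole pairs.

```r
# Two rows taken from the example above plus one invented mixed pair.
ds2 <- data.frame(
  gene1 = c("3491_f_at", "3884_f_at", "3884_f_at"),
  gene2 = c("3709_f_at", "3813_f_at", "3709_f_at"),  # row 3: invented pair
  PCC   = c(0.9954095, 0.9941586, 0.9950000)         # row 3: invented value
)
mygenes <- data.frame(gene1 = c("3884_f_at", "3491_f_at"),
                      gene2 = c("3813_f_at", "3709_f_at"))

# Independent tests: row 3 passes although (3884_f_at, 3709_f_at)
# is not one of the requested pairs.
loose <- ds2[ds2$gene1 %in% mygenes$gene1 & ds2$gene2 %in% mygenes$gene2, ]

# Pairwise test: paste both genes into a single key and compare keys.
# key() is a hypothetical helper; "|" is assumed not to occur in gene IDs.
key <- function(d) paste(d$gene1, d$gene2, sep = "|")
strict <- ds2[key(ds2) %in% key(mygenes), ]
```

In this toy case loose keeps all three rows while strict keeps only the two requested pairs. merge() on both key columns gives the same pairwise behaviour without building keys by hand.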