> On Jun 6, 2017, at 4:01 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>
> Hi Bogdan,
> Kinda messy, but:
>
> N <- data.frame(N=c("n1","n2","n3","n4"))
> M <- data.frame(M=c("m1","m2","m3","m4","m5"))
> C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"), I=c(100,300,400))
> MN<-as.data.frame(matrix(NA,nrow=length(N[,1]),ncol=length(M[,1])))
> names(MN)<-M[,1]
> rownames(MN)<-N[,1]
> C[,1]<-as.character(C[,1])
> C[,2]<-as.character(C[,2])
> for(row in 1:dim(C)[1]) MN[C[row,1],C[row,2]]<-C[row,3]

`xtabs` offers another route:

C$m <- factor(C$m, levels=M$M)
C$n <- factor(C$n, levels=N$N)

Option 1: Zeroes in the empty positions:

> (X <- xtabs(I ~ m+n , C, addNA=TRUE))
    n
m     n1  n2  n3  n4
  m1 100 300   0   0
  m2   0   0   0   0
  m3   0   0 400   0
  m4   0   0   0   0
  m5   0   0   0   0

Option 2: Sparse matrix:

> (X <- xtabs(I ~ m+n , C, sparse=TRUE))
5 x 4 sparse Matrix of class "dgCMatrix"
    n
m     n1  n2  n3 n4
  m1 100 300   .  .
  m2   .   .   .  .
  m3   .   . 400  .
  m4   .   .   .  .
  m5   .   .   .  .

I wasn't sure if the sparse results of xtabs would make a distinction between 0 and NA, but happily they do:

> C <- data.frame(n=c("n1","n2","n3", "n3", "n4"), m=c("m1","m1","m3", "m4", "m5"), I=c(100,300,400, NA, 0))
> C
   n  m   I
1 n1 m1 100
2 n2 m1 300
3 n3 m3 400
4 n3 m4  NA
5 n4 m5   0
> (X <- xtabs(I ~ m+n , C, sparse=TRUE))
4 x 4 sparse Matrix of class "dgCMatrix"
    n
m     n1  n2  n3 n4
  m1 100 300   .  .
  m3   .   . 400  .
  m4   .   .   .  .
  m5   .   .   .  0

(In this last example I forgot to repeat the lines that augmented the factor levels, so m2 is not seen.)

--
David

> Jim
>
> On Tue, Jun 6, 2017 at 3:51 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
>> Dear Bert,
>>
>> thank you for your response. Here is the piece of R code: given the 3 data
>> frames below ---
>>
>> N <- data.frame(N=c("n1","n2","n3","n4"))
>>
>> M <- data.frame(M=c("m1","m2","m3","m4","m5"))
>>
>> C <- data.frame(n=c("n1","n2","n3"), m=c("m1","m1","m3"), I=c(100,300,400))
>>
>> how shall I integrate N, M, and C in such a way that at the end we have
>> a data frame with:
>>
>> - list N as the column names
>> - list M as the row names
>> - the values in the cells of N * M corresponding to the numerical
>>   values in the data frame C.
>>
>> More precisely, the result shall be:
>>
>>      n1   n2   n3   n4
>> m1  100  300    -    -
>> m2    -    -    -    -
>> m3    -    -  400    -
>> m4    -    -    -    -
>> m5    -    -    -    -
>>
>> thank you !
>>
>> On Mon, Jun 5, 2017 at 6:57 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>>> Reproducible example, please. -- In particular, what exactly does C look
>>> like?
>>>
>>> (You should know this by now).
>>>
>>> -- Bert
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>> On Mon, Jun 5, 2017 at 6:45 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
>>>> Dear all,
>>>>
>>>> please could you advise on the R code I could use in order to do the
>>>> following operation:
>>>>
>>>> 1 -- I have 2 lists of "genome coordinates": each list is composed of
>>>> numbers that represent genome coordinates;
>>>>
>>>> let's say list N:
>>>>
>>>> n1
>>>> n2
>>>> n3
>>>> n4
>>>>
>>>> and a list M:
>>>>
>>>> m1
>>>> m2
>>>> m3
>>>> m4
>>>> m5
>>>>
>>>> 2 -- and a data frame C, where for some pairs of coordinates (n,m) from the
>>>> lists above, we have a numerical intensity;
>>>>
>>>> for example:
>>>>
>>>> n1; m1; 100
>>>> n1; m2; 300
>>>>
>>>> The question would be: what is the most efficient R code I could use in
>>>> order to integrate the list N, the list M, and the data frame C, in order
>>>> to obtain a DATA FRAME with
>>>>
>>>> -- list N as the column names
>>>> -- list M as the row names
>>>> -- the values in the cells of N * M corresponding to the numerical values
>>>>    in the data frame C.
>>>>
>>>> A little example would be:
>>>>
>>>>      n1   n2   n3   n4
>>>> m1  100    -    -    -
>>>> m2  300    -    -    -
>>>> m3    -    -    -    -
>>>> m4    -    -    -    -
>>>> m5    -    -    -    -
>>>>
>>>> I wrote a script in Perl, although I would like to do this in R.
>>>> Many thanks ;)
>>>> -- bogdan
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
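The original question asked for a data frame rather than a table object, so one more step may be wanted after `xtabs`. The following is only a minimal sketch under stated assumptions: it rebuilds the N, M and C objects from the thread, and the name `wide` is made up for illustration. It relies on `xtabs` keeping unused factor levels by default, and on `as.data.frame()` treating the unclassed table as an ordinary matrix.

```r
# Example data as posted in the thread (stringsAsFactors set explicitly
# so the sketch behaves the same on old and new R versions).
N <- data.frame(N = c("n1", "n2", "n3", "n4"), stringsAsFactors = FALSE)
M <- data.frame(M = c("m1", "m2", "m3", "m4", "m5"), stringsAsFactors = FALSE)
C <- data.frame(n = c("n1", "n2", "n3"),
                m = c("m1", "m1", "m3"),
                I = c(100, 300, 400),
                stringsAsFactors = FALSE)

# Give both keys the full set of levels so empty rows/columns are kept.
C$m <- factor(C$m, levels = M$M)
C$n <- factor(C$n, levels = N$N)

# Sum of I for every (m, n) combination; combinations with no data are 0.
X <- xtabs(I ~ m + n, data = C)

# Drop the table class and convert the underlying matrix to a
# wide data frame: rows are M, columns are N.
wide <- as.data.frame(unclass(X))
wide
```

Note that with this route an empty cell and a true intensity of 0 both show up as 0 in `wide`; if that distinction matters, the sparse variant or an NA-filled matrix may be the better fit.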
Thank you David. Using the xtabs operation simplifies the code very much, many thanks ;)

On Tue, Jun 6, 2017 at 7:44 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> `xtabs` offers another route:
> [...]
Simple matrix indexing suffices without any fancier functionality.

## First convert M and N to character vectors -- which they should
## have been in the first place!
M <- sort(as.character(M[,1]))
N <- sort(as.character(N[,1]))

## This could be a one-liner, but I'll split it up for clarity.
res <- matrix(NA, length(M), length(N), dimnames = list(M,N))
res[as.matrix(C[,2:1])] <- C$I  ## matrix indexing
res

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Jun 6, 2017 at 7:46 AM, Bogdan Tanasa <tanasa at gmail.com> wrote:
> Thank you David. Using xtabs operation simplifies the code very much, many
> thanks ;)
> [...]
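For readers who want to try the matrix-indexing idea in a fresh session, here is a self-contained sketch. It is an illustration under assumptions, not a restatement of anyone's exact code: the data are re-created from the thread with N and M as plain character vectors, and the names `idx` and `res_df` are hypothetical. The membership check is an addition, since indexing by a row or column name that is not in the dimnames stops with a "subscript out of bounds" error.

```r
# Row and column labels as plain character vectors.
N <- c("n1", "n2", "n3", "n4")
M <- c("m1", "m2", "m3", "m4", "m5")

# Long-format intensities, as in the thread's example.
C <- data.frame(n = c("n1", "n2", "n3"),
                m = c("m1", "m1", "m3"),
                I = c(100, 300, 400),
                stringsAsFactors = FALSE)

# Guard (an addition for this sketch): every pair must refer to known labels.
stopifnot(all(C$m %in% M), all(C$n %in% N))

# Start from an all-NA numeric matrix with M as rows and N as columns.
res <- matrix(NA_real_, nrow = length(M), ncol = length(N),
              dimnames = list(M, N))

# A two-column character matrix indexes by (row name, column name),
# so every intensity is placed in one vectorised assignment.
idx <- cbind(as.character(C$m), as.character(C$n))
res[idx] <- C$I

# Convert to a data frame if that is the required end result.
res_df <- as.data.frame(res)
res_df
```

Unlike the dense `xtabs` result, cells with no entry in C stay NA here, so a recorded intensity of 0 remains distinguishable from a missing pair.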