Thanks Ben.
I need to learn more about apply. Do you have a link or a tutorial about apply? The R documentation is very short.

How can I obtain:
z <- list(Col1, Col2, Col3, Col4, ...)?

Thanks

kmezhoud
http://bioinformatics.tn/

On Mon, Jan 19, 2015 at 8:22 PM, Ben Tupper <btupper at bigelow.org> wrote:

> Hi again,
>
> On Jan 19, 2015, at 1:53 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>
>> Yes, many thanks.
>> That is what I asked for, using lapply.
>>
>> do.call(cbind, col1)
>>
>> converts col1 to a matrix but does not fill the empty values with NA.
>>
>> The same goes for
>>
>> matrix(unlist(col1), ncol = 5, byrow = FALSE)
>>
>> How can I get col1 as a matrix, with the empty values filled with NA?
>
> Perhaps best is to determine the maximum number of rows required first,
> then force each subset to have that length.
>
> # make a list of matrices, each with nCol columns and differing
> # number of rows
> nCol <- 3
> nRow <- sample(3:10, 5)
> x <- lapply(nRow, function(x, nc) {
>     matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)
> }, nCol)
> x
>
> # make a simple function to get a single column from a matrix
> getColumn <- function(x, colNum, len = nrow(x)) {
>     y <- x[, colNum]
>     length(y) <- len
>     y
> }
>
> # what is the maximum number of rows
> n <- max(sapply(x, nrow))
>
> # use the function to get the column from each matrix
> col1 <- lapply(x, getColumn, 1, len = n)
> col1
>
> do.call(cbind, col1)
>       [,1] [,2] [,3] [,4] [,5]
>  [1,]    3    8    5    7    9
>  [2,]    4    9    6    8   10
>  [3,]    5   10    7    9   11
>  [4,]   NA   11    8   10   12
>  [5,]   NA   12    9   11   13
>  [6,]   NA   13   NA   12   14
>  [7,]   NA   14   NA   13   15
>  [8,]   NA   15   NA   NA   16
>  [9,]   NA   NA   NA   NA   17
>
> Ben
>
>> Thanks
>> Karim
>>
>> kmezhoud
>> http://bioinformatics.tn/
>>
>> On Mon, Jan 19, 2015 at 4:36 PM, Ben Tupper <ben.bighair at gmail.com> wrote:
>>
>> Hi,
>>
>> On Jan 18, 2015, at 4:36 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>>
>> > Dear All,
>> > I am trying to get the correlation between Diseases (80) in columns and
>> > samples in rows (UNEQUAL) using gene expression (at least 1000, numeric).
>> > For this I can use the CORREP package with the cor.unbalanced function.
>> >
>> > But before I can build this final matrix, I need to load and store the
>> > expression of 1000 genes for every Disease (80). Every disease has a
>> > different number of samples (between 50 and 500).
>> >
>> > Is it possible to get a cube of matrices with equal columns but unequal
>> > rows? I think not, so I can't use the array function.
>> >
>> > I am trying to get a list of matrices having the same number of columns
>> > but different numbers of rows, as follows:
>> >
>> > Cubist <- vector("list", 1)
>> > Cubist$Expression <- vector("list", 1)
>> >
>> > for (i in 1:80){
>> >     matrix <- function(getGeneExpression[i])
>> >     Cubist$Expression[[Disease[i]]] <- matrix
>> > }
>> >
>> > At this step I have:
>> > length(Cubist$Expression)
>> > #80
>> > dim(Cubist$Expression$Disease1)
>> > #526 1000
>> > dim(Cubist$Expression$Disease2)
>> > #106 1000
>> >
>> > names(Cubist$Expression$Disease1[4])
>> > #ABD
>> >
>> > names(Cubist$Expression$Disease2[4])
>> > #ABD
>> >
>> > Now I need to build the final matrices, one for every gene (1000), that
>> > I will use with the CORREP function.
>> >
>> > Is there a way to directly extract the first column (first gene) for all
>> > Diseases (80) from Cubist$Expression? or
>> >
>>
>> I don't understand most of your question, but the above seems to be
>> straightforward. Here's a toy example:
>>
>> # make a list of matrices, each with nCol columns and differing
>> # number of rows, nRow
>> nCol <- 3
>> nRow <- sample(3:10, 5)
>> x <- lapply(nRow, function(x, nc) {
>>     matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)
>> }, nCol)
>> x
>>
>> # make a simple function to get a single column from a matrix
>> getColumn <- function(x, colNum) {
>>     return(x[, colNum])
>> }
>>
>> # use the function to get the column from each matrix
>> col1 <- lapply(x, getColumn, 1)
>> col1
>>
>> Does that help answer this part of your question? If not, you may need
>> to create a very small example of your data and post it here using the
>> head() and dput() functions.
>>
>> Ben
>>
>> > I need to build 1000 matrices with 80 columns and unequal rows?
>> >
>> > Cublist$Diseases <- vector("list", 1)
>> >
>> > for (k in 1:1000){
>> >     for (i in 1:80){
>> >         Cublist$Diseases[[gene[k] ]] <- Cubist$Expression[[Diseases[i] ]][k]
>> >     }
>> > }
>> >
>> > This double loop is time consuming... Is there a way to do this faster?
>> >
>> > Thanks,
>> > karim
>> > kmezhoud
>> > http://bioinformatics.tn/
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
> Ben Tupper
> Bigelow Laboratory for Ocean Sciences
> 60 Bigelow Drive, P.O. Box 380
> East Boothbay, Maine 04544
> http://www.bigelow.org
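The padding step in the quoted reply relies on two base R behaviours: cbind() recycles shorter vectors instead of padding them, while assigning a larger length to a vector fills the new positions with NA. A minimal sketch, using two made-up vectors (a and b are illustrative and not taken from the thread):

```r
# Two vectors of unequal length (illustrative values, not from the thread)
a <- c(3, 4, 5)
b <- c(8, 9, 10, 11, 12)

# Extending a vector with `length<-` pads its tail with NA
n <- max(length(a), length(b))
length(a) <- n
length(b) <- n

# Both vectors now have length n, so cbind() has nothing to recycle
padded <- cbind(a, b)
padded
```

Without the two length assignments, cbind(a, b) would recycle the values of a (with a warning) rather than insert NA, which is the behaviour described in the question.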
Hi,

On Jan 19, 2015, at 5:17 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:

> Thanks Ben.
> I need to learn more about apply. Do you have a link or a tutorial about apply? The R documentation is very short.
>
> How can I obtain:
> z <- list(Col1, Col2, Col3, Col4, ...)?
>

This may not be the most efficient way, and there certainly is no error checking, but you can wrap one lapply within another as shown below. The innermost lapply iterates over your list of input matrices, extracting the one specified column from each list element. The outer lapply iterates over the various column numbers you want to extract.

getMatrices <- function(colNums, dataList = x){
    # the number of rows required
    n <- max(sapply(dataList, nrow))
    lapply(colNums, function(x, dat, n) {   # iterate along requested columns
        do.call(cbind, lapply(dat, getColumn, x, len = n))   # iterate along input data list
    }, dataList, n)
}

getMatrices(c(1,3), dataList = x)

If we are lucky, one of the plyr package users might show us how to do the same with a one-liner.

There are endless resources online; here are some gems.

http://www.r-project.org/doc/bib/R-books.html
http://www.rseek.org/
http://www.burns-stat.com/documents/
http://www.r-bloggers.com/

Also, I found "Data Manipulation with R" ( http://www.r-project.org/doc/bib/R-books_bib.html#R:Spector:2008 ) helpful.

Ben

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org
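The nested-lapply idea above can also be written more compactly in base R. The following is only a sketch under stated assumptions: the toy list x and the name byColumn are made up for illustration, and the inner sapply() call stands in for the do.call(cbind, ...) step rather than reproducing getMatrices() exactly.

```r
# Toy data (hypothetical): three matrices with 3 columns and 2, 4 and 3 rows
x <- list(matrix(1:6, ncol = 3), matrix(1:12, ncol = 3), matrix(1:9, ncol = 3))

# Rows needed to hold the longest column
n <- max(sapply(x, nrow))

# For each requested column j, pad column j of every matrix to length n.
# sapply() returns a matrix here because every padded column has the same
# length n (and n > 1), so the columns end up side by side.
byColumn <- lapply(c(1, 3), function(j) {
    sapply(x, function(m) `length<-`(m[, j], n))
})

byColumn[[1]]
```

Calling `length<-` as an ordinary function returns the padded copy directly, which avoids the temporary variable used in getColumn(). Whether this is clearer than the two-function version is a matter of taste.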
I use plyr and am learning dplyr and magrittr, but those are just syntactic sugar. What I have been having difficulty with in this thread is the idea that it somehow makes sense to pad vectors with NA values... because I really don't think it does. It seems more like a hammer looking for a nail because that is what it knows how to deal with.

You have a list of matrices with data in them, and switching from for loops to lapply is not in itself going to fix a memory or speed problem... normally the big improvements are in the way you allocate and use your data. Burns talks about pre-allocating the result to speed things up, but I don't understand the problem well enough to suggest an efficient data structure to pre-allocate.

I suggest that Karim read and adhere to the Posting Guide (particularly the bits about giving a reproducible example and posting in plain text so it doesn't get scrambled) if help with optimizing is desired. The discussion at [1] might clarify what "reproducible" means.

I will also mention that efficient algorithms for this subject area are frequently available in the Bioconductor project, so I hope you are not re-inventing the wheel and have already reviewed their tools.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
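The remark about pre-allocation can be illustrated with a small sketch. It does not use the thread's data (which was never posted in full); the vector length and the names grown and prealloc are assumptions made purely for illustration.

```r
n <- 1000

# Growing the result one element at a time: each c() call builds a new,
# longer vector and copies the old contents into it
grown <- c()
for (i in seq_len(n)) grown <- c(grown, i^2)

# Pre-allocating the full result once and filling it in place
prealloc <- numeric(n)
for (i in seq_len(n)) prealloc[i] <- i^2

# Both approaches give the same values; only the allocation pattern differs
identical(grown, prealloc)
```

The same idea applies to lists: vector("list", 80) reserves all 80 slots up front, whereas starting from vector("list", 1) and assigning new elements makes R extend the list as the loop runs.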
Thanks Ben, Jeff and Roy,

Here is an example of my data:

Disease <- list()
Diseases <- list()
ListMatByGene <- list()
for (i in 1:3){
    # 5+i samples (rows) by 5 genes (columns)
    Disease[[i]] <- matrix(sample(-30:30, 25+(5*i)), 5+i)
    rownames(Disease[[i]]) <- paste0("Sample", 1:(5+i))
    colnames(Disease[[i]]) <- paste0("Gene", 1:5)
    D <- paste0("Disease", i)
    Diseases[[D]] <- Disease[[i]]
}

getColumn <- function(x, colNum, len = nrow(x)){
    y <- x[, colNum]
    length(y) <- len
    y
}

getMatrices <- function(colNums, dataList = x){
    # the number of rows required
    n <- max(sapply(dataList, nrow))
    lapply(colNums, function(x, dat, n) {   # iterate along requested columns
        do.call(cbind, lapply(dat, getColumn, x, len = n))   # iterate along input data list
    }, dataList, n)
}

G <- paste0("Gene", 1:5)
ListMatByGene[G] <- getMatrices(c(1:ncol(Diseases[[1]])), dataList = Diseases)

## get Disease correlation by gene
## (use = "na.or.complete" is what the abbreviation use = "na" matches)
DiseaseCorrelation <- lapply(ListMatByGene, function(x) cor(x, use = "na.or.complete", method = "spearman"))

## convert the list of matrices to an array
ArrayDiseaseCor <- array(unlist(DiseaseCorrelation),
                         dim = c(nrow(DiseaseCorrelation[[1]]),
                                 ncol(DiseaseCorrelation[[1]]),
                                 length(DiseaseCorrelation)))

dimnames(ArrayDiseaseCor) <- list(names(Diseases), names(Diseases), colnames(Diseases[[1]]))

FilterDiseaseCor <- apply(ArrayDiseaseCor, MARGIN = c(1,2), function(x) x[abs(x) > 0.5])

FilterDiseaseCor
         Disease1   Disease2  Disease3
Disease1 Numeric,5  Numeric,2 -0.9428571
Disease2 Numeric,2  Numeric,5 Numeric,2
Disease3 -0.9428571 Numeric,2 Numeric,5

My question is: how can I get a table like this:

D1        D2        Cor    Gene
Disease1  Disease2  -0.94  Gene2
Disease1  Disease2   0.78  Gene4
Disease3  Disease2   0.5   Gene5
...

and

          Disease1  Disease2  Disease3
Disease1         5         1         0
Disease2         1         5         3
Disease3         0         3         5

Thanks
Karim
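One possible way to produce both requested tables from a disease x disease x gene array is sketched below. The array corArr is a stand-in built from random numbers (an assumption, playing the role of ArrayDiseaseCor), and the names counts, long and keep are hypothetical; the threshold of 0.5 is the one used in the question.

```r
set.seed(1)  # assumption: fixed seed so the toy array is repeatable

# Stand-in for ArrayDiseaseCor: disease x disease x gene correlations
dis  <- paste0("Disease", 1:3)
gene <- paste0("Gene", 1:5)
corArr <- array(NA_real_, dim = c(3, 3, 5), dimnames = list(dis, dis, gene))
for (g in gene) {
    corArr[, , g] <- cor(matrix(rnorm(18), ncol = 3), method = "spearman")
}

# Count matrix: for each disease pair, the number of genes with |cor| > 0.5
counts <- apply(abs(corArr) > 0.5, c(1, 2), sum, na.rm = TRUE)

# Long table: one row per (D1, D2, Gene) combination
long <- as.data.frame(as.table(corArr), responseName = "Cor",
                      stringsAsFactors = FALSE)
names(long)[1:3] <- c("D1", "D2", "Gene")

# Keep strong correlations only, and each unordered disease pair once
keep <- !is.na(long$Cor) & abs(long$Cor) > 0.5 &
        match(long$D1, dis) < match(long$D2, dis)
long <- long[keep, c("D1", "D2", "Cor", "Gene")]

counts
long
```

The diagonal of counts equals the number of genes because every disease correlates perfectly with itself. The long table drops the diagonal and the mirrored pairs, so each row corresponds to one gene counted in the upper triangle of counts.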