Hello,

I am trying to do an ANOVA on a microarray data set consisting of 22690 elements. The ANOVA is fine, but when I try to put the data in a frame in order to export it, I get a stack overflow. I have found documentation on dynamic memory in R, but not on how to increase the stack size. The code I'm using is below. If anyone has any suggestions for a workaround here, I'd appreciate it.

Thank you.
Bill Noble
------------

R : Copyright 2003, The R Development Core Team
Version 1.7.1  (2003-06-16)

> invisible(options(echo = TRUE))
> sdata <- read.table("../../data/02-09-03/data.mtx", header=T, row.names=1)
> infection <- gl(2,8,16, label=c("no infection", "infection"))
> labor <- gl(2,4,16, label=c("no labor", "labor"))
> aof <- function(x) {
+   m <- data.frame(infection, labor, x);
+   anova(aov(x ~ infection + labor + infection*labor, m))
+ }
> anovaresults <- apply(sdata, 1, aof)
> pvalues <- data.frame(lapply(anovaresults, function(x) { x["Pr(>F)"][1:3,] }))
Error: protect(): stack overflow
> anovaresults[[1]]
Analysis of Variance Table

Response: x
                Df Sum Sq Mean Sq F value  Pr(>F)
infection        1   9082    9082  0.2315 0.63907
labor            1  98722   98722  2.5164 0.13865
infection:labor  1 143262  143262  3.6517 0.08019 .
Residuals       12 470776   39231
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> temp <- lapply(anovaresults, function(x) { x["Pr(>F)"][1:3,] })
> length(temp)
[1] 22690
> temp[1]
$"AFFX-BioB-5_at"
         1          2          3
0.63906707 0.13865289 0.08018914

> gc(verbose=TRUE)
Garbage collection 62 = 23+6+33 (level 2) ...
1430471 cons cells free (28%)
17.4 Mbytes of heap free (32%)
         used (Mb) gc trigger  (Mb)
Ncells 3249183 86.8    4953636 132.3
Vcells 4749443 36.3    7025348  53.6
> pvalues <- data.frame(temp)
Error: protect(): stack overflow
> gc(verbose=TRUE)
Garbage collection 63 = 23+6+34 (level 2) ...
1636368 cons cells free (33%)
19.5 Mbytes of heap free (34%)
         used (Mb) gc trigger  (Mb)
Ncells 3317268 88.6    4953636 132.3
Vcells 4794917 36.6    7353029  56.1
Bill -

Here's what I would do, starting after your display of anovaresults[[1]].

## one long vector of p-values, three per probe set
temp.1 <- unlist(lapply(anovaresults, function(x) { x["Pr(>F)"][1:3,] }))
## reshape to one row per probe set, one column per model term
temp.2 <- matrix(temp.1, length(anovaresults), 3, byrow=T)
dimnames(temp.2) <- list(names(anovaresults),
                         dimnames(anovaresults[[1]])[[1]][1:3])
## free the large list of anova tables before building the data frame
rm("anovaresults")
pvalues <- data.frame(temp.2)

Another suggestion is to do the subscripting that extracts the single column of p-values inside your function aof(). Then the list returned by apply() will be much smaller and have a simpler structure.

HTH  -  tom blackwell  -  u michigan medical school  -  ann arbor  -

On Fri, 5 Sep 2003, William Noble wrote:

> I am trying to do an ANOVA on a microarray data set consisting of
> 22690 elements. The ANOVA is fine, but when I try to put the data in
> a frame in order to export it, I get a stack overflow. I have
> found documentation on dynamic memory in R, but not on how to increase
> the stack size. The code I'm using is below. If anyone has any
> suggestions for a workaround here, I'd appreciate it.
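Tom's second suggestion, extracting the p-values inside aof(), can be sketched as below. This is a minimal sketch, not code from the thread: the original data file is not available, so a hypothetical 50 x 16 matrix of random numbers stands in for sdata, and the helper name aof.p and the probe names are also hypothetical. Because each call returns just three numbers, apply() hands back a plain 3 x n matrix instead of a list of anova tables, so no long list ever has to go through data.frame().

```r
# Hypothetical stand-in for the microarray data: 50 probe sets (rows)
# by 16 arrays (columns) of random values, since data.mtx is not available.
set.seed(1)
sdata <- matrix(rnorm(50 * 16), nrow = 50,
                dimnames = list(paste("probe", 1:50, sep = ""), NULL))

# Same 2 x 2 design as in the original post (4 arrays per cell).
infection <- gl(2, 8, 16, labels = c("no infection", "infection"))
labor     <- gl(2, 4, 16, labels = c("no labor", "labor"))

# Hypothetical variant of aof(): fits the same model
# (infection * labor expands to infection + labor + infection:labor)
# but returns only the three p-values, not the whole anova table.
aof.p <- function(x) {
  m <- data.frame(infection, labor, x)
  tab <- anova(aov(x ~ infection * labor, data = m))
  tab[1:3, "Pr(>F)"]
}

# apply() now returns a 3 x 50 matrix; t() gives one row per probe set.
pmat <- t(apply(sdata, 1, aof.p))
colnames(pmat) <- c("infection", "labor", "infection:labor")
pvalues <- as.data.frame(pmat)
head(pvalues, 3)

# To export (hypothetical file name):
# write.table(pvalues, "pvalues.txt", sep = "\t", quote = FALSE)
```

The row names of pvalues are the probe set identifiers carried over from the row names of sdata, so they survive the export.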
William Noble <noble at gs.washington.edu> writes:

> Hello,
>
> I am trying to do an ANOVA on a microarray data set consisting of
> 22690 elements. The ANOVA is fine, but when I try to put the data in
> a frame in order to export it, I get a stack overflow. I have
> found documentation on dynamic memory in R, but not on how to increase
> the stack size. The code I'm using is below. If anyone has any
> suggestions for a workaround here, I'd appreciate it.

You might want to consider turning it into a matrix instead (just use
sapply()). However, this looks like a bug, and it has even got worse in
r-devel, so thanks for drawing attention to it. To paraphrase the
situation: just take a long list of short vectors and turn it into a
data frame:

> tmp <- lapply(1:22690,function(i)rnorm(3))
> xx <- data.frame(tmp)
Segmentation fault

The problem comes from within the error handler itself:

Program received signal SIGSEGV, Segmentation fault.
Rf_errorcall (call=0x81dadb0, format=0x817c598 "protect(): stack overflow")
    at ../../../R/src/main/errors.c:481
481         vsignalError(call, format, ap);

The traceback indicates that Rf_protect() goes into infinite recursion.
(Luke?)
Deep down in the stack we have

#760082 0x08073e48 in Rf_substituteList (el=0x8d47a9c, rho=0x81dadb0)
    at ../../../R/src/main/coerce.c:1811
1811        PROTECT(h = substitute(CAR(el), rho));
(gdb) down
#760081 0x080bf717 in Rf_protect (s=0x978a050)
    at ../../../R/src/main/memory.c:1999
1999            errorcall(R_NilValue, "protect(): stack overflow");

and below that we have

#760082 0x08073e48 in Rf_substituteList (el=0x8d47a9c, rho=0x81dadb0)
    at ../../../R/src/main/coerce.c:1811
1811        PROTECT(h = substitute(CAR(el), rho));
(gdb)
#760083 0x08073e54 in Rf_substituteList (el=0x8d47ab8, rho=0x81dadb0)
    at ../../../R/src/main/coerce.c:1812
1812        PROTECT(t = substituteList(CDR(el), rho));
(gdb)
#760084 0x08073e54 in Rf_substituteList (el=0x8d47ad4, rho=0x81dadb0)
    at ../../../R/src/main/coerce.c:1812
1812        PROTECT(t = substituteList(CDR(el), rho));
(gdb)
#760085 0x08073e54 in Rf_substituteList (el=0x8d47af0, rho=0x81dadb0)
    at ../../../R/src/main/coerce.c:1812
1812        PROTECT(t = substituteList(CDR(el), rho));

So the original problem is that substituteList() does not cope with long
lists: each level of the recursion PROTECTs its head (line 1811) before
recursing on the tail (line 1812), so the protect stack grows with the
length of the list. I'm not really sure, but my gut feeling is that the
tail should be computed before the head (i.e. reverse lines 1811 and
1812) so that you don't end up with a pile of heads computed before the
recursion ends.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
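Peter's sapply() workaround can be sketched on his own reproduction. This is a minimal sketch, assuming a current R; the element names are hypothetical stand-ins for probe set ids. sapply() simplifies the list of length-3 vectors to a 3 x n matrix, and as.data.frame() then builds a three-column data frame from that matrix instead of converting a 22690-element list.

```r
# Peter's reproduction: a long list of short numeric vectors.
set.seed(2)
tmp <- lapply(1:22690, function(i) rnorm(3))
# Hypothetical element names, standing in for ids like "AFFX-BioB-5_at".
names(tmp) <- paste("probe", seq_along(tmp), sep = "")

# sapply() simplifies to a 3 x 22690 matrix (list names become column
# names); t() turns it into one row per list element.
mat <- t(sapply(tmp, identity))

# Convert the matrix, not the list, to a data frame.
xx <- as.data.frame(mat)
dim(xx)   # 22690 rows, 3 columns
head(xx, 3)
```

Applied to Bill's objects, the same idea would read pvalues <- as.data.frame(t(sapply(temp, identity))).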
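The argument for reversing lines 1811 and 1812 can be illustrated with a toy model in R. This is purely illustrative and is not R's C source: protect() and unprotect() are hypothetical counters standing in for PROTECT/UNPROTECT, and protect_model() and walk() are made-up names. With the head protected before the recursive call, the peak number of protected objects grows with the list length; with the tail computed first, nothing is held across the recursive call, so the peak stays constant.

```r
# Toy model (hypothetical, not R internals): count how many objects are
# "protected" at once while a list of length n is rebuilt recursively.
protect_model <- function(n, head_first = TRUE) {
  depth <- 0
  peak  <- 0
  protect   <- function() { depth <<- depth + 1; peak <<- max(peak, depth) }
  unprotect <- function(k) depth <<- depth - k
  walk <- function(i) {
    if (i > n) return(invisible(NULL))   # end of list: nothing protected
    if (head_first) {
      protect()      # like PROTECT(h = substitute(CAR(el), rho))
      walk(i + 1)    # recurse on the tail while h is still held ...
      protect()      # ... then PROTECT(t = ...)
    } else {
      walk(i + 1)    # recurse on the tail first; nothing held yet
      protect()      # PROTECT(t = ...)
      protect()      # PROTECT(h = ...)
    }
    unprotect(2)     # like UNPROTECT(2) after building the new cell
  }
  walk(1)
  peak
}

protect_model(100, head_first = TRUE)    # 101
protect_model(100, head_first = FALSE)   # 2
```

In this model the head-first order needs n + 1 protect slots, so a fixed-size protect stack overflows once the list is long enough, while the tail-first order needs only two.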