Peter.Watkins@foodscience.afisc.csiro.au
2005-Sep-08 02:21 UTC
[R] Effect of data set size on calculation
Dear listers,

I have a piece of code which performs an ANOVA-type analysis on 2D GC data. The code is shown below:

# ANOVA 2D GC analysis
# maxc <- number of classes
# nreps <- number of replicates per class
maxc <- 2
nreps <- 4
sscl <- NULL
cmean <- NULL
#
# Initial stat. variables
#
dftot <- nrow(mat)-1
dfcl <- maxc - 1
dferr <- dftot - dfcl
totmean <- mean(mat)
sstot <- sd(mat)^2*dftot
#
# Calculate class-to-class variance
#
for (j in 1:maxc) {
  cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
}
for (j in 1:ncol(mat)) {
  cmean[,j] <- cmean[,j]-totmean[j]
}
cmean <- (cmean)^2*nreps
for (i in 1:ncol(mat)) {
  sscl[i] <- sum(cmean[,i])
}
#
# sserr <- sstot-sscl
#
ratios <- (sscl/dfcl)/((sstot-sscl)/dferr)

I have tested the above on a small data set (averaged over the second dimension) and it produced a meaningful result. However, when I analyse the data with both dimensions (a much larger data set), the analysis fails. I've narrowed the problem down to the calculation of cmean, which comes back with a single column instead of one column per variable, but I have no idea why.

If anyone has any suggestions then feel free to comment. Relevant output is given below.

Many thanks,
Peter.

# Averaged dataset
> ncol(mat)
[1] 636
> nrow(mat)
[1] 8
[SNIP]
> for (j in 1:maxc) {
+ cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
+ }
> cmean
           V2       V3       V4       V5       V6       V7       V8       V9
[1,] 27.38970 27.68816 27.80730 27.72688 27.68044 27.33749 6667.038 15537.47
[2,] 26.36001 26.72920 26.64940 26.82506 26.54539 26.30811 8029.746 13656.60
...
[SNIP]
         V634     V635     V636     V637
[1,] 27.51868 27.51270 27.52344 27.52127
[2,] 26.45830 26.45837 26.46089 26.46407
>

# Full dataset
> nrow(mat)
[1] 8
> ncol(mat)
[1] 390010
[SNIP]
> for (j in 1:maxc) {
+ cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
+ }
> cmean
         [,1]
[1,] 54.48274
[2,] 63.14705
>
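The class-mean step in the code relies on mean() returning one value per column. A minimal sketch of the same calculation follows. It assumes (this is a guess from the output, not something the post confirms) that the averaged mat is a data frame, for which mean() in R versions of that era returned per-column means, while the full mat is a plain numeric matrix, for which mean() returns a single number. The toy matrix and its size are hypothetical; colMeans() is used because it gives per-column means for both kinds of object.

```r
# Hypothetical toy data: 8 rows (2 classes x 4 replicates), 5 columns.
set.seed(1)
maxc  <- 2   # number of classes
nreps <- 4   # replicates per class
mat   <- matrix(rnorm(maxc * nreps * 5), nrow = maxc * nreps)

# mean() on a matrix slice collapses all cells to one number,
# which would give a one-column cmean like the full-dataset output.
scalar_mean <- mean(mat[1:nreps, ])

# colMeans() keeps one mean per column for each class.
cmean <- NULL
for (j in 1:maxc) {
  rows  <- ((j - 1) * nreps + 1):(j * nreps)
  cmean <- rbind(cmean, colMeans(mat[rows, , drop = FALSE]))
}

# Remaining steps of the post's calculation, written column-wise.
dftot   <- nrow(mat) - 1
dfcl    <- maxc - 1
dferr   <- dftot - dfcl
totmean <- colMeans(mat)                          # grand mean per column
sstot   <- apply(mat, 2, var) * dftot             # total sum of squares per column
sscl    <- colSums(nreps * sweep(cmean, 2, totmean)^2)  # between-class sum of squares
ratios  <- (sscl / dfcl) / ((sstot - sscl) / dferr)
```

If the assumption holds, cmean here has maxc rows and ncol(mat) columns, so the later per-column loops over cmean[,j] have something to index.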