Peter.Watkins@foodscience.afisc.csiro.au
2005-Sep-08 02:21 UTC
[R] Effect of data set size on calculation
Dear listers,
I have a piece of code that performs an ANOVA-type analysis on 2D GC
data. The code is shown below:
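# ANOVA 2D GC analysis
# mat   : data, one sample per row (maxc * nreps rows, grouped by class),
#         one variable per column
# maxc  <- number of classes
# nreps <- number of replicates per class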
maxc <- 2
nreps <- 4
sscl <- NULL
cmean <- NULL
#
# Initial stat. variable
#
dftot <- nrow(mat)-1
dfcl <- maxc - 1
dferr <- dftot - dfcl
totmean <- mean(mat)
sstot <- sd(mat)^2*dftot
#
# Calculate class-to-class variance
#
for (j in 1:maxc) {
cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
}
for (j in 1:ncol(mat)) {
cmean[,j] <- cmean[,j]-totmean[j]
}
cmean <- (cmean)^2*nreps
for (i in 1:ncol(mat)) {
sscl[i] <- sum(cmean[,i])
}
#
# sserr <- sstot-sscl
#
ratios <- (sscl/dfcl)/((sstot-sscl)/dferr)
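For reference, the code above is meant to compute the usual one-way ANOVA F ratio separately for every column j of mat. In the notation below (introduced here for illustration only), m = maxc, n = nreps, and x_{crj} is replicate r of class c for variable j. So sstot, sscl and ratios correspond to SS_tot, SS_cl and F_j, and each row of cmean should hold one class mean per variable.

```latex
\begin{align*}
\bar{x}_{\cdot\cdot j} &= \frac{1}{mn}\sum_{c=1}^{m}\sum_{r=1}^{n} x_{crj},
  \qquad \bar{x}_{c\cdot j} = \frac{1}{n}\sum_{r=1}^{n} x_{crj} \\
SS_{\mathrm{tot},j} &= \sum_{c=1}^{m}\sum_{r=1}^{n}\left(x_{crj}-\bar{x}_{\cdot\cdot j}\right)^2,
  \qquad df_{\mathrm{tot}} = mn-1 \\
SS_{\mathrm{cl},j} &= n\sum_{c=1}^{m}\left(\bar{x}_{c\cdot j}-\bar{x}_{\cdot\cdot j}\right)^2,
  \qquad df_{\mathrm{cl}} = m-1 \\
SS_{\mathrm{err},j} &= SS_{\mathrm{tot},j}-SS_{\mathrm{cl},j},
  \qquad df_{\mathrm{err}} = df_{\mathrm{tot}}-df_{\mathrm{cl}} \\
F_j &= \frac{SS_{\mathrm{cl},j}/df_{\mathrm{cl}}}{SS_{\mathrm{err},j}/df_{\mathrm{err}}}
\end{align*}
```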
I have tested the code on a small data set (averaged over the second
dimension) and it produced a meaningful result. However, when I analyse
the data with both dimensions (a much larger data set), the analysis
fails: cmean ends up with a single column (one value per class) instead
of one column per variable. I've narrowed the problem down to the
calculation of cmean, but I have no idea why it goes wrong. If anyone
has any suggestions, feel free to comment. The relevant output is given
below.
Many thanks, Peter.
# Averaged dataset
> ncol(mat)
[1] 636
> nrow(mat)
[1] 8
[SNIP]
> for (j in 1:maxc) {
+ cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
+ }
> cmean
           V2       V3       V4       V5       V6       V7       V8       V9
[1,] 27.38970 27.68816 27.80730 27.72688 27.68044 27.33749 6667.038 15537.47
[2,] 26.36001 26.72920 26.64940 26.82506 26.54539 26.30811 8029.746 13656.60
... [SNIP]
         V634     V635     V636     V637
[1,] 27.51868 27.51270 27.52344 27.52127
[2,] 26.45830 26.45837 26.46089 26.46407
>
# Full dataset
> nrow(mat)
[1] 8
> ncol(mat)
[1] 390010
[SNIP]
> for (j in 1:maxc) {
+ cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
+ }
> cmean
         [,1]
[1,] 54.48274
[2,] 63.14705
>
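Judging from the output, one plausible cause is the class of mat rather than its size. The V2, V3, ... headers in the averaged run suggest that mat was a data frame there, and in R versions of that era mean() on a data frame returned one mean per column. The [,1] header in the full run suggests that the larger mat was a plain matrix. mean() on a matrix returns a single grand mean, so each rbind() adds just one number. In current R, mean() on a data frame no longer returns column means either (it returns NA with a warning), so colMeans() is the safer choice in both cases. The sketch below uses hypothetical toy data; the values and the function name anova_ratios are illustrative and not part of the original code. It shows the difference and recomputes the same ratios with colMeans(), rowsum() and sweep().

```r
# Hypothetical toy data: 2 classes x 4 replicates = 8 rows, 5 variables.
set.seed(1)
maxc  <- 2
nreps <- 4
mat   <- matrix(rnorm(maxc * nreps * 5), nrow = maxc * nreps)

# mean() on a matrix returns ONE grand mean, not one mean per column,
# so rbind()-ing two of them gives a 2 x 1 cmean like the one above.
length(mean(mat[1:4, ]))      # [1] 1
length(colMeans(mat[1:4, ]))  # [1] 5

# Hypothetical helper: per-variable one-way ANOVA F ratios using
# column-wise functions only (no loops over ncol(mat)).
anova_ratios <- function(mat, maxc, nreps) {
  mat <- as.matrix(mat)                  # also accepts a data frame
  stopifnot(nrow(mat) == maxc * nreps)   # rows must be grouped by class
  dftot <- nrow(mat) - 1
  dfcl  <- maxc - 1
  dferr <- dftot - dfcl
  totmean <- colMeans(mat)                          # grand mean per column
  sstot   <- colSums(sweep(mat, 2, totmean)^2)      # same as var() * dftot
  cls     <- rep(seq_len(maxc), each = nreps)       # class label of each row
  cmean   <- rowsum(mat, cls) / nreps               # maxc x ncol(mat) class means
  cmean   <- sweep(cmean, 2, totmean)               # deviations from grand mean
  sscl    <- colSums(cmean^2) * nreps               # class-to-class SS per column
  (sscl / dfcl) / ((sstot - sscl) / dferr)
}

ratios <- anova_ratios(mat, maxc, nreps)
ratios
```

rowsum() sums the rows within each class in a single call, so the class-mean matrix always has maxc rows and one column per variable, however many columns mat has.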
