thr3ads.net - R help - [R] Effect of data set size on calculation [Sep 2005]

Peter.Watkins@foodscience.afisc.csiro.au

2005-Sep-08 02:21 UTC

[R] Effect of data set size on calculation

Dear listers,

I have a piece of code that performs an ANOVA-type analysis on 2D GC (two-dimensional gas chromatography) data. The code is shown below:

# ANOVA 2D GC analysis
# maxc <- number of classes (groups of samples)
# nreps <- number of replicates per class
maxc <- 2
nreps <- 4

sscl <- NULL
cmean <- NULL
#
#     Initial stat. variable
#
dftot <- nrow(mat)-1
dfcl <- maxc - 1
dferr <- dftot - dfcl
totmean <- mean(mat)
sstot <- sd(mat)^2*dftot
#
#     Calculate class-to-class variance
#
for (j in 1:maxc) {
    cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
}
for (j in 1:ncol(mat)) {
    cmean[,j] <- cmean[,j]-totmean[j]
}
cmean <- (cmean)^2*nreps
for (i in 1:ncol(mat)) {
    sscl[i] <- sum(cmean[,i])
}
#
#     sserr <- sstot-sscl
#
ratios <- (sscl/dfcl)/((sstot-sscl)/dferr)
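For comparison, here is a minimal sketch of the same per-column calculation written with column-wise functions instead of mean() and sd(). It assumes that mat is (or can be converted to) a numeric matrix whose rows are grouped class by class, with nreps consecutive rows per class. The function name anova_ratios and the toy data are hypothetical and not part of the original code. The ratio follows the posted formulas, with sserr taken as sstot - sscl and dferr as dftot - dfcl.

```r
# Hypothetical helper (not from the original post): one F ratio per column
# for a one-way layout with `maxc` classes of `nreps` consecutive rows each.
anova_ratios <- function(mat, maxc, nreps) {
  mat <- as.matrix(mat)
  stopifnot(is.numeric(mat), nrow(mat) == maxc * nreps)
  cls <- rep(seq_len(maxc), each = nreps)      # class label for every row
  totmean <- colMeans(mat)                      # grand mean of each column
  cmean <- rowsum(mat, cls) / nreps             # maxc x ncol(mat) class means
  # Between-class sum of squares per column: nreps * sum((class mean - grand mean)^2)
  sscl <- nreps * colSums(sweep(cmean, 2, totmean)^2)
  # Total sum of squares per column about the grand mean
  sstot <- colSums(sweep(mat, 2, totmean)^2)
  dfcl <- maxc - 1
  dferr <- (nrow(mat) - 1) - dfcl
  # Between-class mean square divided by within-class (error) mean square
  (sscl / dfcl) / ((sstot - sscl) / dferr)
}

# Made-up data: 2 classes x 4 replicates, 3 columns
set.seed(1)
toy <- matrix(rnorm(8 * 3), nrow = 8)
anova_ratios(toy, maxc = 2, nreps = 4)  # three ratios, one per column
```

Using rowsum() for the class totals removes the explicit loops over classes and columns, which may help when mat has several hundred thousand columns.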

I have tested the above on a small data set (averaged over the second dimension) and it produced a meaningful result. However, when I analyse the data with both dimensions (a much larger data set), the analysis fails: cmean comes out with a single column instead of one column per variable. I've narrowed the problem down to the calculation of cmean, but I have no idea why it happens. If anyone has any suggestions, feel free to comment. The relevant output is given below.
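One likely explanation, assuming the full data set is stored as a plain numeric matrix: mean() applied to a matrix returns a single number (the mean of every cell), not one mean per column, so each rbind() call adds a one-element row. colMeans() returns one value per column. R releases of that period also had a data-frame method for mean() that returned column means, which could explain why the smaller data set (with column names V2, V3, ...) appeared to work; how mat was created in each case is an assumption here. The toy matrix below is made up for illustration.

```r
# Made-up 8 x 3 numeric matrix standing in for `mat`
m <- matrix(1:24, nrow = 8)

mean(m[1:4, ])      # 10.5: one number, the mean of all 12 cells
colMeans(m[1:4, ])  # 2.5 10.5 18.5: one mean per column

# So the posted loop builds a 2 x 1 cmean, while colMeans() gives 2 x ncol(m)
dim(rbind(mean(m[1:4, ]), mean(m[5:8, ])))          # 2 1
dim(rbind(colMeans(m[1:4, ]), colMeans(m[5:8, ])))  # 2 3
```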

Many thanks, Peter.

# Averaged dataset

> ncol(mat)
[1] 636
> nrow(mat)
[1] 8

[SNIP]

> for (j in 1:maxc) {
+ cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
+ }
> cmean
           V2       V3       V4       V5       V6       V7       V8       V9
[1,] 27.38970 27.68816 27.80730 27.72688 27.68044 27.33749 6667.038 15537.47
[2,] 26.36001 26.72920 26.64940 26.82506 26.54539 26.30811 8029.746 13656.60

... [SNIP]

         V634     V635     V636     V637
[1,] 27.51868 27.51270 27.52344 27.52127
[2,] 26.45830 26.45837 26.46089 26.46407
>

# Full dataset

> nrow(mat)
[1] 8
> ncol(mat)
[1] 390010

[SNIP]

> for (j in 1:maxc) {
+ cmean <- rbind(cmean,mean(mat[((j-1)*nreps+1):((j-1)*nreps+nreps),]))
+ }
> cmean
         [,1]
[1,] 54.48274
[2,] 63.14705
>
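A minimal change to the posted code, assuming mat is a numeric matrix, is to use colMeans() for the class means and the grand mean, and to compute the column standard deviations explicitly with apply(), since sd() applied to a whole matrix returns a single value in current R versions. The random toy matrix below is a made-up stand-in for the real data. With these changes totmean has one element per column, so the later subtraction loop over ncol(mat) and the ratios line should work column by column as intended.

```r
# Made-up stand-in for `mat`: 8 rows (2 classes x 4 replicates), 5 columns
set.seed(42)
mat <- matrix(rnorm(8 * 5), nrow = 8)
maxc <- 2
nreps <- 4

# Column-wise summaries that do not rely on mean()/sd() of a whole matrix
dftot <- nrow(mat) - 1
totmean <- colMeans(mat)                 # one grand mean per column
sstot <- apply(mat, 2, sd)^2 * dftot     # per-column total sum of squares

# The posted loop, with colMeans() in place of mean()
cmean <- NULL
for (j in 1:maxc) {
  rows <- ((j - 1) * nreps + 1):(j * nreps)
  cmean <- rbind(cmean, colMeans(mat[rows, , drop = FALSE]))
}
dim(cmean)  # 2 5: one row per class, one column per variable
```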