Dear all,
I am currently using R to normalize data (an OTU table) with the vegan package. An example of the script I'm using is posted below.
What happens is the following:
I first read a text file in which the OTU table is stored. In this table, every column represents an OTU and every row a sample (see example below). The data are pasted into this txt file from an Excel sheet.
Afterwards, I want to normalize this table, i.e. rarefy it to the sample with the least number of sequences. Since there can be a lot of variation between normalizations, I chose to perform a certain number of iterations (100 in the example below).
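To be explicit about what a single normalization step looks like, the core call boils down to the following (a minimal illustration of the same call used in the full script further down; "depth" is just a name I use here):

library(vegan)
OTU_TABLE <- read.table(file="OTU_TABLE.txt", header=T)
# One rarefaction: every sample (row) is subsampled down to the
# smallest library size in the table
depth <- min(rowSums(OTU_TABLE))
OTU_sub <- rrarefy(OTU_TABLE, depth)
rowSums(OTU_sub) # every sample should now sum to depth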
The rarefied OTU_TABLE is stored in OTU_sub. This table has the same structure as the original table (same number of rows and columns, same header), but the counts will vary (for example, instead of 25 in the Sample1/OTU1 field, it might now contain 14).
The loop performs the iterations and rarefies the data 100 times. Using these results, a boxplot is generated that gives an overview of all iterations (in this case, the recovered richness, i.e. the number of OTUs left after normalization).
This works fine, but I would like to step up my game.
What I would like to do is perform x iterations and then, from all of those iterations, build a final OTU_sub table. This table should contain, for each field, the average over all iterations performed.
Suppose I do 3 iterations and for Sample1/OTU1 I get the numbers 14, 5 and 20; the final table should then give me (14 + 5 + 20)/3 = 13.
The same should be done for all the other fields. Obviously, nothing should change in the fields belonging to the sample with the least number of reads.
If possible, I would also like to get the standard deviation for each field, preferably in a separate table. This would allow me to thoroughly analyze the normalization data.
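Something along these lines is roughly what I have in mind, but I have not tested it, and the names OTU_array, OTU_mean and OTU_sd are just placeholders I made up: store every rarefied table in a three-dimensional array and then take the mean and standard deviation over the third dimension with apply().

library(vegan)
OTU_TABLE <- read.table(file="OTU_TABLE.txt", header=T)
N <- 100 # No. of iterations
depth <- min(rowSums(OTU_TABLE)) # rarefaction depth
# 3-D array: rows = samples, columns = OTUs, third dimension = iterations
OTU_array <- array(NA, dim=c(nrow(OTU_TABLE), ncol(OTU_TABLE), N),
dimnames=list(rownames(OTU_TABLE), colnames(OTU_TABLE), NULL))
for(i in 1:N){
OTU_array[,,i] <- as.matrix(rrarefy(OTU_TABLE, depth))
}
# Per-field mean and standard deviation over all iterations,
# each with the same rows and columns as the original table
OTU_mean <- apply(OTU_array, c(1,2), mean)
OTU_sd <- apply(OTU_array, c(1,2), sd)

If I understand rrarefy correctly, the sample with the fewest reads should then come out unchanged in OTU_mean and with zero standard deviation in OTU_sd, since rarefying it to its own depth leaves it untouched.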
If anyone would be able to assist me with this, I would be very grateful!
Sincerely,
Guillaume Tahon
Example OTU_TABLE:
         OTU1  OTU2  OTU3
Sample1    25     5     0
Sample2     1    18     2
Sample3     0     1    20
Sample4   185     4    25
Currently used script:
library(vegan)
OTU_TABLE<-read.table(file="OTU_TABLE.txt", header=T)
N=100 # No. of iterations
richness_mat <- matrix(nrow = N, ncol = 4) # 4 Samples
colnames(richness_mat)<-c("sample1","sample2",
"sample3", "sample4")
for(i in 1:N){
# rarefy to the sample with the least number of sequences
OTU_sub <- rrarefy(OTU_TABLE, min(rowSums(OTU_TABLE)))
OTU_vector <- rowSums(OTU_sub > 0) # Count the number of OTUs per sample
richness_mat[i,1] <- OTU_vector[1]
richness_mat[i,2] <- OTU_vector[2]
richness_mat[i,3] <- OTU_vector[3]
richness_mat[i,4] <- OTU_vector[4]
}
par(mar = c(3,4,2,2))
boxplot(richness_mat, ylab = "number of OTUs")
paste(round(colMeans(as.matrix(richness_mat)),2),"±",
round(apply(richness_mat, 2, sd),2))