Hi,
I am formatting the codes using library(formatR).? Hopefully, it will not be
mangled in the email.
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*",
"", list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) { lines1 <-
readLines(x2) header1 <- lines1[1:2] dat1 <- read.table(text = lines1,
header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))
library(plyr)
lapply(seq_along(lst2), function(i) { lstN <- lapply(lst2[[i]], function(x)
x[, -1]) lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y,
seq(0, 1, by = 0.01), na.rm = TRUE))(x)) arr1 <- array(unlist(lstQ1), dim =
c(dim(lstQ1[[1]]), length(lstQ1)), dimnames = list(NULL, lapply(lstQ1,
names)[[1]])) res <- rowMeans(arr1, dims = 2, na.rm = TRUE) colnames(res)
<- gsub(" ", "_", colnames(res)) res1 <-
data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res,
stringsAsFactors = FALSE) write.csv(res1, paste0(paste(getwd(),
"final", paste(names(lst1)[[i]], "Quantile", sep =
"_"), sep = "/"), ".csv"), row.names = FALSE,
quote = FALSE)
})
ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile",
list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE,
stringsAsFactors = FALSE))
sapply(ReadOut1, dim)
# [,1] [,2]
#[1,] 101 101
#[2,] 258 258
lapply(ReadOut1,function(x) x[1:2,1:3])
#[[1]]
# Percentiles txav_DJF txav_MAM
#1 0% -12.68566 7.09702
#2 1% -12.59062 7.15338
#
#[[2]]
# Percentiles txav_DJF txav_MAM
#1 0% -12.75516 6.841840
#2 1% -12.68244 6.910664?
###Q2:
dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1) lapply(2:nrow(lstNew), function(i) { dat1
<- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors =
FALSE) colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1),
rep(rownames(lstNew)[i], length(lst1)), sep = "_")) write.csv(dat1,
paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep =
"/"), ".csv"), row.names = FALSE, quote = FALSE)
}) ## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices",
list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE,
stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257
head(ReadOut2[[1]], 2)
# Percentiles G100_pav_ANN G101_pav_ANN
#1 0% 1.054380 1.032740
#2 1% 1.069457 1.045689
A.K.
On Sunday, April 13, 2014 2:46 AM, Zilefac Elvis <zilefacelvis at
yahoo.com> wrote:
Hi AK,
Q1)?I need your help again. Using the previous data (attached) and the previous
code below,instead of taking rowMeans, let's
do?quantile(x,seq(0,1,by=0.01)).?
Delete the last 2 rows (Trend and p<)?in each file before
doing?quantile(x,seq(0,1,by=0.01)).
For example, assume that I want to calculate?quantile(x,seq(0,1,by=0.01)) for
each column of Site G100. I will do so for the 5 sims of site G100 and then take
their average. This will be approximately close to the true value than just
calculating quantile(x,seq(0,1,by=0.01)) from one sim. Please do this same thing
for all the files.
So, when you do rowMeans, it should be the mean of quantile(x,seq(0,1,by=0.01))
calculated from all sims in that Site.
Output
The number of files in "final" remains the same (2 files). The
"Year" column(will be replaced)?will contain ?the names of
quantile(x,seq(0,1,by=0.01)) such as??0% ? ? ? ? ? 1% ? ? ? ? ? 2% ? ? ? ? ? 3%
? ? ? ? ? 4% ? ? ? ? ? 5% ? ? ? ? ? 6%, ..., 98% ? ? ? ? ?99% ? ? ? ? 100% . You
can give this column any name such as "Percentiles".
Q2) ?From the folder "final", please go to each file identified by
site name, take a column, say?col1 of txav??from each file, create a dataframe
whose colnames are site codes (names of files in "final"). Create a
folder called "Indices" and place this dataframe in it. The filename
for the dataframe is?txav, say. So, in "Indices", you will have one
file having 3 columns [, c(Percentiles, G100,G101)]. The idea is that I want to
be able to pick any column from files in "final" and form a dataframe
from which I will generate my qqplot or boxplot.
Thanks very much AK.
Atem
This should be the final step of this my drama, at least for now.
#=============================================================================================================
dir.create("final")
lst1 <-
split(list.files(pattern=".csv"),gsub("\\_.*","",list.files(pattern=".csv")))
lst2 <-? lapply(lst1,function(x1) lapply(x1, function(x2) {lines1 <-
readLines(x2); header1 <- lines1[1:2]; dat1 <-
read.table(text=lines1,header=FALSE,sep=",",stringsAsFactors=FALSE,
skip=2); colnames(dat1) <-
Reduce(paste,strsplit(header1,","));dat1}))
lstYear <- lapply(lst2,function(x) lapply(x, function(y)
y[,1,drop=FALSE])[[1]])?
lapply(seq_along(lst2),function(i) {lstN <-lapply(lst2[[i]],function(x)
x[,-1]); arr1 <-
array(unlist(lstN),dim=c(dim(lstN[[1]]),length(lstN)),dimnames=list(NULL,lapply(lstN,names)[[1]]));res
<- cbind(lstYear[[i]],rowMeans(arr1,dims=2,na.rm=TRUE)); names(res) <-
gsub("\\_$","",gsub(" ",
"_",names(res))); res[,1] <- gsub(" <",
"",res[,1]);
write.csv(res,paste0(paste(getwd(),"final",names(lst1)
[[i]],sep="/"),".csv"),row.names=FALSE,quote=FALSE)? })?
#====================================================================================================
Zilefac Elvis
2014-Apr-13 20:17 UTC
[R] Quantile and rowMean from multiple files in a folder
Hi AK,
I must admit that you did an excellent job.
Thanks very much.
My analysis is manageable now.
Regards,
Atem.
On Sunday, April 13, 2014 8:54 AM, arun <smartpink111@yahoo.com> wrote:
Hi,
I am formatting the codes using library(formatR). Hopefully, it will not be
mangled in the email.
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*",
"", list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) { lines1 <-
readLines(x2) header1 <- lines1[1:2] dat1 <- read.table(text = lines1,
header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2)
colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))
library(plyr)
lapply(seq_along(lst2), function(i) { lstN <- lapply(lst2[[i]], function(x)
x[, -1]) lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y,
seq(0, 1, by = 0.01), na.rm = TRUE))(x)) arr1 <- array(unlist(lstQ1), dim =
c(dim(lstQ1[[1]]), length(lstQ1)), dimnames = list(NULL, lapply(lstQ1,
names)[[1]])) res <- rowMeans(arr1, dims = 2, na.rm = TRUE) colnames(res)
<- gsub(" ", "_", colnames(res)) res1 <-
data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res,
stringsAsFactors = FALSE) write.csv(res1, paste0(paste(getwd(),
"final", paste(names(lst1)[[i]], "Quantile", sep =
"_"), sep = "/"), ".csv"), row.names = FALSE,
quote = FALSE)
})
ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile",
list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE,
stringsAsFactors = FALSE))
sapply(ReadOut1, dim)
# [,1] [,2]
#[1,] 101 101
#[2,] 258 258
lapply(ReadOut1,function(x) x[1:2,1:3])
#[[1]]
# Percentiles txav_DJF txav_MAM
#1 0% -12.68566 7.09702
#2 1% -12.59062 7.15338
#
#[[2]]
# Percentiles txav_DJF txav_MAM
#1 0% -12.75516 6.841840
#2 1% -12.68244 6.910664
###Q2:
dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1) lapply(2:nrow(lstNew), function(i) { dat1
<- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors =
FALSE) colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1),
rep(rownames(lstNew)[i], length(lst1)), sep = "_")) write.csv(dat1,
paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep =
"/"), ".csv"), row.names = FALSE, quote = FALSE)
}) ## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices",
list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE,
stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257
head(ReadOut2[[1]], 2)
# Percentiles G100_pav_ANN G101_pav_ANN
#1 0% 1.054380 1.032740
#2 1% 1.069457 1.045689
A.K.
On Sunday, April 13, 2014 2:46 AM, Zilefac Elvis <zilefacelvis@yahoo.com>
wrote:
Hi AK,
Q1) I need your help again. Using the previous data (attached) and the previous
code below,instead of taking rowMeans, let's
do quantile(x,seq(0,1,by=0.01)).
Delete the last 2 rows (Trend and p<) in each file before
doing quantile(x,seq(0,1,by=0.01)).
For example, assume that I want to calculate quantile(x,seq(0,1,by=0.01)) for
each column of Site G100. I will do so for the 5 sims of site G100 and then take
their average. This will be approximately close to the true value than just
calculating quantile(x,seq(0,1,by=0.01)) from one sim. Please do this same thing
for all the files.
So, when you do rowMeans, it should be the mean of quantile(x,seq(0,1,by=0.01))
calculated from all sims in that Site.
Output
The number of files in "final" remains the same (2 files). The
"Year" column(will be replaced) will contain the names of
quantile(x,seq(0,1,by=0.01)) such as 0% 1% 2% 3%
4% 5% 6%, ..., 98% 99% 100% . You
can give this column any name such as "Percentiles".
Q2) From the folder "final", please go to each file identified by
site name, take a column, say col1 of txav from each file, create a dataframe
whose colnames are site codes (names of files in "final"). Create a
folder called "Indices" and place this dataframe in it. The filename
for the dataframe is txav, say. So, in "Indices", you will have one
file having 3 columns [, c(Percentiles, G100,G101)]. The idea is that I want to
be able to pick any column from files in "final" and form a dataframe
from which I will generate my qqplot or boxplot.
Thanks very much AK.
Atem
This should be the final step of this my drama, at least for now.
#=============================================================================================================
dir.create("final")
lst1 <-
split(list.files(pattern=".csv"),gsub("\\_.*","",list.files(pattern=".csv")))
lst2 <- lapply(lst1,function(x1) lapply(x1, function(x2) {lines1 <-
readLines(x2); header1 <- lines1[1:2]; dat1 <-
read.table(text=lines1,header=FALSE,sep=",",stringsAsFactors=FALSE,
skip=2); colnames(dat1) <-
Reduce(paste,strsplit(header1,","));dat1}))
lstYear <- lapply(lst2,function(x) lapply(x, function(y)
y[,1,drop=FALSE])[[1]])
lapply(seq_along(lst2),function(i) {lstN <-lapply(lst2[[i]],function(x)
x[,-1]); arr1 <-
array(unlist(lstN),dim=c(dim(lstN[[1]]),length(lstN)),dimnames=list(NULL,lapply(lstN,names)[[1]]));res
<- cbind(lstYear[[i]],rowMeans(arr1,dims=2,na.rm=TRUE)); names(res) <-
gsub("\\_$","",gsub(" ",
"_",names(res))); res[,1] <- gsub(" <",
"",res[,1]);
write.csv(res,paste0(paste(getwd(),"final",names(lst1)
[[i]],sep="/"),".csv"),row.names=FALSE,quote=FALSE) })
#===================================================================================================
[[alternative HTML version deleted]]