thr3ads.net - R help - [R] Quantile and rowMean from multiple files in a folder [Apr 2014]


arun

2014-Apr-13 14:54 UTC

[R] Quantile and rowMean from multiple files in a folder

Hi,

I am formatting the code using library(formatR). Hopefully, it will not be
mangled in the email.
dir.create("final")
## Group the files by site code (the part of the name before the first "_")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "",
    list.files(pattern = ".csv")))

lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",",
        stringsAsFactors = FALSE, skip = 2)
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    ## Drop the last two rows (Trend and p<)
    dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))
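
The grouping step above can be tried on its own. This is only a sketch: the file names below are made up for illustration (the thread does not show the real ones), and it assumes the site code is everything before the first underscore.

```r
# Hypothetical file names; the real ones are not shown in the thread
files <- c("G100_sim1.csv", "G100_sim2.csv", "G101_sim1.csv", "G101_sim2.csv")

# Site code = everything before the first underscore
site <- sub("_.*", "", files)

# One list element per site, holding that site's file names
groups <- split(files, site)
groups
```

Each element of `groups` plays the role of one element of `lst1`, so the nested lapply() reads one site's simulations at a time.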

library(plyr) 

lapply(seq_along(lst2), function(i) {
    lstN <- lapply(lst2[[i]], function(x) x[, -1])
    ## Percentiles of every numeric column, one data frame per sim
    lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y,
        seq(0, 1, by = 0.01), na.rm = TRUE))(x))
    arr1 <- array(unlist(lstQ1), dim = c(dim(lstQ1[[1]]), length(lstQ1)),
        dimnames = list(NULL, lapply(lstQ1, names)[[1]]))
    ## Average the percentiles across sims
    res <- rowMeans(arr1, dims = 2, na.rm = TRUE)
    colnames(res) <- gsub(" ", "_", colnames(res))
    res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res,
        stringsAsFactors = FALSE)
    write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]],
        "Quantile", sep = "_"), sep = "/"), ".csv"), row.names = FALSE,
        quote = FALSE)
})

ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile",
list.files(recursive = TRUE))],  function(x) read.csv(x, header = TRUE,
stringsAsFactors = FALSE))
sapply(ReadOut1, dim)
#     [,1] [,2]
#[1,]  101  101
#[2,]  258  258

lapply(ReadOut1,function(x) x[1:2,1:3])
#[[1]]
#  Percentiles  txav_DJF txav_MAM
#1          0% -12.68566  7.09702
#2          1% -12.59062  7.15338
#
#[[2]]
#  Percentiles  txav_DJF txav_MAM
#1          0% -12.75516 6.841840
#2          1% -12.68244 6.910664
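
For anyone who wants to check the averaging step without the original files, here is a minimal base-R sketch on simulated data. The `sims` list and its values are invented stand-ins for the five sims of one site, and sapply() is swapped in for plyr's numcolwise(); it is not the code that produced the output above.

```r
set.seed(1)
# Hypothetical stand-in for one site: 5 sims, each with 3 numeric columns
sims <- replicate(5, data.frame(txav_DJF = rnorm(50, -5),
                                txav_MAM = rnorm(50, 7),
                                pav_ANN = runif(50)),
                  simplify = FALSE)
probs <- seq(0, 1, by = 0.01)

# One 101 x 3 matrix of percentiles per sim
qlist <- lapply(sims, function(d) sapply(d, quantile, probs = probs, na.rm = TRUE))

# Stack the matrices into a 101 x 3 x 5 array, then average across sims
arr <- array(unlist(qlist), dim = c(dim(qlist[[1]]), length(qlist)),
             dimnames = c(dimnames(qlist[[1]]), list(NULL)))
res <- rowMeans(arr, dims = 2, na.rm = TRUE)

res1 <- data.frame(Percentiles = paste0(0:100, "%"), res,
                   stringsAsFactors = FALSE, row.names = NULL)
head(res1, 2)
```

With dims = 2, rowMeans() treats the first two dimensions as "rows" and averages over the third, so each cell of `res` is the mean of that percentile over the five sims.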


###Q2:

dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
## One output file per index (every column except Percentiles)
lapply(2:nrow(lstNew), function(i) {
    dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]),
        stringsAsFactors = FALSE)
    colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1),
        rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
    write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i],
        sep = "/"), ".csv"), row.names = FALSE, quote = FALSE)
})
## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices",
list.files(recursive = TRUE))],  function(x) read.csv(x, header = TRUE,
stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257 


head(ReadOut2[[1]], 2)
#  Percentiles G100_pav_ANN G101_pav_ANN
#1          0%     1.054380     1.032740
#2          1%     1.069457     1.045689 


A.K.











On Sunday, April 13, 2014 2:46 AM, Zilefac Elvis <zilefacelvis at
yahoo.com> wrote:

Hi AK,
Q1) I need your help again. Using the previous data (attached) and the previous
code below, instead of taking rowMeans, let's
do quantile(x,seq(0,1,by=0.01)).

Delete the last 2 rows (Trend and p<) in each file before
doing quantile(x,seq(0,1,by=0.01)).

For example, assume that I want to calculate quantile(x,seq(0,1,by=0.01)) for
each column of Site G100. I will do so for the 5 sims of site G100 and then take
their average. This should be closer to the true value than
calculating quantile(x,seq(0,1,by=0.01)) from just one sim. Please do the same
for all the files.

So, when you do rowMeans, it should be the mean of quantile(x,seq(0,1,by=0.01))
calculated from all sims in that Site.

Output

The number of files in "final" remains the same (2 files). The
"Year" column (which will be replaced) will contain the names of
quantile(x,seq(0,1,by=0.01)), such as 0%, 1%, 2%, 3%, 4%, 5%, 6%, ..., 98%,
99%, 100%. You can give this column any name, such as "Percentiles".


Q2) From the folder "final", please go to each file identified by
site name, take a column, say col1 of txav, from each file, and create a
dataframe whose colnames are site codes (names of files in "final"). Create a
folder called "Indices" and place this dataframe in it. The filename
for the dataframe is txav, say. So, in "Indices", you will have one
file having 3 columns [, c(Percentiles, G100,G101)]. The idea is that I want to
be able to pick any column from files in "final" and form a dataframe
from which I will generate my qqplot or boxplot.

Thanks very much AK.
Atem
This should be the final step of this drama of mine, at least for now.
#=============================================================================================================
dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "",
    list.files(pattern = ".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
    lines1 <- readLines(x2)
    header1 <- lines1[1:2]
    dat1 <- read.table(text = lines1, header = FALSE, sep = ",",
        stringsAsFactors = FALSE, skip = 2)
    colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
    dat1
}))

lstYear <- lapply(lst2, function(x) lapply(x, function(y) y[, 1, drop = FALSE])[[1]])

lapply(seq_along(lst2), function(i) {
    lstN <- lapply(lst2[[i]], function(x) x[, -1])
    arr1 <- array(unlist(lstN), dim = c(dim(lstN[[1]]), length(lstN)),
        dimnames = list(NULL, lapply(lstN, names)[[1]]))
    res <- cbind(lstYear[[i]], rowMeans(arr1, dims = 2, na.rm = TRUE))
    names(res) <- gsub("\\_$", "", gsub(" ", "_", names(res)))
    res[, 1] <- gsub(" <", "", res[, 1])
    write.csv(res, paste0(paste(getwd(), "final", names(lst1)[[i]], sep = "/"),
        ".csv"), row.names = FALSE, quote = FALSE)
})

#====================================================================================================

Zilefac Elvis

2014-Apr-13 20:17 UTC


[R] Quantile and rowMean from multiple files in a folder

Hi AK,
I must admit that you did an excellent job.
Thanks very much.
My analysis is manageable now.
Regards,
Atem.
