Hi, I am formatting the codes using library(formatR).? Hopefully, it will not be mangled in the email. dir.create("final") lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv"))) lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) { lines1 <- readLines(x2) header1 <- lines1[1:2] dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2) colnames(dat1) <- Reduce(paste, strsplit(header1, ",")) dat1[-c(nrow(dat1), nrow(dat1) - 1), ] })) library(plyr) lapply(seq_along(lst2), function(i) { lstN <- lapply(lst2[[i]], function(x) x[, -1]) lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(x)) arr1 <- array(unlist(lstQ1), dim = c(dim(lstQ1[[1]]), length(lstQ1)), dimnames = list(NULL, lapply(lstQ1, names)[[1]])) res <- rowMeans(arr1, dims = 2, na.rm = TRUE) colnames(res) <- gsub(" ", "_", colnames(res)) res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res, stringsAsFactors = FALSE) write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"), row.names = FALSE, quote = FALSE) }) ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE)) sapply(ReadOut1, dim) # [,1] [,2] #[1,] 101 101 #[2,] 258 258 lapply(ReadOut1,function(x) x[1:2,1:3]) #[[1]] # Percentiles txav_DJF txav_MAM #1 0% -12.68566 7.09702 #2 1% -12.59062 7.15338 # #[[2]] # Percentiles txav_DJF txav_MAM #1 0% -12.75516 6.841840 #2 1% -12.68244 6.910664? ###Q2: dir.create("Indices") names1 <- lapply(ReadOut1, function(x) names(x))[[1]] lstNew <- simplify2array(ReadOut1) lapply(2:nrow(lstNew), function(i) { dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE) colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1), rep(rownames(lstNew)[i], length(lst1)), sep = "_")) write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), ".csv"), row.names = FALSE, quote = FALSE) }) ## Output2: ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE)) length(ReadOut2) # [1] 257 head(ReadOut2[[1]], 2) # Percentiles G100_pav_ANN G101_pav_ANN #1 0% 1.054380 1.032740 #2 1% 1.069457 1.045689 A.K. On Sunday, April 13, 2014 2:46 AM, Zilefac Elvis <zilefacelvis at yahoo.com> wrote: Hi AK, Q1)?I need your help again. Using the previous data (attached) and the previous code below,instead of taking rowMeans, let's do?quantile(x,seq(0,1,by=0.01)).? Delete the last 2 rows (Trend and p<)?in each file before doing?quantile(x,seq(0,1,by=0.01)). For example, assume that I want to calculate?quantile(x,seq(0,1,by=0.01)) for each column of Site G100. I will do so for the 5 sims of site G100 and then take their average. This will be approximately close to the true value than just calculating quantile(x,seq(0,1,by=0.01)) from one sim. Please do this same thing for all the files. So, when you do rowMeans, it should be the mean of quantile(x,seq(0,1,by=0.01)) calculated from all sims in that Site. Output The number of files in "final" remains the same (2 files). The "Year" column(will be replaced)?will contain ?the names of quantile(x,seq(0,1,by=0.01)) such as??0% ? ? ? ? ? 1% ? ? ? ? ? 2% ? ? ? ? ? 3% ? ? ? ? ? 4% ? ? ? ? ? 5% ? ? ? ? ? 6%, ..., 98% ? ? ? ? ?99% ? ? ? ? 100% . You can give this column any name such as "Percentiles". Q2) ?From the folder "final", please go to each file identified by site name, take a column, say?col1 of txav??from each file, create a dataframe whose colnames are site codes (names of files in "final"). Create a folder called "Indices" and place this dataframe in it. The filename for the dataframe is?txav, say. So, in "Indices", you will have one file having 3 columns [, c(Percentiles, G100,G101)]. The idea is that I want to be able to pick any column from files in "final" and form a dataframe from which I will generate my qqplot or boxplot. Thanks very much AK. Atem This should be the final step of this my drama, at least for now. #============================================================================================================= dir.create("final") lst1 <- split(list.files(pattern=".csv"),gsub("\\_.*","",list.files(pattern=".csv"))) lst2 <-? lapply(lst1,function(x1) lapply(x1, function(x2) {lines1 <- readLines(x2); header1 <- lines1[1:2]; dat1 <- read.table(text=lines1,header=FALSE,sep=",",stringsAsFactors=FALSE, skip=2); colnames(dat1) <- Reduce(paste,strsplit(header1,","));dat1})) lstYear <- lapply(lst2,function(x) lapply(x, function(y) y[,1,drop=FALSE])[[1]])? lapply(seq_along(lst2),function(i) {lstN <-lapply(lst2[[i]],function(x) x[,-1]); arr1 <- array(unlist(lstN),dim=c(dim(lstN[[1]]),length(lstN)),dimnames=list(NULL,lapply(lstN,names)[[1]]));res <- cbind(lstYear[[i]],rowMeans(arr1,dims=2,na.rm=TRUE)); names(res) <- gsub("\\_$","",gsub(" ", "_",names(res))); res[,1] <- gsub(" <", "",res[,1]); write.csv(res,paste0(paste(getwd(),"final",names(lst1) [[i]],sep="/"),".csv"),row.names=FALSE,quote=FALSE)? })? #====================================================================================================
Zilefac Elvis
2014-Apr-13 20:17 UTC
[R] Quantile and rowMean from multiple files in a folder
Hi AK, I must admit that you did an excellent job. Thanks very much. My analysis is manageable now. Regards, Atem. On Sunday, April 13, 2014 8:54 AM, arun <smartpink111@yahoo.com> wrote: Hi, I am formatting the codes using library(formatR). Hopefully, it will not be mangled in the email. dir.create("final") lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv"))) lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) { lines1 <- readLines(x2) header1 <- lines1[1:2] dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE, skip = 2) colnames(dat1) <- Reduce(paste, strsplit(header1, ",")) dat1[-c(nrow(dat1), nrow(dat1) - 1), ] })) library(plyr) lapply(seq_along(lst2), function(i) { lstN <- lapply(lst2[[i]], function(x) x[, -1]) lstQ1 <- lapply(lstN, function(x) numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(x)) arr1 <- array(unlist(lstQ1), dim = c(dim(lstQ1[[1]]), length(lstQ1)), dimnames = list(NULL, lapply(lstQ1, names)[[1]])) res <- rowMeans(arr1, dims = 2, na.rm = TRUE) colnames(res) <- gsub(" ", "_", colnames(res)) res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res, stringsAsFactors = FALSE) write.csv(res1, paste0(paste(getwd(), "final", paste(names(lst1)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"), row.names = FALSE, quote = FALSE) }) ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE)) sapply(ReadOut1, dim) # [,1] [,2] #[1,] 101 101 #[2,] 258 258 lapply(ReadOut1,function(x) x[1:2,1:3]) #[[1]] # Percentiles txav_DJF txav_MAM #1 0% -12.68566 7.09702 #2 1% -12.59062 7.15338 # #[[2]] # Percentiles txav_DJF txav_MAM #1 0% -12.75516 6.841840 #2 1% -12.68244 6.910664 ###Q2: dir.create("Indices") names1 <- lapply(ReadOut1, function(x) names(x))[[1]] lstNew <- simplify2array(ReadOut1) lapply(2:nrow(lstNew), function(i) { dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE) colnames(dat1) <- c(rownames(lstNew)[1], paste(names(lst1), rep(rownames(lstNew)[i], length(lst1)), sep = "_")) write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), ".csv"), row.names = FALSE, quote = FALSE) }) ## Output2: ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))], function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE)) length(ReadOut2) # [1] 257 head(ReadOut2[[1]], 2) # Percentiles G100_pav_ANN G101_pav_ANN #1 0% 1.054380 1.032740 #2 1% 1.069457 1.045689 A.K. On Sunday, April 13, 2014 2:46 AM, Zilefac Elvis <zilefacelvis@yahoo.com> wrote: Hi AK, Q1) I need your help again. Using the previous data (attached) and the previous code below,instead of taking rowMeans, let's do quantile(x,seq(0,1,by=0.01)). Delete the last 2 rows (Trend and p<) in each file before doing quantile(x,seq(0,1,by=0.01)). For example, assume that I want to calculate quantile(x,seq(0,1,by=0.01)) for each column of Site G100. I will do so for the 5 sims of site G100 and then take their average. This will be approximately close to the true value than just calculating quantile(x,seq(0,1,by=0.01)) from one sim. Please do this same thing for all the files. So, when you do rowMeans, it should be the mean of quantile(x,seq(0,1,by=0.01)) calculated from all sims in that Site. Output The number of files in "final" remains the same (2 files). The "Year" column(will be replaced) will contain the names of quantile(x,seq(0,1,by=0.01)) such as 0% 1% 2% 3% 4% 5% 6%, ..., 98% 99% 100% . You can give this column any name such as "Percentiles". Q2) From the folder "final", please go to each file identified by site name, take a column, say col1 of txav from each file, create a dataframe whose colnames are site codes (names of files in "final"). Create a folder called "Indices" and place this dataframe in it. The filename for the dataframe is txav, say. So, in "Indices", you will have one file having 3 columns [, c(Percentiles, G100,G101)]. The idea is that I want to be able to pick any column from files in "final" and form a dataframe from which I will generate my qqplot or boxplot. Thanks very much AK. Atem This should be the final step of this my drama, at least for now. #============================================================================================================= dir.create("final") lst1 <- split(list.files(pattern=".csv"),gsub("\\_.*","",list.files(pattern=".csv"))) lst2 <- lapply(lst1,function(x1) lapply(x1, function(x2) {lines1 <- readLines(x2); header1 <- lines1[1:2]; dat1 <- read.table(text=lines1,header=FALSE,sep=",",stringsAsFactors=FALSE, skip=2); colnames(dat1) <- Reduce(paste,strsplit(header1,","));dat1})) lstYear <- lapply(lst2,function(x) lapply(x, function(y) y[,1,drop=FALSE])[[1]]) lapply(seq_along(lst2),function(i) {lstN <-lapply(lst2[[i]],function(x) x[,-1]); arr1 <- array(unlist(lstN),dim=c(dim(lstN[[1]]),length(lstN)),dimnames=list(NULL,lapply(lstN,names)[[1]]));res <- cbind(lstYear[[i]],rowMeans(arr1,dims=2,na.rm=TRUE)); names(res) <- gsub("\\_$","",gsub(" ", "_",names(res))); res[,1] <- gsub(" <", "",res[,1]); write.csv(res,paste0(paste(getwd(),"final",names(lst1) [[i]],sep="/"),".csv"),row.names=FALSE,quote=FALSE) }) #=================================================================================================== [[alternative HTML version deleted]]