thr3ads.net - R help - [R] : Quantile and rowMean from multiple files in a folder [Apr 2014]

Zilefac Elvis
2014-Apr-15 06:21 UTC
[R] : Quantile and rowMean from multiple files in a folder

Hi AK,
All the code for the simulation files works great.
I will try the code for the observations and let you know.
Thanks very much.
Atem.







On Tuesday, April 15, 2014 12:01 AM, arun <smartpink111 at yahoo.com>
wrote:
Yes,
my new solution ignores such cases.







On Monday, April 14, 2014 11:58 PM, Zilefac Elvis <zilefacelvis at
yahoo.com> wrote:
Hi AK,
Please ignore any such site.
I will check it and include it in the analysis.
Thanks,
Atem.



On Monday, April 14, 2014 9:34 PM, arun <smartpink111 at yahoo.com> wrote:



Hi,

I looked at your Observed.zip. One of the files in it has no data:
GG83_Sim.csv.ind.csv
The contents of the file are just:

Year
Year
trend
p <

A.K.
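
A minimal sketch of how such files could be skipped before reading. It assumes the layout described in this thread (two header lines, then data rows, then a trend row and a p-value row). The helper `has_data_rows` and the demo file names are hypothetical and are not taken from the code exchanged here.

```r
# Hypothetical helper (not from the thread): TRUE when a file still has at
# least one data row after the header lines and the trailing trend/p rows.
has_data_rows <- function(path, n_header = 2, n_footer = 2) {
  lines1 <- readLines(path, warn = FALSE)
  lines1 <- lines1[nzchar(trimws(lines1))]  # ignore blank lines
  length(lines1) > n_header + n_footer
}

# Two small demo files in a temporary folder (made-up contents).
tmp_dir <- file.path(tempdir(), "observed_demo")
dir.create(tmp_dir, showWarnings = FALSE)
empty_file <- file.path(tmp_dir, "GG83_demo.csv")
full_file <- file.path(tmp_dir, "G100_demo.csv")
writeLines(c("Year,txav", "Year,DJF", "trend,0.1", "p<,0.05"), empty_file)
writeLines(c("Year,txav", "Year,DJF", "1971,-12.1", "1972,-11.8",
             "trend,0.1", "p<,0.05"), full_file)

# Keep only the files that actually contain data rows.
files <- c(empty_file, full_file)
keep <- files[vapply(files, has_data_rows, logical(1))]
basename(keep)
```

Filtering the file list first keeps the later reading and averaging steps unchanged, since they only ever see files with data.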


On Monday, April 14, 2014 10:41 PM, Zilefac Elvis <zilefacelvis at
yahoo.com> wrote:
Hi AK,
Q1) Please try to correct the error using the larger data set (Sample.zip). The
issue is that when the code is written for and tested on smaller data sets, I
find it difficult to generalize it to larger ones.

Q2) In the Quantilecode2.txt you just sent, the following section was not
applied to the Observed.zip file. I tried to run the code up to section Q1 in
Quantilecode2.txt using a larger data set and received the same error:
Error in 2:nrow(lstNew) : argument of length 0
I have also attached a larger data set so that you can generalize the code to
it. Please do not forget to include the code below in the final code for Q2.


Once you fix these two, I should be able to fix the rest following these
examples.

Thanks AK. Sorry for overloading you with much work.
Atem.

#=============================================================================================================
dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
lapply(2:nrow(lstNew), function(i) {
  dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
  colnames(dat1) <- c(rownames(lstNew)[1],
                      paste(names(lst1), rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
  write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})
## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257
head(ReadOut2[[1]], 2) 

#=============================================================================================================



On Monday, April 14, 2014 8:07 PM, arun <smartpink111 at yahoo.com> wrote:

Hi,

Please send your emails in plain text. If you look at the dimensions of
`lst2`:
sapply(lst2,function(x) sapply(x,ncol))[1:6,]
     G100 G101 G102 G103 G104 G105 G106 G107 G108 G109 G110 G111 G112 G113 G114
[1,]  258  258  258  258  258  257  258  258  258  258  258  258  258  258  247
[2,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
[3,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  257
[4,]  258  258  258  258  258  257  258  258  258  258  258  258  258  258  258
[5,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
[6,]  258  258  258  258  258  258  258  258  258  258  258  258  258  258  258
     G115 G116 G117 G118 G119 G120 GG10 GG11 GG12 GG13 GG14 GG15 GG16 GG17 GG18
[1,]  258  247  256  256  258  258  258  258  258  258  258  258  258  257  258
[2,]  258  250  257  258  258  256  258  258  258  258  258  258  258  258  258
[3,]  258  247  256  258  258  256  258  258  258  258  258  258  258  258  256
[4,]  258  258  258  257  258  258  258  258  258  258  258  258  258  257  258
[5,]  258  257  258  258  258  256  258  258  258  258  258  258  258  258  258
[6,]  258  257  249  257  258  258  258  258  258  258  258  258  258  258  258
     GG19 GG20 GG21 GG22 GG23 GG24 GG25 GG26 GG27 GG28
[1,]  258  258  258  258  258  258  258  258  258  258
[2,]  258  258  258  258  258  258  258  258  258  258
[3,]  258  258  257  258  256  257  258  258  258  258
[4,]  258  257  258  258  258  257  258  258  258  258
[5,]  258  258  257  258  257  258  258  258  258  258
[6,]  258  258  258  258  257  258  258  258  258  258


# The dimensions are not consistent across the simulations within each site.
My code assumed that all the datasets had the same number of columns, rows,
etc.
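
A small sketch of why inconsistent column counts lead to `Error in 2:nrow(lstNew) : argument of length 0`. When the data frames in the list do not all have the same number of columns, `simplify2array()` returns the list unchanged, so `nrow()` is `NULL`. The toy data frames below are made up for illustration, and restricting every data frame to the columns they all share is only one possible workaround, not necessarily the fix used in this thread.

```r
# Two toy "quantile" tables with identical columns (made-up values).
same <- list(
  data.frame(Percentiles = c("0%", "1%", "2%"), txav = 1:3, pav = 4:6),
  data.frame(Percentiles = c("0%", "1%", "2%"), txav = 7:9, pav = 10:12)
)

# Same list, but the second table is missing one column.
ragged <- same
ragged[[2]]$pav <- NULL

m <- simplify2array(same)
dim(m)            # a list-matrix: columns x files

ok <- simplify2array(ragged)
is.null(nrow(ok)) # the list came back unchanged, so there is no row count

# 2:nrow(ok) is 2:NULL, which is the error reported in the thread.
err <- tryCatch(2:nrow(ok), error = function(e) conditionMessage(e))

# One possible workaround (an assumption, not the thread's fix):
# keep only the columns present in every table before simplifying.
common_cols <- Reduce(intersect, lapply(ragged, names))
fixed <- simplify2array(lapply(ragged, function(x) x[, common_cols, drop = FALSE]))
rownames(fixed)
```

Checking `sapply(ReadOut1, ncol)` before calling `simplify2array()` is a quick way to see whether this situation applies.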






On Monday, April 14, 2014 6:26 PM, Zilefac Elvis <zilefacelvis at
yahoo.com> wrote:

Hi AK,
I have another request for help.
Attached is a larger file (~27MB) for sample.zip. All files are the same as
before, except that I am using more sites to do the same thing that you did
with sample.zip.

When generalizing Quantilecode.R to many sites, I receive an error when I run:

dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)

lapply(2:nrow(lstNew), function(i) {
  dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
  colnames(dat1) <- c(rownames(lstNew)[1],
                      paste(names(lst1), rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
  write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})

and I get this:
Error in 2:nrow(lstNew) : argument of length 0


I have tried a few tricks but could not overcome the error message.

Please help!
Atem.

sample (1).zip

On Monday, April 14, 2014 9:22 AM, arun <smartpink111 at yahoo.com> wrote:

Ok. I got the results, but my net is down. I will send them once it gets fixed.

----------
Sent from my Nokia

------Original message------
From: zilefacelvis at yahoo.com <zilefacelvis at yahoo.com>
To: "arun" <smartpink111 at yahoo.com>
Date: Monday, April 14, 2014 3:01:38 PM GMT
Subject: Re: Re: Quantile and rowMean from multiple files in a folder





In Observed.zip I have just one file per site, while in sample.zip I have
100 files (sims) per site.





Thanks,


Atem.

------ Original Message ------



From : arun
To : Zilefac Elvis;
Sent : 14-04-2014 00:12
Subject : Re: Quantile and rowMean from multiple files in a folder
One more doubt: do you have more than one file per site? In the example, it was
just one file per site.

On Monday, April 14, 2014 2:08 AM, arun wrote:

Hi,
The problem is in the different dimensions of the Observed datasets.

sapply(seq_along(lst2), function(i) {
  lstN <- lapply(lst2[[i]], function(x) x[, -1])
  sapply(lstN, function(x) nrow(x))
})
## after removing the trend and p-value rows
# [1] 9 9 9 8 2 9

If you want to take the average, is it by filling NAs for those years that are
missing in the files?

A.K.

On Monday, April 14, 2014 1:05 AM, Zilefac Elvis wrote:

Hi AK,
Q1) Please apply Quantilecode.R to Observed.zip (attached). I tried but received
an error which was self-explanatory, but I could not change the dimensions in
the code.

Q2) Please apply Quantilecode.R to both sample.zip and observed.zip. Here,
instead of doing quantile(y, seq(0, 1, by = 0.01)), take colMeans of the indices.

I have tried to solve both Q1 and Q2 but am still unable to control the
dimensions.

Thanks,
Atem.

On Sunday, April 13, 2014 9:05 AM, arun wrote:

Hi Atem,
On my end, the codes are not formatted in the email as seen in the screen of
formatR GUI. I am attaching the .R file in case there is some difficulty for you.
Arun

On Sunday, April 13, 2014 10:54 AM, arun wrote:

Hi,
I am formatting the codes using library(formatR). Hopefully,
it will not be mangled in the email.

dir.create("final")
lst1 <- split(list.files(pattern = ".csv"), gsub("\\_.*", "", list.files(pattern = ".csv")))

lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
  lines1 <- readLines(x2)
  header1 <- lines1[1:2]
  dat1 <- read.table(text = lines1, header = FALSE, sep = ",", stringsAsFactors = FALSE,
                     skip = 2)
  colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
  dat1[-c(nrow(dat1), nrow(dat1) - 1), ]
}))

library(plyr)
lapply(seq_along(lst2), function(i) {
  lstN <- lapply(lst2[[i]], function(x) x[, -1])
  lstQ1 <- lapply(lstN, function(x)
    numcolwise(function(y) quantile(y, seq(0, 1, by = 0.01), na.rm = TRUE))(x))
  arr1 <- array(unlist(lstQ1), dim = c(dim(lstQ1[[1]]), length(lstQ1)),
                dimnames = list(NULL, lapply(lstQ1, names)[[1]]))
  res <- rowMeans(arr1, dims = 2, na.rm = TRUE)
  ## the two header parts were pasted with a space; use "_" instead
  colnames(res) <- gsub(" ", "_", colnames(res))
  res1 <- data.frame(Percentiles = paste0(seq(0, 100, by = 1), "%"), res,
                     stringsAsFactors = FALSE)
  write.csv(res1, paste0(paste(getwd(), "final",
                               paste(names(lst1)[[i]], "Quantile", sep = "_"), sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})

ReadOut1 <- lapply(list.files(recursive = TRUE)[grep("Quantile", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
sapply(ReadOut1, dim)
#      [,1] [,2]
# [1,]  101  101
# [2,]  258  258

lapply(ReadOut1, function(x) x[1:2, 1:3])
# [[1]]
#   Percentiles  txav_DJF txav_MAM
# 1          0% -12.68566  7.09702
# 2          1% -12.59062  7.15338
#
# [[2]]
#   Percentiles  txav_DJF txav_MAM
# 1          0% -12.75516 6.841840
# 2          1% -12.68244 6.910664

### Q2:
dir.create("Indices")
names1 <- lapply(ReadOut1, function(x) names(x))[[1]]
lstNew <- simplify2array(ReadOut1)
lapply(2:nrow(lstNew), function(i) {
  dat1 <- data.frame(lstNew[1], do.call(cbind, lstNew[i, ]), stringsAsFactors = FALSE)
  colnames(dat1) <- c(rownames(lstNew)[1],
                      paste(names(lst1), rep(rownames(lstNew)[i], length(lst1)), sep = "_"))
  write.csv(dat1, paste0(paste(getwd(), "Indices", rownames(lstNew)[i], sep = "/"), ".csv"),
            row.names = FALSE, quote = FALSE)
})

## Output2:
ReadOut2 <- lapply(list.files(recursive = TRUE)[grep("Indices", list.files(recursive = TRUE))],
                   function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE))
length(ReadOut2)
# [1] 257

head(ReadOut2[[1]], 2)
#   Percentiles G100_pav_ANN G101_pav_ANN
# 1          0%     1.054380     1.032740
# 2          1%     1.069457     1.045689

A.K.
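
The averaging step in Q1 can be seen in isolation with a base-R sketch: compute the percentiles of every column in each simulation, stack the resulting matrices into a 3-D array, and average over the third dimension with `rowMeans(..., dims = 2)`. The simulated values, the column names, and the use of `sapply()` in place of `plyr::numcolwise()` are assumptions made only to keep the example self-contained.

```r
set.seed(1)

# Five made-up simulations for one site, each with two index columns.
sims <- replicate(
  5,
  data.frame(txav_DJF = rnorm(30), txav_MAM = rnorm(30)),
  simplify = FALSE
)

probs <- seq(0, 1, by = 0.01)

# One 101 x 2 matrix of percentiles per simulation.
q_list <- lapply(sims, function(d) sapply(d, quantile, probs = probs, na.rm = TRUE))

# Stack into percentiles x columns x simulations.
arr <- array(
  unlist(q_list),
  dim = c(dim(q_list[[1]]), length(q_list)),
  dimnames = list(rownames(q_list[[1]]), colnames(q_list[[1]]), NULL)
)

# Mean over the simulation dimension, keeping the first two dimensions.
avg <- rowMeans(arr, dims = 2, na.rm = TRUE)
head(avg, 3)
```

This only works when every simulation yields a matrix of the same shape, which is why the differing column counts discussed elsewhere in this thread matter.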
On Sunday, April 13, 2014 2:46 AM, Zilefac Elvis wrote:

Hi AK,
Q1) I need your help again. Using the previous data (attached) and the previous
code below, instead of taking rowMeans, let's do quantile(x,seq(0,1,by=0.01)).
Delete the last 2 rows (Trend and p<) in each file before doing
quantile(x,seq(0,1,by=0.01)).

For example, assume that I want to calculate quantile(x,seq(0,1,by=0.01)) for
each column of Site G100. I will do so for the 5 sims of site G100 and then take
their average. This will be closer to the true value than calculating
quantile(x,seq(0,1,by=0.01)) from just one sim. Please do this same thing for
all the files. So, when you do rowMeans, it should be the mean of
quantile(x,seq(0,1,by=0.01)) calculated from all sims in that Site.

Output
The number of files in "final" remains the same (2 files). The "Year" column
will be replaced by one containing the names of quantile(x,seq(0,1,by=0.01)),
such as 0%, 1%, 2%, 3%, 4%, 5%, 6%, ..., 98%, 99%, 100%. You can give this
column any name, such as "Percentiles".

Q2) From the folder "final", please go to each file identified by site name,
take a column, say col1 of txav, from each file, and create a dataframe whose
colnames are site codes (names of files in "final"). Create a folder called
"Indices" and place this dataframe in it. The filename for the dataframe is
txav, say. So, in "Indices", you will have one file having 3 columns
[, c(Percentiles, G100, G101)]. The idea is that I want to be able to pick any
column from files in "final" and form a dataframe from which I will generate my
qqplot or boxplot.

Thanks very much AK.
Atem

This should be the final step of this my drama, at least for now.
#==============================================================================================================
dir.create("final")
lst1 <- split(list.files(pattern=".csv"), gsub("\\_.*","",list.files(pattern=".csv")))
lst2 <- lapply(lst1, function(x1) lapply(x1, function(x2) {
  lines1 <- readLines(x2)
  header1 <- lines1[1:2]
  dat1 <- read.table(text=lines1, header=FALSE, sep=",", stringsAsFactors=FALSE, skip=2)
  colnames(dat1) <- Reduce(paste, strsplit(header1, ","))
  dat1
}))
lstYear <- lapply(lst2, function(x) lapply(x, function(y) y[,1,drop=FALSE])[[1]])
lapply(seq_along(lst2), function(i) {
  lstN <- lapply(lst2[[i]], function(x) x[,-1])
  arr1 <- array(unlist(lstN), dim=c(dim(lstN[[1]]), length(lstN)),
                dimnames=list(NULL, lapply(lstN, names)[[1]]))
  res <- cbind(lstYear[[i]], rowMeans(arr1, dims=2, na.rm=TRUE))
  ## the two header parts were pasted with a space; use "_" and drop a trailing "_"
  names(res) <- gsub("\\_$", "", gsub(" ", "_", names(res)))
  res[,1] <- gsub("<", "", res[,1])
  write.csv(res, paste0(paste(getwd(), "final", names(lst1)[[i]], sep="/"), ".csv"),
            row.names=FALSE, quote=FALSE)
})
#====================================================================================================