Hi Atem,
Try this:
I created 3 folders (Precip, Tmax, Tmin) within the folder "sample"
#working directory: sample
list.files()
#[1] "Imputation_Daily_Sim01.dat"    "Imputation_Daily_Sim02.dat"
#[3] "Imputation_Daily_Sim03.dat"    "Precip"
#[5] "Sim1971-2000_Daily_Sim001.dat" "Sim1971-2000_Daily_Sim002.dat"
#[7] "Sim1971-2000_Daily_Sim003.dat" "Tmax"
#[9] "Tmin"
list.files(pattern="Sim1971-2000")
#[1] "Sim1971-2000_Daily_Sim001.dat" "Sim1971-2000_Daily_Sim002.dat"
#[3] "Sim1971-2000_Daily_Sim003.dat"
lst1 <- lapply(list.files(pattern="Sim1971-2000"),function(x) readLines(x))
lst1Not1970 <- lapply(lst1,function(x) x[!grepl("1970",x)])
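The line above drops every record that contains "1970" anywhere in it. Since the request was to ignore anything outside 1971-2000, a more explicit alternative is to filter on the year field itself. This is only a minimal sketch: it assumes the first four characters of each line are the year (as the substr() calls in this thread do), the sample lines are made up, and keep_years is a hypothetical helper name.

```r
# Made-up lines in the layout YYYYMMDD + site code + values (not real data)
x <- c("19701231GGG1  0.00 -5.10  1.20",
       "19710101GGG1  1.97  0.10  2.30",
       "20001231GGG1  0.00 -3.00  4.00",
       "20010101GGG1  0.00 -2.00  5.00")

# Hypothetical helper: keep only lines whose year lies in [from, to]
keep_years <- function(lines, from = 1971, to = 2000) {
  yr <- as.numeric(substr(lines, 1, 4))  # year = characters 1 to 4
  lines[yr >= from & yr <= to]
}

kept <- keep_years(x)
```

With the real data this would be applied as lapply(lst1, keep_years) in place of the grepl() filter.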
#Using a small subset:
lst1Sub <- lapply(lst1Not1970,function(x) x[1:1000])
#replace lst1Sub with lst1Not1970 below
lst2 <- lapply(lst1Sub,function(x) {
  #date and site code, e.g. "19710101GGG1"
  dateSite <- gsub("(.*G\\d+).*","\\1",x)
  dat1 <- data.frame(Year=as.numeric(substr(dateSite,1,4)),
                     Month=as.numeric(substr(dateSite,5,6)),
                     Day=as.numeric(substr(dateSite,7,8)),
                     Site=substr(dateSite,9,12),
                     stringsAsFactors=FALSE)
  #the three values that follow the site code
  Sims <- gsub(".*G\\d+\\s+(.*)","\\1",x)
  #insert a space where a negative value is fused to the previous number
  Sims[grep("\\d+-",Sims)] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)","\\1 \\2",
                                   gsub("^([0-9]+\\.[0-9]+)(.*)","\\1 \\2",
                                        Sims[grep("\\d+-",Sims)]))
  Sims1 <- read.table(text=Sims,header=FALSE)
  names(Sims1) <- c("Precipitation", "Tmin", "Tmax")
  dat2 <- cbind(dat1,Sims1)
})
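To see what the regular expressions in that step do, here is a small sketch on two made-up lines (they are not taken from the real files). It assumes the layout described in the thread: date, site code, then Precip, Tmin, Tmax, where a negative value may sit directly against the number before it.

```r
# Made-up fixed-width lines; in the first one the minus signs are fused
# to the preceding values
x <- c("19710101GGG1  0.00-25.30-15.20",
       "19710102GGG1  1.20 -5.30  2.10")

# Everything up to and including the site code
dateSite <- gsub("(.*G\\d+).*", "\\1", x)

# Everything after the site code and the whitespace that follows it
Sims <- gsub(".*G\\d+\\s+(.*)", "\\1", x)

# Only lines where a digit is immediately followed by "-" need repair
fused <- grep("\\d+-", Sims)
Sims[fused] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)", "\\1 \\2",
                    gsub("^([0-9]+\\.[0-9]+)(.*)", "\\1 \\2", Sims[fused]))

# Once the values are space-separated, read.table can parse them
vals <- read.table(text = Sims, header = FALSE,
                   col.names = c("Precipitation", "Tmin", "Tmax"))
```

The inner gsub() separates the first value from the rest, and the outer one separates the last value, so all three columns end up delimited by spaces.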
Precip <- lapply(lst2,function(x) x[,1:5])
Tmin <- lapply(lst2,function(x) x[,c(1:4,6)])
Tmax <- lapply(lst2,function(x) x[,c(1:4,7)])
Precip1 <- cbind(Precip[[1]][,1:4],do.call(cbind,lapply(Precip,`[`,5)))
names(Precip1)[5:ncol(Precip1)] <-
paste0("Sim",sprintf("%03d",1:length(Precip)))
lapply(split(Precip1,Precip1$Site),function(x)
write.table(x,file=paste(getwd(),"Precip",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE))
Tmin1 <- cbind(Tmin[[1]][,1:4],do.call(cbind,lapply(Tmin,`[`,5)))
names(Tmin1) <- names(Precip1)
lapply(split(Tmin1,Tmin1$Site),function(x)
write.table(x,file=paste(getwd(),"Tmin",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE))
Tmax1 <- cbind(Tmax[[1]][,1:4],do.call(cbind,lapply(Tmax,`[`,5)))
names(Tmax1) <- names(Precip1)
lapply(split(Tmax1,Tmax1$Site),function(x)
write.table(x,file=paste(getwd(),"Tmax",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE))
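The three blocks above repeat the same steps, so they could be wrapped in one function. The following is only a sketch: write_by_site and its arguments are hypothetical names, the toy data is made up, and it writes under tempdir() so that it can be run without touching the real folders.

```r
# Hypothetical helper: combine one variable across simulations and write
# one file per site into the folder `outdir`
write_by_site <- function(varList, outdir) {
  # Columns 1:4 (Year, Month, Day, Site) come from the first simulation;
  # column 5 of every list element holds that simulation's values
  combined <- cbind(varList[[1]][, 1:4],
                    do.call(cbind, lapply(varList, `[`, 5)))
  names(combined)[5:ncol(combined)] <- sprintf("Sim%03d", seq_along(varList))
  dir.create(outdir, showWarnings = FALSE)
  for (site in unique(combined$Site)) {
    write.table(combined[combined$Site == site, ],
                file = file.path(outdir, paste0(site, ".dat")),
                row.names = FALSE, quote = FALSE)
  }
  invisible(combined)
}

# Toy input shaped like the Precip list: two simulations, two sites
toy <- list(
  data.frame(Year = 1971, Month = 1, Day = 1:2, Site = c("GGG1", "GGG2"),
             Precipitation = c(0, 1.2), stringsAsFactors = FALSE),
  data.frame(Year = 1971, Month = 1, Day = 1:2, Site = c("GGG1", "GGG2"),
             Precipitation = c(0.4, 0), stringsAsFactors = FALSE)
)
out <- write_by_site(toy, file.path(tempdir(), "Precip"))
```

With the real lists the calls would be write_by_site(Precip, "Precip"), write_by_site(Tmin, "Tmin") and write_by_site(Tmax, "Tmax"). Like the code above, this keeps the Site column in each file.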
Hope this helps.
A.K.
On Friday, March 28, 2014 2:07 AM, Zilefac Elvis <zilefacelvis@yahoo.com>
wrote:
Hi AK,
Suppose you had been able to use the large file that would not download.
My final output will be as follows:
Three folders:
1) Precip
2) Tmin or minimum temperature
3) Tmax or maximum temperature
Within each folder, we will have 120 files. Each file is named by its site code,
e.g., GGG1, GGG2, ..., G120.
Each file will be a dataframe with the first 3 columns as the date (Year,Month,Day).
Years are from 1971-2000. For the large file, the date columns are followed by
the simulation columns, e.g., Year,Month,Day,sim001,sim002,...,sim100. For the sample
file, it would be Year,Month,Day,sim001,sim002,sim003.
Thanks again.
Atem.
On Thursday, March 27, 2014 11:55 PM, Zilefac Elvis
<zilefacelvis@yahoo.com> wrote:
Hi AK,
Attached is a sample from the large file. The expected output is explained at
the end of this message (bold).
It is a little lengthy but is worth it given the large number of sites.
I have attached three simulations, so you will have sim1,sim2,sim3
instead of sim1 to sim100 as in the previous message.
############################################################################
I have done some simulations in R and would like to arrange my data in a usable
format.
The data is too large, so I have attached it via Dropbox.
When you load Calibration.RData to the workspace, you will find the site codes
(column 1) in "Prairies.Sites".
My initial dataset was in the form of a dataframe with columns denoting
stations. So I had three dataframes, one each for precipitation, Tmin, and Tmax.
Individually, you reshaped the dataframes into three column vectors (see the file
called PrecipTminTmax) using this code:
library(reshape2)
dat1 <- read.table("predictand.csv",header=TRUE,stringsAsFactors=TRUE,sep="\t")
# predictand.csv had 123 columns, with columns 1,2,3 as the date.
dat1<-precipitation
dat2M <- melt(dat1,id.var=c("year","month","day"))
dat2M1 <- dat2M[with(dat2M,order(year,month,day,variable)),]
dim(dat2M1)
#[1] 1972320 5
row.names(dat2M1) <- 1:nrow(dat2M1)
PrecipTminTmax<-cbind(precipitation,Tmin,Tmax)

The problem to be solved
Attached is a large file (SimCalibration.zip) containing my simulations (001 to
100). Please import files starting with "Sim1971-2000_Daily_" only.
The rest is not important. My analysis is for the period 1971-2000. Any data
before or after this period should be ignored.
My simulation was done in R using Fortran encoding to read data values. All
files are ".dat". In each file, the columns are as follows :
Year, Month, Day, Site, Precip, Tmin, Tmax. In another project involving
rainfall only, I read such files into R using this code:
rain.data <-
scan("gaugvals.all",what=character(),sep="\n",n=257212)
rain.data <- data.frame(Year=as.numeric(substr(rain.data,1,4)),
Month=as.numeric(substr(rain.data,5,6)),
Day=as.numeric(substr(rain.data,7,8)),
Site=substr(rain.data,10,12),
Rain=as.numeric(substr(rain.data,13,18)))
1) I would like to read all files beginning with "Sim1971-2000_Daily_".
2) Split each file by variable name (Precip, Tmin, Tmax) and then arrange each
variable in the form of a dataframe. For example, I will take Precip from site
GGG1 and have a dataframe with colnames such as Year,Month,Day,
sim1,sim2,...,sim100. Repeat this for all 120 sites, so that for Precip you
will have 120 files corresponding to the site codes. Each file has nrows with
Year,Month,Day,sim1...sim100 columns.
3) Please repeat the above for Tmin and Tmax, so that in the end I will have
three folders (Precip, Tmin and Tmax). Each folder has 120 files, with each file
being a dataframe containing the date and 100 simulation columns.
When you successfully go through this "difficult" section, I will access each
folder, read each file and apply a function to it, one at a time.
Thanks AK, this is part of my MSc thesis project. Your help will be
fully acknowledged. You have helped me a lot towards the success of this
project.
Atem.
On Thursday, March 27, 2014 9:09 PM, arun <smartpink111@yahoo.com> wrote:
Hi Atem,
I tried to download the first file.
It is taking me forever. With the speed I have, I doubt it would be
successful. Can you just provide some small reproducible example data and what
your expected output would be?
Arun