Hello Mr. Holtman,
Thank you very much for your reply and suggestion. This is what each year's
data looks like:
tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L),
    X2026.01.01.1 = c(285.5533142, 285.5533142, 286.2481079),
    X2026.01.01.2 = c(283.4977112, 283.4977112, 285.0860291),
    X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767),
    X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
    X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
    X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
    .Names = c("FIPS", "X2026.01.01.1", "X2026.01.01.2",
    "X2026.01.01.3", "X2026.01.01.4", "X2026.01.01.5", "X2026.01.01.6"),
    class = "data.frame", row.names = c(NA, -3L))
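For reference, the sample above can be turned into long form and its column names decoded without any packages. This is a base-R sketch (not the data.table/melt() route used elsewhere in this thread); it assumes every non-FIPS column follows the XYYYY.MM.DD.B pattern shown, and reading the last field B as the index of the 3-hour block within the day is an assumption.

```r
# Rebuild the three-row sample from the post
tmp1 <- data.frame(
  FIPS          = c(1001L, 1003L, 1005L),
  X2026.01.01.1 = c(285.5533142, 285.5533142, 286.2481079),
  X2026.01.01.2 = c(283.4977112, 283.4977112, 285.0860291),
  X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767),
  X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
  X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
  X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)
)

# Wide to long: one row per FIPS code and 3-hour column (base R only)
to_long <- function(wide) {
  cols <- setdiff(names(wide), "FIPS")
  long <- data.frame(
    FIPS     = rep(wide$FIPS, times = length(cols)),
    variable = rep(cols, each = nrow(wide)),
    temp     = unlist(wide[cols], use.names = FALSE),  # column by column
    stringsAsFactors = FALSE
  )
  # "X2026.01.01.3": characters 2-11 hold the date, 13 onwards the block number
  long$date  <- as.Date(substr(long$variable, 2, 11), format = "%Y.%m.%d")
  long$block <- as.integer(substring(long$variable, 13))  # assumed 3-hour slot
  long
}

long <- to_long(tmp1)
head(long)
```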
The data is in 3-hour blocks for every day, by US FIPS code, from 2026 to
2045; each year's data is in a different CSV file. My goal is to compute the
max, min, and mean by week and by month. I used the following code to assign
week numbers to the observations:
nweek <- function(x, format = "%Y-%m-%d", origin) {
    if (missing(origin)) {
        # week of the year, Monday as first day (00-53)
        as.integer(format(strptime(x, format = format), "%W"))
    } else {
        x <- as.Date(x, format = format)
        o <- as.Date(origin, format = format)
        # x is already a Date here, so take the weekday (0 = Sunday) from it
        w <- as.integer(format(x, "%w"))
        2 + as.integer(x - o - w) %/% 7
    }
}
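The no-origin branch of nweek() relies on the %W conversion, and format() already works on a whole vector of dates at once. A short sketch of what that convention produces around the start and end of 2026, assuming Monday-based weeks are what is wanted (%U would start weeks on Sunday, and %V gives ISO 8601 weeks):

```r
# Every day of 2026; the 3-hour blocks of one day would all share its date
dates <- seq(as.Date("2026-01-01"), as.Date("2026-12-31"), by = "day")

# %W: week of the year with Monday as the first day (00-53);
# days before the first Monday of the year fall in week 0
week  <- as.integer(format(dates, "%W"))
month <- as.integer(format(dates, "%m"))

# 2026-01-01 is a Thursday, so 1-4 January are week 0 and 5 January starts week 1
data.frame(dates, week, month)[1:7, ]
```

Because week 0 is usually a partial week, weekly max/min/mean for the first few days of a year will be based on fewer observations than a full week.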
Then I ran the following:
for (i in filelist) {
    nweek(tmp2$date)
}
for (i in filelist) {
    nweek(dates, origin = "2026-01-01")
}
for (i in filelist) {
    wkn <- nweek(tmp2$date)
}
Is this efficient? Thank you so much again. I really appreciate it.
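The three loops above never use i: each iteration recomputes the same nweek() call on the same tmp2$date, and at most one result (wkn) is kept. Since the week and month keys are plain vectorised conversions, they can be computed once on the combined long table and the max, min and mean aggregated per county and period in one pass. A base-R sketch using aggregate() (a substitute for the data.table grouping discussed in this thread), on a small made-up table:

```r
# Hypothetical long table: 2 counties x 14 days x 8 three-hour blocks
dates <- seq(as.Date("2026-01-01"), by = "day", length.out = 14)
long <- data.frame(
  FIPS  = rep(c(1001L, 1003L), each = length(dates) * 8),
  date  = rep(rep(dates, each = 8), times = 2),
  block = rep(1:8, times = length(dates) * 2)
)
long$temp <- 275 + (seq_len(nrow(long)) %% 17) / 2   # made-up temperatures

# Grouping keys are computed once for all rows; no loop over files needed
long$year  <- as.integer(format(long$date, "%Y"))
long$month <- as.integer(format(long$date, "%m"))
long$week  <- as.integer(format(long$date, "%W"))

# max, min and mean of temp per county and week, and per county and month;
# do.call(data.frame, ...) flattens the matrix column into temp.max etc.
stats3 <- function(v) c(max = max(v), min = min(v), mean = mean(v))
by_week  <- do.call(data.frame,
                    aggregate(temp ~ FIPS + year + week, data = long, FUN = stats3))
by_month <- do.call(data.frame,
                    aggregate(temp ~ FIPS + year + month, data = long, FUN = stats3))
by_week
```

Including year in the grouping keeps week 1 of 2026 separate from week 1 of 2027 once all years are bound together.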
Sincerely,
Shouro
On Sun, Feb 1, 2015 at 1:22 AM, jim holtman <jholtman at gmail.com> wrote:
> It would have been nice if you had at least supplied a subset (~10 lines)
> from a couple of files so we could see what the data looks like and test
> out any solution. Since you are using 'data.table', you should probably
> also use 'fread' for reading in the data. Here is a possible approach of
> reading the data into a list and then creating a single, large data.table:
>
> -------
> library(data.table)
>
> myDTs <- lapply(filelist, function(.file) {
>     # check.names = TRUE gives read.csv-style names such as X2026.01.01.1
>     tmp1 <- fread(.file, sep = ",", check.names = TRUE)
>     tmp2 <- melt(tmp1, id.vars = "FIPS", value.name = "temp")
>     tmp2$year <- as.numeric(substr(tmp2$variable, 2, 5))
>     tmp2$month <- as.numeric(substr(tmp2$variable, 7, 8))
>     tmp2$day <- as.numeric(substr(tmp2$variable, 10, 11))
>     tmp2  # return value
> })
>
> bigDT <- rbindlist(myDTs)  # rbind all the data.tables together
>
> # then you should be able to do:
>
> mean.temp <- bigDT[, list(temp.mean = mean(temp)),
>                    by = c("FIPS", "year", "month")]
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
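Jim's pattern of building a list and binding once can also be sketched in base R, in case data.table is not at hand: lapply() returns one data frame per file and do.call(rbind, ...) joins them in a single step, instead of overwriting or growing an object inside a for loop. The two tiny CSV files written to a temporary directory are hypothetical stand-ins for the real yearly files, and read_one() is a hypothetical helper; with data.table loaded, rbindlist() plays the role of do.call(rbind, ...).

```r
# Write two small hypothetical yearly CSV files so the sketch runs anywhere
dir <- tempfile("years")
dir.create(dir)
for (yr in 2026:2027) {
  wide <- data.frame(FIPS = c(1001L, 1003L),
                     a = c(280.1, 281.2), b = c(279.5, 280.0))
  names(wide)[2:3] <- paste0("X", yr, ".01.01.", 1:2)
  write.csv(wide, file.path(dir, paste0(yr, ".csv")), row.names = FALSE)
}
filelist <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read and reshape one file; returns a long data frame for that year
read_one <- function(path) {
  wide <- read.csv(path)
  cols <- setdiff(names(wide), "FIPS")
  data.frame(FIPS     = rep(wide$FIPS, times = length(cols)),
             variable = rep(cols, each = nrow(wide)),
             temp     = unlist(wide[cols], use.names = FALSE))
}

pieces    <- lapply(filelist, read_one)  # one data frame per year, none lost
all_years <- do.call(rbind, pieces)      # bind once, after all files are read
all_years$year <- as.integer(substr(all_years$variable, 2, 5))
```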
> On Sat, Jan 31, 2015 at 5:57 PM, Shouro Dasgupta <shouro at gmail.com> wrote:
>
>> I have climate data for 20 years for US counties (FIPS) in CSV format;
>> each file represents one year of data. I have extracted the data and
>> reshaped the yearly data files using melt():
>>
>> for (i in filelist) {
>>     tmp1 <- as.data.table(read.csv(i, header = TRUE, sep = ","))
>>     tmp2 <- melt(tmp1, id = "FIPS")
>>     tmp2$year <- as.numeric(substr(tmp2$variable, 2, 5))
>>     tmp2$month <- as.numeric(substr(tmp2$variable, 7, 8))
>>     tmp2$day <- as.numeric(substr(tmp2$variable, 10, 11))
>> }
>>
>>
>> Should I rbind inside the loop here, since I have enough memory?
>> For each file i, tmp2 looks like this:
>>
>>   FIPS     temp year month     date
>>   1001 276.7936 2045     1 1/1/2045
>>   1003 276.7936 2045     1 1/1/2045
>>   1005 279.6452 2045     1 1/1/2045
>>   1007 276.7936 2045     1 1/1/2045
>>   1009 272.3748 2045     1 1/1/2045
>>   1011 279.6452 2045     1 1/1/2045
>>
>>
>> My goal is to calculate the mean by FIPS code by month/week; however,
>> when I use the following code, I get a NULL value.
>>
>> mean.temp <- for (i in filelist) {
>>     tmp2[, list(temp.mean = lapply(.SD, mean)),
>>          by = c("FIPS", "year", "month"), .SDcols = c("temp")]
>> }
>>
>>
>> This works fine for individual years, but not with for (i in filelist).
>> What am I doing wrong? Can I include an rbind/rbindlist in the loop to
>> make one big data.frame? Any suggestions will be highly appreciated.
>> Thank you.
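The NULL comes from the loop itself: in R a for loop always evaluates to NULL, so mean.temp <- for (...) {...} cannot hold the per-year results, and since i is not used inside that loop, tmp2 is just whatever the earlier loop left behind. A minimal illustration with toy numbers in place of the climate data:

```r
# A for loop's value is always NULL, whatever its body computes
res <- for (i in 1:3) i * 10
print(is.null(res))  # [1] TRUE

# lapply() keeps one result per iteration instead; these pieces can then be
# bound once with do.call(rbind, ...) or data.table's rbindlist()
res2 <- lapply(1:3, function(i) i * 10)
str(res2)
```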
>>
>> Sincerely,
>>
>> Shouro
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>