thr3ads.net - R help - [R] How to run regressions over increasing time series [Jul 2012]


Philippe Hensel

2012-Jul-27 14:30 UTC

[R] How to run regressions over increasing time series

Hello,

I would like to run a series of regressions on my data (response variable
over time):

1) regression from T1 to T2
2) regression from T1 through T3
3) regression from T1 through T4, etc.

I have been struggling to find a way to do this through commands, as
opposed to cutting up the data manually (my dataset has over 6000
rows/observations).

An illustrative dataset can be created thusly:

dat <- structure(list(Years= c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67,
0.67, 0.74, 0.74, 0.74),
Obs = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6)),
.Names = c("Years","Obs"), row.names = c(NA, -12L), class =
"data.frame")

I was trying to use a loop to create subsets of the data corresponding to
the sets of time intervals required (e.g. T1 to T2, T1 through T3, etc.),
but I am having trouble generating a new variable to index time (instead of
the decimal values).  I was figuring that indexing time would allow me to
use a loop to generate the required subsets of data.

I can figure out how many time periods I have and assign a sequential
number to them:

Years <- unique(dat$Years)
Yrs_count <- seq(from = 1, to = length(Years), by = 1)

And then I can combine these into a data frame:

Yrs_combo <- data.frame(Years, Yrs_count)

However, how do I combine this data frame with my larger dataset, which has
different numbers of rows?
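
One possible way to attach a sequential time index to every row is sketched below. It assumes the illustrative `dat` data frame from the post, with its `Years` column. `match()` looks up each row's `Years` value in the vector of unique time points and returns its position, so the differing row counts are not a problem. The `merge()` variant joins a small lookup table on the shared column and should give the same index; `dat_merged` is a name made up for this example.

```r
# Illustrative data from the post
dat <- data.frame(
  Years = c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67, 0.67, 0.74, 0.74, 0.74),
  Obs   = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6)
)

# Sorted unique time points: T1, T2, ...
Years <- sort(unique(dat$Years))

# For each row, the position of its Years value among the unique values
dat$Yrs_count <- match(dat$Years, Years)

# Alternative: build a lookup table and join it on the shared column
Yrs_combo <- data.frame(Years = Years, Yrs_count = seq_along(Years))
dat_merged <- merge(dat[, c("Years", "Obs")], Yrs_combo, by = "Years")
```

Either form leaves one index value per row, which can then drive a loop over time periods.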



But this is just an intermediary step in the process.... Some of you might
suggest an entirely different route.



For now, I can manually create this new time index:

dat2 <- structure(list(Years= c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67,
0.67, 0.74, 0.74, 0.74),
Obs = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6),
Yrs_count = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)),
.Names = c("Years","Obs","Yrs_count"), row.names =
c(NA, -12L), class = "data.frame")


The next question is how can I index temporary files in a loop that I use
for extracting the needed data?  I thought I might need two loops: one to
identify the length of the time series, the other to accumulate the data
from T1 through the identified end point - maybe something like:

for (i in seq_len(max(dat2$Yrs_count))) {
  for (j in 1:i) {
    keyj <- dat2[, 3] == j
    dat2j <- dat2[keyj, ]
    # here is where I want to create a temporary file to accumulate the
    # different dat2j's I create in this inside loop
  }
  # here is where I want to save the file for future use in my regressions
}
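
The nested loop may not be needed: each subset is simply "every row up to time point i", so a single logical comparison on the index selects it. The following is a minimal sketch, assuming the `dat2` data frame defined above. Keeping the subsets in a list stands in for the temporary files; the names `n_periods`, `subsets`, and the `T1_to_T…` labels are made up for illustration.

```r
# dat2 as in the post, with the sequential time index already attached
dat2 <- data.frame(
  Years     = c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67, 0.67, 0.74, 0.74, 0.74),
  Obs       = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6),
  Yrs_count = rep(1:4, each = 3)
)

n_periods <- max(dat2$Yrs_count)

# One subset per end point i = 2, ..., n_periods, holding rows from T1 through Ti
subsets <- lapply(2:n_periods, function(i) dat2[dat2$Yrs_count <= i, ])
names(subsets) <- paste0("T1_to_T", 2:n_periods)

# Number of rows in each cumulative subset
sapply(subsets, nrow)
```

Each list element can then be passed to `lm()` or written out with `write.csv()` if files on disk are really wanted.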


I hope this example is clear enough.  My apologies if it isn't - and I
thank the R community  for any ideas, tips, or directions to information
that might be helpful.
Best,

-Philippe

-- 

Philippe Hensel, PhD

NOAA National Geodetic Survey

NGS ECO <http://www.ngs.noaa.gov/web/science_edu/ecosystems_climate/>

 N/NGS2 SSMC3 #8859

 1315 East-West Hwy
Silver Spring MD 20910
(301) 713 3198 x 137


Jean V Adams

2012-Jul-27 20:32 UTC


[R] How to run regressions over increasing time series

Philippe,

In your example, you have four unique values for Yrs (I had to change your 
code a little to get it to run, so I have the modified version with my 
code below), and those values are what you are referring to when you say 
T1, T2, T3, T4, right?  If I follow what you want to do, the code below 
should help.  I opted to just save the regression results rather than 
saving each of the subsetted data sets.  Of course, you can modify the 
code to save whatever you want.  Hope this helps.

# input data
set.data <- structure(list(
        Yrs = c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67, 0.67, 0.74, 0.74, 0.74),
        Obs = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6)),
        .Names = c("Yrs", "Obs"), row.names = c(NA, -12L),
        class = "data.frame")
# determine the unique years
Years <- unique(set.data$Yrs)
# create an empty list with a length one less than the number of unique years
regressions <- vector("list", length(Years) - 1)
# for time periods T2, T3, T4, fit a regression to T1:Ti and save the
# results to the regressions list just created
for (i in 2:length(Years)) {
        dati <- set.data[set.data$Yrs <= Years[i], ]
        regressions[[i - 1]] <- lm(Obs ~ Yrs, data = dati)
}

Jean
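
To compare the fits across the growing time windows, the stored models can be summarised in one table. This is only a sketch: it rebuilds `set.data`, `Years`, and `regressions` so that it runs on its own (using `lapply()` in place of the `for` loop), and the `results` data frame and its column names are hypothetical.

```r
set.data <- data.frame(
  Yrs = c(0, 0, 0, 0.36, 0.36, 0.36, 0.67, 0.67, 0.67, 0.74, 0.74, 0.74),
  Obs = c(0, 0, 0, 2.3, 1.9, 2.1, 4.5, 4.5, 4.6, 5.3, 5.5, 5.6)
)
Years <- unique(set.data$Yrs)

# Fit Obs ~ Yrs on all rows from T1 through each later time point
regressions <- lapply(Years[-1], function(y) {
  lm(Obs ~ Yrs, data = set.data[set.data$Yrs <= y, ])
})

# coef() returns "(Intercept)" and "Yrs" for each model; stack them row-wise
coefs <- t(sapply(regressions, coef))

results <- data.frame(
  end_year  = Years[-1],
  n_obs     = sapply(regressions, nobs),
  intercept = coefs[, "(Intercept)"],
  slope     = coefs[, "Yrs"]
)
results
```

`summary(regressions[[k]])` gives the full output for any single window if standard errors or R-squared are also of interest.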


