Folks,

I'm most confused in trying to do something that (I thought) ought to be mainstream and straightforward R. :-) Could you please help?

I am doing an ordinary linear regression. My goal is: after a regression, to make residuals, and make a new variable which is the lagged residuals (lagged by 1). I will use this variable in a 2nd stage regression (for an error-correcting model).

This sounds simple and reasonable, and should be right up R's alley, but I am just not able to do this. Can I please show you the steps which I'm trying and failing in?

I start with:

> m = lm(NNDA ~ NFA + NFA.x.d1 + NFA.x.d2 + IIP.n + CRR, D.f)
> e = residuals(m)
> print(e)
          34           35           36           37           38           39
 -5073.24843  -4210.27886  -8218.01782  -1489.10583  -4426.11738 -11332.56052
(lines deleted)
          64           65           66           67           68           69
  8362.93776   7564.14324   2311.41208   7660.00638  -1271.04645 -10917.29418
(lines deleted)
         160          161          162          163          164          165
  3858.94591 -11783.04370 -21438.33646   1859.49628  -4988.82853 -25172.43241

Here, the residuals only start at the 34th observation owing to missing data in my data frame. This is correct and sensible. The dataset has 167 observations, but 166 and 167 are also missing data and dropped.

I tried to use lag(e,1) to make a new vector and failed. I think I am just not understanding the R concept of lag(). In my notion of a lagged vector, I want a vector f where f[35] is e[34], i.e. the first residual above, -5073.24843. This is just not what I get by saying lag(e,1); I am just not understanding lag(). I would be very happy if someone could educate me on how to utilise lag().

Okay, I tried to get there a different way:

> print(T)
[1] 167
> f = numeric(T)
> f[1] = NA
> f[2:T] = e[1:(T-1)]

This looks reasonable? I thought this should do the trick. I am hand-initialising a T-length vector with NA in the 1st element, and I copy the values of e[] from 1 till 166 into f[2:T]. I thought this should give me a lagged e.
It doesn't:

> print(f)
  [1]           NA  -5073.24843  -4210.27886  -8218.01782  -1489.10583
(lines deleted)
[131]   1859.49628  -4988.82853 -25172.43241           NA           NA
(lines deleted)
[166]           NA           NA

I thought "Okay, what seems to be happening is that the e[1] that I have is 'actually' the e[34] of my thoughts". So I try:

> f = rep(NA, T)           # zap out f
> f[35:T] = e[34:(T-1)]    # copy out useful stuff into 35..T
> print(f)
  [1]           NA           NA           NA           NA           NA
(lines deleted)
 [31]           NA           NA           NA           NA   7660.00638
 [36]  -1271.04645 -10917.29418 -11111.60144  -1597.98355  -1066.01901
(lines deleted)
[131]   1859.49628  -4988.82853 -25172.43241           NA           NA
(lines deleted)
[166]           NA           NA

This is wrong!!

Recall (from upstairs) that e[34] was -5073.24843. That value seems to have mysteriously vanished. Instead, the first non-NA in f, which is f[35], is 7660.00638, which (incidentally) was e[67]. I just don't know how that value got here. And the values in f[] seem to peter out at 133! After 133, they are all NA until the end.

I guess I'm _just_ not understanding what is the animal that is returned by residuals(lm()). I know I am missing something basic, because lots of people must be doing what I am trying, i.e. to run a regression, extract a residual, lag it, and use it for a 2nd stage regression.

I know that the vector e (returned by residuals(lm())) is different from a simple vector, for when I say:

> print(f[35])
[1] 7660.006
> print(e[35])
       68
-1271.046

the two animals seem to be different. I don't understand e[35]: why is it not just a number? There seems to be some index tagging along. How do I get at the pure numbers of the residuals?

Thanks much,

-ans.

--
Ajay Shah                                                   Consultant
ajayshah at mayin.org                      Department of Economic Affairs
http://www.mayin.org/ajayshah           Ministry of Finance, New Delhi
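What seems to be going on in the transcripts above: with the default na.omit, residuals() returns a named numeric vector holding only the rows lm() actually used (here the 132 rows labelled 34 to 165), and the names are the data frame's row labels. So e[35] is the 35th element, whose label is "68"; e[34] in the second attempt is the element labelled "67"; and e[1:(T-1)] runs past the 132nd element, which is why f turns to NA from position 134 onwards. A minimal sketch on a small made-up vector (the values, the row labels 4 to 8, and the length n are hypothetical, and it assumes the data frame has default integer row names) showing positional versus name-based indexing, how to drop the names, and one way to build a full-length lagged vector by putting each residual back at its row position before shifting:

```r
# Hypothetical residuals for rows 4..8 of a 10-row data frame,
# named by row label the way residuals(lm(...)) names them.
e <- c(`4` = 1.5, `5` = -2, `6` = 0.5, `7` = 3, `8` = -1)
n <- 10

e[2]        # positional: the 2nd element, which belongs to row "5"
e["5"]      # by name: the residual for row 5, the same value
unname(e)   # the bare numbers, with the row labels dropped

# Put each residual back at its row position (assumes integer row names),
# leaving NA for the rows that lm() dropped.
full <- rep(NA_real_, n)
full[as.integer(names(e))] <- e

# One-period lag by moving the contents: f[t] holds the residual of row t - 1,
# so f[5] is the residual of row 4 and f[1:4] are NA.
f <- c(NA, full[-n])
f
```

as.numeric(e) would drop the names just as unname(e) does. The key step is indexing by the names rather than by position, so the lag lines up with the rows of the data frame.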
Prof Brian Ripley
2004-Mar-09 08:28 UTC
[R] Am failing on making lagged residual after regression
If you have missing data in your data frame and want residuals for all observations, you need to use na.action=na.exclude, not the default na.omit.

As for lag, its description says

Description:

     Compute a lagged version of a time series, shifting the time base
     back by a given number of observations.

and you don't have a time series. It works by shifting the time base for a time series, not by moving the contents of a vector.

On Mon, 8 Mar 2004, Ajay Shah wrote:

> I am doing an ordinary linear regression. My goal is: After a
> regression, to make residuals, and make a new variable which is the
> lagged residuals (lagged by 1).
> (lines deleted)
> I tried to use lag(e,1) to make a new vector and failed.
> (lines deleted)

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
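A minimal sketch of both points in the reply, on made-up data (the data frame d and its columns y, x and e_lag are hypothetical). With na.exclude the residual vector keeps one slot per row of the data frame, so the lag becomes a plain shift of the contents. The second half shows what lag() does to a time series instead: the values stay put and only the time base moves. Because lag(x, k) shifts the time base back by k, the value at time t of the result is x at time t + k, so an ordinary one-period lag needs k = -1. It is written as stats::lag in case another loaded package masks lag().

```r
set.seed(1)
# Made-up data: 10 rows, with x missing in rows 1, 2 and 10.
d <- data.frame(y = rnorm(10), x = rnorm(10))
d$x[c(1, 2, 10)] <- NA

# With na.exclude, residuals() is padded with NA for the dropped rows,
# so it has one element per row of d (the default na.omit gives only 7 here).
m <- lm(y ~ x, data = d, na.action = na.exclude)
e <- residuals(m)

# Lag by moving the contents: d$e_lag[t] is the residual of row t - 1.
d$e_lag <- c(NA, unname(e)[-nrow(d)])

# The time-series route: lag() leaves the values alone and shifts the time base.
es <- ts(unname(e))          # times 1..10
el <- stats::lag(es, -1)     # same values, now at times 2..11
tsp(es)                      # 1 10 1
tsp(el)                      # 2 11 1

# ts.union() lines the two series up by time, padding with NA;
# window() cuts it back to the original time range.
both <- window(ts.union(e = es, e_lag = el), end = 10)
both
```

Either way, the lagged residual ends up aligned with the rows of d, where it can be used in a second-stage lm() like any other column. The plain shift is usually simpler for a data frame; the ts route only does the alignment once the series are combined with something like ts.union() or ts.intersect().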