Mikko Pakkanen
2005-May-25 09:43 UTC
[R] Problem with systemfit 0.7-3 and transformed variables
The 'systemfit' function in systemfit 0.7-3 CRAN package seems to have a problem with formulas that contain transformed (eg. log) variables. If I have my data in a data frame, apparently systemfit doesn't "pass" the information of where the variables should be taken to the transforming function.

I'm not entirely sure if this is a bug or just a limitation, I was just surprised when I attempted to estimate a model, which I'd previously estimated with OLS using 'lm', with 2SLS using 'systemfit' and it didn't accept those transformations like 'lm' does.

Here's an example: this is, of course, OK:

> data(kmenta)
> demand <- q ~ p + d
> instr <- ~ d + f
> fit1 <- systemfit("2SLS", eqns=list(demand), inst=instr, data=kmenta)

But, now if I'd like to estimate a model with logarithm of p as a regressor, an error occurs:

> demand2 <- q ~ log(p) + d
> fit2 <- systemfit("2SLS", eqns=list(demand2), inst=instr, data=kmenta)
Error in log(p) : Object "p" not found

However, estimating the same formula with OLS using the regular 'lm' is OK:

> fit2.ols <- lm(demand2, kmenta)

Transforming an instrument causes the same error too:

> instr2 <- ~ log(d) + f
> fit3 <- systemfit("2SLS", eqns=list(demand), inst=instr2, data=kmenta)
Error in log(d) : Object "d" not found

One could certainly just create those transformed variables to avoid the problem, but it would be much more convenient, if it wasn't necessary, especially if several regressors are involved.

> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.0
year     2005
month    04
day      18
language R

Regards,

-Mikko Pakkanen
Arne Henningsen
2005-May-25 10:50 UTC
[R] Problem with systemfit 0.7-3 and transformed variables
On Wednesday 25 May 2005 11:43, Mikko Pakkanen wrote:
> [...]
> One could certainly just create those transformed variables to avoid the
> problem, but it would be much more convenient, if it wasn't necessary,
> especially if several regressors are involved.

We did not notice this shortcoming of systemfit() so far. Unfortunately, I don't have the time in the next few days to look into the code and figure out how to enable transformed variables. I suggest that you either create transformed variables by hand or you modify the systemfit code to enable this and send us the patch. I prefer the second :-) (that's the philosophy of open-source software like R: useRs become developeRs).

Best wishes,
Arne

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

--
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
ahenningsen at agric-econ.uni-kiel.de
http://www.uni-kiel.de/agrarpol/ahenningsen/
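The by-hand workaround mentioned in this message can be sketched in base R as follows. The data frame `dat` and the column names `log.p` and `log.d` are hypothetical stand-ins (the values are not the kmenta data); only the formulas mirror the ones in the thread. The idea is simply that, once the transformed columns exist, the formulas contain plain variable names only.

```r
# Hypothetical data standing in for kmenta (illustrative values only).
dat <- data.frame(
  q = c(98.5, 99.2, 102.2, 101.5, 104.2, 103.2),
  p = c(100.3, 104.3, 103.4, 104.5, 98.0, 99.5),
  d = c(87.4, 97.6, 96.7, 98.2, 99.8, 100.5),
  f = c(98.0, 99.1, 99.1, 98.1, 110.8, 108.2)
)

# Create the transformed variables once, as ordinary columns.
# The names log.p and log.d are arbitrary choices for this sketch.
dat2 <- transform(dat, log.p = log(p), log.d = log(d))

# The formulas now refer to plain column names, with no function calls.
demand2 <- q ~ log.p + d
instr2  <- ~ log.d + f

# Check that every variable in both formulas is a column of dat2.
unresolved <- setdiff(c(all.vars(demand2), all.vars(instr2)), names(dat2))

# With the call form quoted in the thread (systemfit 0.7-3, not run here):
# fit <- systemfit("2SLS", eqns = list(demand2), inst = instr2, data = dat2)
```

With several transformed regressors, all of them can go into the single transform() call, which keeps the extra bookkeeping in one place.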
Dear Mikko,

You might try the tsls function in the sem package (which also does 2SLS).

Regards,
John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mikko Pakkanen
> Sent: Wednesday, May 25, 2005 4:43 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Problem with systemfit 0.7-3 and transformed variables
>
> The 'systemfit' function in systemfit 0.7-3 CRAN package
> seems to have a problem with formulas that contain
> transformed (eg. log) variables. If I have my data in a data
> frame, apparently systemfit doesn't "pass" the information of
> where the variables should be taken to the transforming function.
> [...]
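Whichever package ends up doing the fitting, the 2SLS point estimates for a formula with a transformed regressor can be cross-checked in base R with two lm() calls. This is only a sketch: the data frame `dat` is hypothetical (not the kmenta values), the column name `log.p.hat` is made up for the example, and the standard errors reported by the second lm() are not valid 2SLS standard errors. Only the coefficients are meaningful here.

```r
# Hypothetical data (illustrative values only); names follow the thread.
dat <- data.frame(
  q = c(98.5, 99.2, 102.2, 101.5, 104.2, 103.2, 104.0, 99.9),
  p = c(100.3, 104.3, 103.4, 104.5, 98.0, 99.5, 101.1, 104.8),
  d = c(87.4, 97.6, 96.7, 98.2, 99.8, 100.5, 103.2, 107.8),
  f = c(98.0, 99.1, 99.1, 98.1, 110.8, 108.2, 105.6, 109.8)
)

# Stage 1: project the transformed regressor log(p) on the instruments d and f.
stage1 <- lm(log(p) ~ d + f, data = dat)
dat$log.p.hat <- fitted(stage1)

# Stage 2: regress q on the fitted values and the exogenous regressor d.
stage2 <- lm(q ~ log.p.hat + d, data = dat)
b.twostep <- unname(coef(stage2))

# Cross-check with the closed form (X' Pz X)^{-1} X' Pz y,
# where Pz is the projection onto the instrument matrix Z.
X   <- model.matrix(q ~ log(p) + d, data = dat)
Z   <- model.matrix(~ d + f, data = dat)
PzX <- Z %*% solve(crossprod(Z), crossprod(Z, X))
b.closed <- as.vector(solve(crossprod(PzX, X), crossprod(PzX, dat$q)))
```

Both routes evaluate log(p) inside the data frame because the formula is always passed together with `data = dat`.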
Mikko Pakkanen
2005-May-25 16:25 UTC
[R] Problem with systemfit 0.7-3 and transformed variables
> We did not notice this shortcoming of systemfit() so far. Unfortunately, I
> don't have the time in the next few days to look into the code and figure
> out how to enable transformed variables. I suggest that you either create
> transformed variables by hand or you modify the systemfit code to enable
> this and send us the patch. I prefer the second :-) (that's the philosophy
> of open-source software like R: useRs become developeRs).

Luckily, I had some time to check the code. Debugger revealed that the problems are caused by the model.frame function which is used to compile the '$data' data frame. I don't need that data frame so much, so I just substituted model.frame with model.matrix which apparently doesn't cause this error with transformed variables. However, I tuned it a bit further, so that it should still return an identical '$data' data frame, despite the modification. I've only tested this with my example and it appears to be OK.

Still, I think this should be considered a "quick & dirty" fix - there are probably better ways to do it. But, I hope it gives the idea. Here's my attempt:

mikko at briscoe R $ diff -u systemfit.R systemfit-patched.R
--- systemfit.R 2004-11-26 11:17:36.000000000 +0200
+++ systemfit-patched.R 2005-05-25 18:55:55.568944699 +0300
@@ -624,7 +624,11 @@
 Terms <- terms( eqns[[i]], data = data)
 m$formula <- Terms
 m <- eval(m, parent.frame())
- datai <- model.frame(Terms, m)
+ resp <- model.extract(m, "response")
+ ## using model.matrix instead of model.frame, need to get the output variable separately
+ datai <- data.frame(cbind(resp, (model.matrix(Terms, m))[,-1]))
+ ## I guess there's a better way to extract the name of the output variable?
+ names(datai)[1] <- as.character(terms(eqns[[i]]))[2]
 if(method=="2SLS" | method=="3SLS") {
 #datai <- cbind( datai, model.frame( instl[[i]] ))
 # the following lines have to be substituted for the previous
@@ -634,7 +638,8 @@
 Terms <- terms(instl[[i]], data = data)
 m$formula <- Terms
 m <- eval(m, parent.frame())
- datai <- cbind( datai, model.frame(Terms, m))
+ ## used previously model.frame
+ datai <- cbind( datai, as.data.frame((model.matrix(Terms, m))[,-1]))
 }
 if(i==1) {

Regards,

-Mikko.
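One possible reading of the patch can be reproduced in base R alone. The sketch below assumes that `m` holds an already-evaluated model frame at the point where model.frame(Terms, m) is called. That is an assumption, since the thread does not show how `m` is built. The data frame `dat` is hypothetical. Under that assumption, the frame has a column literally named "log(p)" and no column "p", so a second model.frame() call fails, while model.matrix() reuses the frame as it is.

```r
# Hypothetical data (illustrative values only).
dat <- data.frame(
  q = c(98.5, 99.2, 102.2, 101.5, 104.2, 103.2),
  p = c(100.3, 104.3, 103.4, 104.5, 98.0, 99.5),
  d = c(87.4, 97.6, 96.7, 98.2, 99.8, 100.5)
)
f <- q ~ log(p) + d

# First pass: the model frame stores the transformed column under "log(p)".
m <- model.frame(f, data = dat)
frame.names <- names(m)

# Second pass on the built frame: log(p) is re-evaluated, but there is no
# column "p" in m any more, so this is expected to raise an error.
msg <- tryCatch({
  model.frame(terms(f), m)
  "no error"
}, error = function(e) conditionMessage(e))

# model.matrix() accepts the existing frame; drop the intercept column.
# drop = FALSE keeps a matrix even when only one regressor remains.
X <- model.matrix(terms(f), m)[, -1, drop = FALSE]

# Rebuild the data frame; check.names = FALSE keeps the name "log(p)".
datai <- data.frame(model.response(m), X, check.names = FALSE)

# One alternative for the response name: deparse the formula's left-hand side.
names(datai)[1] <- deparse(f[[2]])
```

Compared with the patch, `drop = FALSE` and `check.names = FALSE` are choices made for this sketch: the first guards the single-regressor case, the second avoids the column being renamed to `log.p.`. Whether either is wanted inside systemfit itself is not something the thread settles.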