I am not sure if this is an R-users question, but since most of you here are statisticians, I decided to give it a shot.

I am using the lm() function in R to fit a dependent variable to a set of 3 to 5 independent variables. For this, I used the following commands:

> model1<-lm(function=PBW~SO4+NO3+NH4)
Coefficients:
(Intercept)          SO4          NO3          NH4
    0.01323      0.01968      0.01856           NA

and

> model2<-lm(function=PBW~SO4+NO3+NH4+Na+Cl)
Coefficients:
(Intercept)          SO4          NO3          NH4           Na           Cl
 -0.0006987   -0.0119750   -0.0295042    0.0842989    0.1344751           NA

In both cases, the last independent variable has a coefficient of NA in the result. I say "last" variable because, when I change the order of the variables, the NA moves to whichever variable is now last (see below). Can anyone point me to the reason R behaves this way? Is there any way for me to force R to use all the variables? I checked the correlation matrices to make sure there is no orthogonality between the variables.

Thanks
Aparna

> model1<-lm(formula = PBW ~ SO4 + NH4 + NO3)
> model1

Call:
lm(formula = PBW ~ SO4 + NH4 + NO3)

Coefficients:
(Intercept)          SO4          NH4          NO3
    0.01323     -0.00430      0.06394           NA

> model2<-lm(formula = PBW ~ SO4 + NO3 + Na + Cl + NH4)
> model2

Call:
lm(formula = PBW ~ SO4 + NO3 + Na + Cl + NH4)

Coefficients:
(Intercept)          SO4          NO3           Na           Cl          NH4
 -0.0006987    0.0196371   -0.0050303    0.0685020    0.0427431           NA
Is this homework? If so, you need to read the text and/or class notes more carefully.

-- 
Bert Gunter

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Vemuri, Aparna
Sent: Monday, April 20, 2009 4:26 PM
To: r-help at r-project.org
Subject: [R] Fitting linear models

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Apr 20, 2009, at 7:26 PM, Vemuri, Aparna wrote:

> I am not sure if this is an R-users question, but since most of you
> here are statisticians, I decided to give it a shot.

You can omit the unnecessary preambles.

> I am using the lm() function in R to fit a dependent variable to a set
> of 3 to 5 independent variables. For this, I used the following
> commands:
>
>> model1<-lm(function=PBW~SO4+NO3+NH4)
> [...]
> I checked the correlation matrices to make sure
> there is no orthogonality between the variables.

You really did not name your dependent variable "function", did you? Please stop that.

Just a guess, ... since you have not provided enough information to do otherwise, ... Are all of those variables 1/0 dummy variables? If so, and if you want output that labels the coefficients as you naively anticipate, then put "0+" at the beginning of the formula or "-1" at the end, so that the intercept will disappear and all variables will get labeled as you expect.

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
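To illustrate the dummy-variable guess: a minimal sketch with made-up data (the names `grp`, `d1`, `d2`, `d3` and `y` are hypothetical, not from the thread). A full set of 1/0 indicators sums to 1 in every row, which duplicates the intercept column, so lm() reports NA for the last one; removing the intercept with "0 +" or "- 1" lets every indicator get an estimate.

```r
# Made-up data: three 1/0 indicators that together cover every row,
# so d1 + d2 + d3 == 1 and the set duplicates the intercept column.
set.seed(1)
grp <- factor(rep(c("a", "b", "c"), each = 4))
d1 <- as.numeric(grp == "a")
d2 <- as.numeric(grp == "b")
d3 <- as.numeric(grp == "c")
y  <- c(1, 2, 3)[as.integer(grp)] + rnorm(12, sd = 0.1)

# With the default intercept, the last aliased indicator is reported as NA.
fit_int <- lm(y ~ d1 + d2 + d3)
print(coef(fit_int))

# Removing the intercept with "0 +" (or "- 1") gives every indicator an
# estimate, each equal to the mean of y within its group.
fit_0  <- lm(y ~ 0 + d1 + d2 + d3)
fit_m1 <- lm(y ~ d1 + d2 + d3 - 1)
print(coef(fit_0))
```

The two no-intercept spellings produce the same model; which one to use is a matter of taste.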
Try:

model1<-lm(PBW~SO4+NO3+NH4)

Does it work?

Dimitri

On Mon, Apr 20, 2009 at 7:26 PM, Vemuri, Aparna <avemuri at epri.com> wrote:
> I am not sure if this is an R-users question, but since most of you here
> are statisticians, I decided to give it a shot.
> [...]

-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
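Dimitri's suggestion drops the `function=` label and passes the formula positionally. As a minimal sketch with made-up vectors (not the thread's data), the positional form, the named `formula =` form, and a call that supplies the same columns through `data =` all fit the same model:

```r
# Made-up vectors standing in for the thread's variables (not the real data).
set.seed(7)
SO4 <- runif(20, 1, 10)
NO3 <- runif(20, 0, 5)
PBW <- 0.02 * SO4 + 0.01 * NO3 + rnorm(20, sd = 0.01)

# lm()'s first argument is named "formula"; it can be given positionally.
fit_pos  <- lm(PBW ~ SO4 + NO3)
fit_name <- lm(formula = PBW ~ SO4 + NO3)

# Supplying the same columns through data = gives the same fit.
fit_df <- lm(PBW ~ SO4 + NO3, data = data.frame(PBW, SO4, NO3))

print(all.equal(coef(fit_pos), coef(fit_name)))
print(all.equal(coef(fit_pos), coef(fit_df)))
```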
David,
Thanks for the suggestions. No, I did not label my dependent variable "function". My dependent variable PBW and all the independent variables are continuous variables. It is especially troubling since the order in which I input the independent variables determines whether or not each one gets a coefficient. As I already mentioned, I checked the correlation matrix and picked the variables with moderate to high correlation with the dependent variable. So I guess it is not so naïve to expect a regression coefficient on all of them.

Dimitri:
model1<-lm(PBW~SO4+NO3+NH4) gives me the same result as before.

Bert:
This is not homework. But I will remember to do my research before posting here.

Aparna

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Monday, April 20, 2009 5:35 PM
To: Vemuri, Aparna
Cc: r-help at r-project.org
Subject: Re: [R] Fitting linear models
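The order dependence described here is what lm() does when one continuous predictor is an exact linear combination of the others. A minimal sketch with made-up numbers (only the variable names come from the thread; the dependency NH4 = 0.4*SO4 + 0.3*NO3 is an assumption for illustration): lm() estimates as many coefficients as the rank of the model matrix allows and reports NA for whichever dependent column comes last, while the fitted values are identical. Setting `singular.ok = FALSE` makes lm() stop with an error instead of silently returning NA.

```r
# Made-up data: only the variable names come from the thread. NH4 is built
# as an exact linear combination of SO4 and NO3 (an assumption for illustration).
set.seed(42)
n   <- 50
SO4 <- runif(n, 1, 10)
NO3 <- runif(n, 0, 5)
NH4 <- 0.4 * SO4 + 0.3 * NO3
PBW <- 0.02 * SO4 + 0.01 * NO3 + rnorm(n, sd = 0.01)
dat <- data.frame(PBW, SO4, NO3, NH4)

# Same three predictors, two orders: the last dependent column gets NA.
fit_a <- lm(PBW ~ SO4 + NO3 + NH4, data = dat)
fit_b <- lm(PBW ~ SO4 + NH4 + NO3, data = dat)
print(coef(fit_a))   # NH4 is NA
print(coef(fit_b))   # NO3 is NA
print(fit_a$rank)    # 3 estimable coefficients for 4 model-matrix columns

# The fitted values agree: only the labelling of the coefficients differs.
print(all.equal(fitted(fit_a), fitted(fit_b)))

# singular.ok = FALSE turns the silent NA into an error.
res <- tryCatch(
  lm(PBW ~ SO4 + NO3 + NH4, data = dat, singular.ok = FALSE),
  error = function(e) "singular fit"
)
print(res)
```

With an exact dependency there is no way to make R "use all the variables": the individual coefficients are not uniquely defined, so one variable has to be dropped or the model reparameterized.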
On Apr 21, 2009, at 11:12 AM, Vemuri, Aparna wrote:

> David,
> Thanks for the suggestions. No, I did not label my dependent
> variable "function".

That was from my error in reading your call to lm. In my defense, I am reasonably sure the proper way to name that argument is lm(formula= ...) rather than lm(function= ...).

> My dependent variable PBW and all the independent variables are
> continuous variables. [...]
>
> Dimitri
> model1<-lm(PBW~SO4+NO3+NH4) gives me the same result as before.

Did you get the expected results with:

model1<-lm(formula=PBW~SO4+NO3+NH4+0)

You could, of course, provide either the data or the results of str() applied to each of the variables, and then we could all stop guessing.

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
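Both of these suggestions can be checked on a small example. A minimal sketch, again with made-up data and an assumed exact dependency among the continuous predictors (not the thread's data): str() confirms that every column is numeric, so the dummy-variable explanation does not apply, and adding `+ 0` removes the intercept but leaves NH4 aliased, because this dependency does not involve the constant term.

```r
# Made-up continuous data (names from the thread, numbers invented); NH4 is
# assumed to be an exact linear combination of SO4 and NO3 with no constant.
set.seed(3)
n   <- 40
SO4 <- runif(n, 1, 10)
NO3 <- runif(n, 0, 5)
NH4 <- 0.4 * SO4 + 0.3 * NO3
PBW <- 0.02 * SO4 + 0.01 * NO3 + rnorm(n, sd = 0.01)
dat <- data.frame(PBW, SO4, NO3, NH4)

# str() shows every column is numeric, so these are not 1/0 dummies.
str(dat)

# "+ 0" removes the intercept, but NH4 is still aliased with SO4 and NO3.
fit_0 <- lm(PBW ~ SO4 + NO3 + NH4 + 0, data = dat)
print(coef(fit_0))   # NH4 is NA
```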
These are all field-measured values.
For a little background here, I have field measurements of SO4, NO3 and NH4. I used these variables in an atmospheric chemistry model to calculate PBW on a line-by-line basis.

To bypass the use of the complex atmospheric chemistry model in the future, I want to develop a regression equation based on the current results I have. Also, I know the atmospheric chemistry model requires SO4, NO3 and NH4 to estimate PBW, so I am using the same variables as IVs for the regression model.

Aparna

-----Original Message-----
From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com]
Sent: Tuesday, April 21, 2009 9:31 AM
To: Vemuri, Aparna
Subject: Re: [R] Fitting linear models

Aparna, why are your IVs so highly intercorrelated? It's not a good sign...

On Tue, Apr 21, 2009 at 12:29 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> But if the multicollinearity is so strong, then I am wondering why it
> worked in the data frame as opposed to 4 separate vectors? It should
> not make any difference...
> Dimitri
>
> On Tue, Apr 21, 2009 at 12:21 PM, Vemuri, Aparna <avemuri at epri.com> wrote:
>> Thanks Dimitri! Following exactly what you did, I wrote all my individual variable vectors to a data frame and used lm(formula, data), and this time it works for me too.
>>
>> Marc, your theory is correct. The NH4 variable shares a strong correlation with one of the IVs as well as with the DV.
>>
>>          SO4      NO3      NH4      PBW
>> SO4    1       -0.0867   0.999    0.999
>> NO3   -0.0867   1       -0.0527  -0.0938
>> NH4    0.999   -0.0527   1        0.999
>> PBW    0.999   -0.0938   0.999    1
>>
>> Aparna
>>
>> -----Original Message-----
>> From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com]
>> Sent: Tuesday, April 21, 2009 9:02 AM
>> To: Vemuri, Aparna
>> Cc: r-help at r-project.org; David Winsemius
>> Subject: Re: [R] Fitting linear models
>>
>> I am not sure what the problem is.
>> I found no errors:
>>
>> data<-read.csv(file.choose())  # I had to change your file extension to .csv first
>> dim(data)
>> names(data)
>>
>> lapply(data,function(x){sum(is.na(x))})  # count missing values per column
>> lm.model.1<-lm(PBW~SO4+NO3+NH4,data)
>> lm.model.2<-lm(PBW~SO4+NH4+NO3,data)
>> print(lm.model.1)  # Getting nice results
>> print(lm.model.2)  # Getting same results
>>
>> # Another method (gets exactly the same results):
>> library(Design)  # Design has since been superseded by the rms package, which also provides ols()
>> ols.model.1<-ols(PBW~SO4+NO3+NH4,data)
>> ols.model.2<-ols(PBW~SO4+NH4+NO3,data)
>>
>> Dimitri
>> On Tue, Apr 21, 2009 at 11:50 AM, Vemuri, Aparna <avemuri at epri.com> wrote:
>>> Attached are the first hundred rows of my data in comma-separated format.
>>> Forcing the regression line through the origin still does not give a coefficient on the last independent variable. Also, I don't mind if there is an intercept. I just want all of the variables to have coefficients in the regression equation, or at least a consistent result irrespective of the order of the input variables.
>>>
>>> -----Original Message-----
>>> From: David Winsemius [mailto:dwinsemius at comcast.net]
>>> Sent: Tuesday, April 21, 2009 8:38 AM
>>> To: Vemuri, Aparna
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Fitting linear models

-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
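The question of why the IVs are so highly intercorrelated can be quantified from the correlation table Aparna posted. A minimal sketch: the variance inflation factors (VIFs) of the predictors are the diagonal of the inverse of their correlation matrix. The entries below are the rounded correlations from that table, so the results are only approximate.

```r
# Correlations among the three predictors as posted in the thread (rounded
# to the precision shown there, so the results are approximate).
vars <- c("SO4", "NO3", "NH4")
r <- matrix(c( 1.0000, -0.0867,  0.9990,
              -0.0867,  1.0000, -0.0527,
               0.9990, -0.0527,  1.0000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Variance inflation factors: the diagonal of the inverse correlation matrix.
vif <- setNames(diag(solve(r)), vars)
print(round(vif, 1))
```

With these numbers the VIFs for SO4 and NH4 come out on the order of a thousand, versus roughly 2 for NO3; a common rule of thumb treats values above 10 as a warning sign. Near (rather than exact) collinearity like this inflates the standard errors but does not by itself make lm() return NA.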
On Apr 21, 2009, at 12:37 PM, Vemuri, Aparna wrote:

> These are all field-measured values.
> For a little background here, I have field measurements of SO4, NO3
> and NH4. I used these variables in an atmospheric chemistry model to
> calculate PBW on a line-by-line basis.
>
> To bypass the use of the complex atmospheric chemistry model in the
> future, I want to develop a regression equation based on the current
> results I have. Also, I know the atmospheric chemistry model
> requires SO4, NO3 and NH4 to estimate PBW, so I am using the same
> variables as IVs for the regression model.
> Aparna

One way to create collinearity is to construct a new variable, say PBW, as a linear combination of the measurements. If you then re-analyze that augmented dataset, you will naturally get the sort of complaints or unexpected behavior from the R interpreter that you are seeing.

-- 
David Winsemius
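The augmented-dataset point can be reproduced directly. A minimal sketch with made-up measurements: a hypothetical derived column `combo`, built as 2*SO4 - NO3, is appended to the data, and `lm(y ~ ., data = aug)` then reports NA for it. `alias()` (from the stats package) shows the exact dependency that lm() detected. The names `meas`, `aug`, `combo` and `y` are illustrative assumptions.

```r
# Made-up measurements; only the column names SO4, NO3, NH4 come from the thread.
set.seed(11)
n <- 30
meas <- data.frame(SO4 = runif(n, 1, 10),
                   NO3 = runif(n, 0, 5),
                   NH4 = runif(n, 0, 4))
meas$y <- with(meas, 0.02 * SO4 + 0.01 * NO3 + 0.03 * NH4 + rnorm(n, sd = 0.01))

# Augment the data with a hypothetical derived column that is an exact
# linear combination of two of the measurements.
aug <- transform(meas, combo = 2 * SO4 - NO3)

# y ~ . uses every other column, so combo enters alongside SO4 and NO3.
fit <- lm(y ~ ., data = aug)
print(coef(fit))   # combo is NA
print(alias(fit))  # expresses combo in terms of the other columns
```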