Hi,

Subject: Using R for Multiple Regression

I am new to statistics but am interested in applying mathematical models to solve biological problems. I have used a linear model to generate the test data. When using this data I expect R to correctly identify the model, but that does not seem to be the case. I am certain that I am doing something wrong but am not able to figure it out.

Model:
Y = m1*x1 + m2*x2 + m3*x3 + c

m1=5
m2=6
m3=0
c=2

Model identified by R using lm(formula = y ~ x1 + x2 + x3):

(Intercept)  8.000e+01
x1           1.100e+01
x2           NA
x3           NA

The data I am using are as follows:

  y  x1  x2  x3
 91   1  14   2
102   2  15   5
113   3  16   8
124   4  17  11
135   5  18  14
146   6  19  17
157   7  20  20
168   8  21  23
179   9  22  26
190  10  23  29

Kind regards

Dr. Ambikesh Jayal,
Department of Information Systems, Computing and Mathematics
Room 134 St John's Building
Brunel University
Uxbridge, Middlesex
UB8 3PH, UK
Website: http://sites.google.com/site/ambi1999/
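For readers who want to reproduce the situation described in the post, here is a minimal sketch in R. The data values and the lm() formula are copied from the post; the data frame name `d` and the object name `fit` are assumptions made only for this illustration.

```r
# Data as listed in the post (the name `d` is illustrative, not from the post)
d <- data.frame(
  y  = c(91, 102, 113, 124, 135, 146, 157, 168, 179, 190),
  x1 = 1:10,
  x2 = 14:23,
  x3 = seq(2, 29, by = 3)
)

# Check that y really was generated as 5*x1 + 6*x2 + 0*x3 + 2
print(all(d$y == 5 * d$x1 + 6 * d$x2 + 0 * d$x3 + 2))

# Same formula as in the post
fit <- lm(y ~ x1 + x2 + x3, data = d)

# The post reports an intercept of 80, a slope of 11 for x1,
# and NA for both x2 and x3
print(coef(fit))
```

The NA entries are what the post is asking about: R did not estimate separate coefficients for x2 and x3.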
On 30-Jul-10 15:07:46, Ambikesh Jayal wrote:
> [...]
> Model:
> Y = m1x1 + m2x2+ m3X3 + c
> [...]
> Model Identified by R using lm(formula = y ~ x1 + x2 + x3)
> (Intercept) 8.000e+01
> x1 1.100e+01
> x2 NA
> x3 NA
> [...]

You should look again at your data!

You have x2 = 13 + x1, x3 = 3*x1 - 1 in these data.
Hence your model

  Y = m1*x1 + m2*x2 + m3*x3 + c

with m1=5, m2=6, m3=0, c=2 is the same as

  Y = 5*x1 + 6*(x1+13) + 0*(3*x1 - 1) + 2
    = 11*x1 + 6*13 + 2
    = 11*x1 + 80

and R has found that the coefficient of x1 is 1.100e+01 = 11,
and that the intercept is 8.000e+01 = 80, and has also identified
that, after allowing for x1, the variables x2 and x3 are irrelevant.

So, to try out how R behaves in linear regression, you should
use data which do not have this property that some of the independent
variables (x1,x2,x3) are linear functions of the others.

However, in trying it out as you have, you have already found out
something very important about linear regression! (And about R).

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jul-10 Time: 17:59:51
------------------------------ XFMail ------------------------------
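The two points in the reply above can be checked directly. The following is only a sketch: the first part confirms the linear relations among the posted predictors, and the second part refits the same model on predictors that are not linear functions of one another. The random predictors `z1`, `z2`, `z3`, the seed, and the range 0 to 10 are arbitrary assumptions for illustration, not part of the thread.

```r
# Predictors from the post
x1 <- 1:10
x2 <- 14:23
x3 <- seq(2, 29, by = 3)

# The relations pointed out in the reply
print(all(x2 == 13 + x1))     # x2 is x1 shifted by 13
print(all(x3 == 3 * x1 - 1))  # x3 is a linear function of x1

# Hypothetical replacement predictors with no exact linear relation
set.seed(1)                   # arbitrary seed, for repeatability only
n  <- 10
z1 <- runif(n, 0, 10)
z2 <- runif(n, 0, 10)
z3 <- runif(n, 0, 10)

# Same generating model as in the post: m1 = 5, m2 = 6, m3 = 0, c = 2
y <- 5 * z1 + 6 * z2 + 0 * z3 + 2

fit_indep <- lm(y ~ z1 + z2 + z3)

# With independent predictors and noise-free data, the estimates should
# match the generating values up to floating-point error
print(round(coef(fit_indep), 6))
```

Because the data contain no noise, the fit is essentially exact; with real measurements the estimates would only approximate the generating values.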
Hi,

Your X'X matrix is singular and there is not a unique solution. If you check, the regression equation which R gave you works just as well as yours. This is because your predictor variables are perfectly linearly dependent. This will essentially never happen in real applications, due to measurement error.

Jonathan

On Fri, Jul 30, 2010 at 9:07 AM, Ambikesh Jayal <ambi1999@gmail.com> wrote:
> [...]
> Model Identified by R using lm(formula = y ~ x1 + x2 + x3)
> (Intercept) 8.000e+01
> x1 1.100e+01
> x2 NA
> x3 NA
> [...]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
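A short sketch of what "singular" and "works just as well" mean for the posted data. The design matrix is built with model.matrix() and its rank is taken from qr(); the names `X`, `b_post` and `b_R` are assumptions introduced for this illustration.

```r
# Data as listed in the original post
d <- data.frame(
  y  = c(91, 102, 113, 124, 135, 146, 157, 168, 179, 190),
  x1 = 1:10,
  x2 = 14:23,
  x3 = seq(2, 29, by = 3)
)

# Design matrix with columns (Intercept), x1, x2, x3
X <- model.matrix(~ x1 + x2 + x3, data = d)

# A rank smaller than the number of columns means X'X is singular
print(ncol(X))
print(qr(X)$rank)

# Coefficients in the order (Intercept), x1, x2, x3
b_post <- c(2, 5, 6, 0)    # the model used to generate the data
b_R    <- c(80, 11, 0, 0)  # R's answer, with the NA terms left out

# Both coefficient vectors reproduce y exactly
print(all.equal(unname(drop(X %*% b_post)), d$y))
print(all.equal(unname(drop(X %*% b_R)), d$y))
```

Treating the NA coefficients as absent terms (numerically, as zeros) is an interpretation used here only to compute fitted values.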
---------- Forwarded message ----------
From: Ambikesh Jayal <ambi1999@gmail.com>
Date: Sun, Aug 1, 2010 at 2:24 PM
Subject: Re: [R] Using R for Multiple Regression
To: ted.harding@manchester.ac.uk

Hi Ted,

Thanks to all those who have replied. It was very helpful.

As there can be multiple solutions, is there a way in R to show all the possible models for a dataset?

Also, in R, is the coefficient of an independent variable being shown as "NA" the same as its being shown as "0" (implying that this variable does not count)?

> However, in trying it out as you have, you have already found out
> something very important about linear regression! (And about R).

Is the important point that there can be multiple equations describing a dataset? Or that one way to simplify a model is to remove the independent variables that depend on other independent variables?

Thanks again.

Kind regards

Ambikesh Jayal,
Department of Information Systems, Computing and Mathematics
Room 134 St John's Building
Brunel University
Uxbridge, Middlesex
UB8 3PH, UK
Website: http://sites.google.com/site/ambi1999/

On Fri, Jul 30, 2010 at 5:59 PM, Ted Harding <Ted.Harding@manchester.ac.uk> wrote:
> [...]
> You should look again at your data!
>
> You have x2 = 13 + x1, x3 = 3*x1 - 1 in these data.
> [...]
> So, to try out how R behaves in linear regression, you should
> use data which do not have this property that some of the independent
> variables (x1,x2,x3) are linear functions of the others.
>
> However, in trying it out as you have, you have already found out
> something very important about linear regression! (And about R).
> [...]
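The thread as archived here contains no reply to these follow-up questions. As a hedged illustration only, the sketch below shows one way to describe "all the possible models" for the posted data: any coefficient vector obtained by adding multiples of the two dependency directions to R's solution fits equally well. The vectors `v1` and `v2` are derived from the relations x2 = 13 + x1 and x3 = 3*x1 - 1 given earlier in the thread; the names `b0`, `v1`, `v2` and `same_fit` are assumptions for this example.

```r
# Data as listed in the original post
d <- data.frame(
  y  = c(91, 102, 113, 124, 135, 146, 157, 168, 179, 190),
  x1 = 1:10,
  x2 = 14:23,
  x3 = seq(2, 29, by = 3)
)
X <- model.matrix(~ x1 + x2 + x3, data = d)

# Coefficients in the order (Intercept), x1, x2, x3
b0 <- c(80, 11, 0, 0)   # R's solution, NA terms treated as absent
v1 <- c(13, 1, -1, 0)   # from 13 + x1 - x2 = 0
v2 <- c(-1, 3, 0, -1)   # from -1 + 3*x1 - x3 = 0

# Both directions are mapped to zero by X, so they never change the fit
print(all(X %*% v1 == 0))
print(all(X %*% v2 == 0))

# Any b0 + s*v1 + t*v2 reproduces y exactly
same_fit <- function(s, t) {
  b <- b0 + s * v1 + t * v2
  isTRUE(all.equal(unname(drop(X %*% b)), d$y))
}
print(same_fit(-6, 0))    # s = -6, t = 0 gives the original 2, 5, 6, 0
print(same_fit(2.5, -1))  # an arbitrary other member of the family

# R reports NA (not 0) for terms it could not estimate separately;
# alias() lists how those terms depend on the ones that were kept
fit <- lm(y ~ x1 + x2 + x3, data = d)
print(is.na(coef(fit)))
print(alias(fit)$Complete)
```

On this reading, NA marks a coefficient that could not be estimated separately because its variable is a linear combination of the others. It is not an estimate of 0, even though leaving the variable out gives the same fitted values.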