Hi,

I am trying to do a multiple regression on the dataset "Hdma", available in the Ecdat package.

The data looks like this:

> str(Hdma)
'data.frame':   2381 obs. of  13 variables:
 $ dir        : num  0.221 0.265 0.372 0.32 0.36 ...
 $ hir        : num  0.221 0.265 0.248 0.25 0.35 ...
 $ lvr        : num  0.8 0.922 0.92 0.86 0.6 ...
 $ ccs        : num  5 2 1 1 1 1 1 2 2 2 ...
 $ mcs        : num  2 2 2 2 1 1 2 2 2 1 ...
 $ pbcr       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ dmi        : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
 $ self       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ single     : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 2 1 1 2 ...
 $ uria       : num  3.9 3.2 3.2 4.3 3.2 ...
 $ comdominiom: num  0 0 0 0 0 0 1 0 0 0 ...
 $ black      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ deny       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...

I would like to try a more complex regression, but even this relatively uncomplicated one returns an error:

summary(lm(deny ~ hir + dir + ccs + mcs + black))

The error I get is:

Error in storage.mode(y) <- "double" :
  invalid to change the storage mode of a factor
In addition: Warning message:
In model.response(mf, "numeric") :
  using type="numeric" with a factor response will be ignored

I understand that something is wrong because some of the variables are factors. But as far as I've grasped, it should be possible to include factor variables when using lm(). Am I in error in thinking this?

Sincerely,
Gabriel Bergin
Undergraduate economics student
The problem is not in the covariates but in the response variable. lm() needs a numeric response; factors are fine as covariates. deny is a factor, hence the error.

HTH,

Thierry

ir. Thierry Onkelinx
Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek)
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

> -----Original message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On behalf of Gabriel Bergin
> Sent: Tuesday 12 October 2010 11:39
> To: r-help at r-project.org
> Subject: [R] Factors in an regression using lm()
>
> [...]
>
> summary(lm(deny ~ hir + dir + ccs + mcs + black))
>
> The error I get is:
> Error in storage.mode(y) <- "double" :
>   invalid to change the storage mode of a factor
>
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
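To make the point about the response variable concrete, here is a minimal sketch rather than a definitive recipe. It assumes a small synthetic data frame `d` that only mimics a few Hdma columns (it is not the real Ecdat data), and the column name `deny_num` is made up for illustration. It recodes the factor response to 0/1 so that lm() gets a numeric response, while the factor covariate `black` is left as a factor.

```r
# Synthetic stand-in for a few Hdma columns (assumption: NOT the real data).
set.seed(1)
n <- 200
d <- data.frame(
  hir   = runif(n, 0.1, 0.5),
  dir   = runif(n, 0.1, 0.6),
  black = factor(sample(c("no", "yes"), n, replace = TRUE),
                 levels = c("no", "yes")),
  deny  = factor(sample(c("no", "yes"), n, replace = TRUE,
                        prob = c(0.85, 0.15)),
                 levels = c("no", "yes"))
)

# Recode the factor response to numeric 0/1 (hypothetical column name).
d$deny_num <- as.numeric(d$deny == "yes")

# lm() now has a numeric response; the factor covariate 'black' is
# expanded into a dummy variable automatically.
fit_lm <- lm(deny_num ~ hir + dir + black, data = d)
summary(fit_lm)

# A commonly used alternative for a yes/no response is logistic regression;
# glm() with a binomial family accepts the factor response directly.
fit_logit <- glm(deny ~ hir + dir + black, data = d, family = binomial)
summary(fit_logit)
```

With a 0/1 response, lm() fits what is usually called a linear probability model; whether that or the logistic fit is more appropriate depends on the analysis.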
Hi,

Your response (dependent) variable, which has to be on the left side of the '~' in the formula, should be numeric. In your example deny is a factor; that is the problem.

The explanatory variables, on the right side of the '~', can be numeric or factors. Here hir, dir, ccs and mcs are numeric and black is a factor; lm() accepts that mix.

There are two possibilities (not mutually exclusive):
- you should transform the response from factor to numeric, see ?factor and ?as.numeric, as well as the stringsAsFactors argument of ?read.table (in case you imported your data.frame that way)
- you should adjust your model formula. It might be that you mixed up the variables in the formula. See ?formula

HTH,
Ivan

On 10/12/2010 11:39, Gabriel Bergin wrote:
> [...]
>
> summary(lm(deny ~ hir + dir + ccs + mcs + black))
>
> The error I get is:
> Error in storage.mode(y)<- "double" :
>   invalid to change the storage mode of a factor
>
> [...]

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
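On the factor-to-numeric conversion pointed to via ?factor and ?as.numeric: a short sketch of the usual pitfall, using small made-up vectors (`deny` and `f` here are illustrative, not taken from Hdma). Calling as.numeric() directly on a factor returns the internal level codes, which is rarely what is wanted for a yes/no response.

```r
# Illustrative factor with the same levels as Hdma$deny (made-up values).
deny <- factor(c("no", "no", "yes", "no"), levels = c("no", "yes"))

codes   <- as.numeric(deny)           # internal level codes: 1 1 2 1
zeroone <- as.numeric(deny == "yes")  # 0/1 indicator:        0 0 1 0

# For a factor whose labels are numbers, go through character first.
f         <- factor(c("10", "20", "10"))
f_codes   <- as.numeric(f)                # level codes:    1 2 1
f_numbers <- as.numeric(as.character(f))  # actual numbers: 10 20 10

print(codes); print(zeroone); print(f_codes); print(f_numbers)
```

The 0/1 indicator is the form that makes sense as a numeric response for lm().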