Hi,

I am trying to create a set of dummy variables to use in a multiple linear regression and cannot find how to do this in the manuals. For example, I have:

  Price  Weight      Clarity
                   IF  VVS1  VVS2
    500     8       1     0     0
   1000     5.2     0     0     1
    864     3       0     1     0
    340     2.6     0     0     1
     90     0.5     1     0     0
    450     2.3     0     1     0

Price depends on weight (a single value for each observation) and on clarity (which has three levels: IF, VVS1 and VVS2). I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages. What is the correct way?

Any help is greatly appreciated.
Matthew
Hi

r-help-bounces at r-project.org wrote on 16.12.2009 15:58:56:
> I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages, what is the correct way?

Well, try to bribe it. Or ask what pleases it, to break its resistance.

Seriously: what is the structure of your data in R (see ?str), and what commands did you use for the regression? I suppose

  lm(Price ~ Weight + IF + VVS1 + VVS2, data = your.data)

should not complain if your.data is a data frame.

Regards
Petr
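Petr's call will indeed run without an error, but because IF + VVS1 + VVS2 equals 1 in every row, the three columns together duplicate the intercept, and lm() reports one of their coefficients as NA. A minimal sketch, assuming the posted columns sit in a data frame called your.data (Petr's placeholder name), compares that fit with one that keeps only k - 1 = 2 dummies, so that IF becomes the reference grade:

```r
# The posted data, with one 0/1 indicator column per clarity grade
your.data <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0)
)

# All k = 3 dummies plus an intercept: the design matrix is rank-deficient,
# so lm() flags the last aliased column (VVS2) and reports its coefficient as NA
fit_k <- lm(Price ~ Weight + IF + VVS1 + VVS2, data = your.data)
print(coef(fit_k))

# k - 1 = 2 dummies: IF is left out and absorbed into the intercept
fit_k1 <- lm(Price ~ Weight + VVS1 + VVS2, data = your.data)
print(coef(fit_k1))
```

The second fit is the same model as the factor-based lm(Price ~ Weight + Clarity) with IF as the first level, so its coefficients match that parameterisation.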
On 12/16/2009 03:58 PM, whitaker m. (mw1006) wrote:
> I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages, what is the correct way?

Without an example of your code, it's a bit difficult to say. But it might be easier to use a single variable Clarity with three possible values (IF, VVS1, VVS2), defined as a factor. lm(Price ~ Weight + Clarity) should then do the trick (unless you explicitly want a dummy coding other than the default).

Stephan
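If the data already arrive as three 0/1 columns, as in the original post, they can be collapsed into the single factor Stephan describes. A minimal sketch, assuming exactly one indicator is 1 in each row; the data frame name gems is hypothetical:

```r
# Hypothetical data frame holding the posted data as indicator columns
gems <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0)
)

grades <- c("IF", "VVS1", "VVS2")
ind <- as.matrix(gems[, grades])

# Sanity check: each observation must belong to exactly one grade
stopifnot(all(rowSums(ind) == 1))

# max.col() returns, for each row, the index of the column holding the 1
gems$Clarity <- factor(grades[max.col(ind, ties.method = "first")],
                       levels = grades)

# One factor term; R builds the dummy columns itself
fit <- lm(Price ~ Weight + Clarity, data = gems)
print(coef(fit))
```

Passing levels = grades fixes IF as the first level, so it becomes the reference category under the default treatment coding.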
On Wed, 16 Dec 2009, whitaker m. (mw1006) wrote:
> I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages, what is the correct way?

You should code the categorical variable "Clarity" as a "factor" so that R knows it is a categorical variable and can deal with it appropriately in subsequent computations such as summary() or lm(). Thus, I would recommend storing your data as

  dat <- data.frame(
    Price = c(500, 1000, 864, 340, 90, 450),
    Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
    ## factor() must be explicit since R 4.0.0, where data.frame()
    ## no longer converts character vectors to factors by default
    Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))

which yields, e.g.,

  R> summary(dat)
       Price            Weight       Clarity
   Min.   :  90.0   Min.   :0.500   IF  :2
   1st Qu.: 367.5   1st Qu.:2.375   VVS1:2
   Median : 475.0   Median :2.800   VVS2:2
   Mean   : 540.7   Mean   :3.600
   3rd Qu.: 773.0   3rd Qu.:4.650
   Max.   :1000.0   Max.   :8.000

and then you can also do

  R> lm(Price ~ Weight + Clarity, data = dat)

  Call:
  lm(formula = Price ~ Weight + Clarity, data = dat)

  Coefficients:
  (Intercept)       Weight  ClarityVVS1  ClarityVVS2
       -45.05        80.01       490.02       403.00

or, if you wish to choose a different coding,

  R> lm(Price ~ 0 + Weight + Clarity, data = dat)

  Call:
  lm(formula = Price ~ 0 + Weight + Clarity, data = dat)

  Coefficients:
       Weight    ClarityIF  ClarityVVS1  ClarityVVS2
        80.01       -45.05       444.97       357.95

Some further reading of introductory material on linear regression in R would be useful. Also look at ?lm, ?factor, ?model.matrix, ?contrasts etc.

hth,
Z
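Achim points to ?model.matrix; inspecting the design matrix is a quick way to see which dummy columns R builds from the factor under each of his two formulas. A minimal sketch that recreates his dat (with factor() made explicit):

```r
dat <- data.frame(
  Price   = c(500, 1000, 864, 340, 90, 450),
  Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)])
)

# With an intercept: treatment coding with IF (the first level) as reference,
# so only k - 1 = 2 dummy columns are generated
X1 <- model.matrix(~ Weight + Clarity, data = dat)
print(X1)

# Without an intercept: one indicator column per level (k = 3)
X0 <- model.matrix(~ 0 + Weight + Clarity, data = dat)
print(X0)

fit1 <- lm(Price ~ Weight + Clarity, data = dat)
fit0 <- lm(Price ~ 0 + Weight + Clarity, data = dat)

# Same model, different parameterisation: a level's own intercept in fit0
# equals the baseline intercept plus that level's offset in fit1
print(coef(fit0)[["ClarityVVS1"]])
print(coef(fit1)[["(Intercept)"]] + coef(fit1)[["ClarityVVS1"]])
```

Both fits produce identical fitted values; only the meaning of the Clarity coefficients differs.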
Is your variable Clarity categorical with 3 levels? If so, you need only k-1 (2) dummies when the model has an intercept. Your error may be the result of creating k instead of k-1 dummies, but I can't be sure from the example.

In R, you don't have to (unless you really want to) explicitly create separate variables. You can use the built-in contrast functions; see ?contr.treatment, which gives dummy coding and is the default for unordered factors. You can specify which group is the reference group. Alternatively, if you prefer effects coding, see ?contr.sum. There are others as well.

Tom Fletcher

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of whitaker m. (mw1006)
Sent: Wednesday, December 16, 2009 8:59 AM
To: r-help at r-project.org
Subject: [R] Creating Dummy Variables in R
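Tom's suggestions can be tried without creating any new variables, through the contrasts argument of lm(). A minimal sketch, reusing the factor-coded data from Achim's reply; choosing VVS1 as the reference group (base = 2) is purely illustrative:

```r
dat <- data.frame(
  Price   = c(500, 1000, 864, 340, 90, 450),
  Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)])
)

# Dummy (treatment) coding with the 2nd level, VVS1, as the reference group.
# Passing the level names keeps them in the coefficient names.
fit_trt <- lm(Price ~ Weight + Clarity, data = dat,
              contrasts = list(Clarity = contr.treatment(levels(dat$Clarity), base = 2)))
print(coef(fit_trt))

# Effects (sum-to-zero) coding: the intercept is the unweighted mean of the
# three level-specific intercepts; Clarity1 and Clarity2 are the deviations
# of IF and VVS1 from that mean
fit_sum <- lm(Price ~ Weight + Clarity, data = dat,
              contrasts = list(Clarity = "contr.sum"))
print(coef(fit_sum))
```

relevel(dat$Clarity, ref = "VVS1") is another way to move the reference group. Under every coding the fitted values are the same; only the interpretation of the Clarity coefficients changes.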