Hi There,

While looking through the mailing list archive, I did not come across a simple-minded example regarding the creation of dummy variables. The Gauss language provides the command "y = dummydn(x,v,p)" for creating dummy variables.

Here:

x = Nx1 vector of data to be broken up into dummy variables.
v = Kx1 vector specifying the K-1 breakpoints
p = positive integer in the range [1,K], specifying which column should be dropped in the matrix of dummy variables.
y = Nx(K-1) matrix containing the K-1 dummy variables.

My recent mailing list archive inquiry has led me to examine R's "model.matrix", but it has so many options that I can't see the forest for the trees. Is that really the easiest way? Or is there something similar to the dummydn command described above?

To provide a concrete scenario, please consider the following. Using the above notation, say I had:

x <- c(1:10)   #data to be broken up into dummy variables
v <- c(3,5,7)  #breakpoints
p = 1          #drop this column to avoid dummy variable trap

How can I get a matrix "y" that has the associated dummy variables for columns?

Thank You,
-Francisco
Dear Francisco,

At 08:31 AM 9/5/2003 -0500, Francisco J. Bido wrote:
>How can I get a matrix "y" that has the associated dummy variables for
>columns?

My initial question would be: why do you want to do this? Statistical-model formulas in R implicitly generate dummy variables (and other contrasts) directly from factors, so if this is the context that you had in mind, there's no need to generate the dummy variables explicitly.

If you really do want the matrix of dummy regressors, say for a factor named "factor," then you can use model.matrix() to get them. Because the default contrast type for unordered factors is "contr.treatment", which corresponds to 0/1 dummy regressors, you can get the dummy variables as model.matrix(~factor)[,-1]. Here I've removed the initial column of ones returned by model.matrix().
Alternatively, model.matrix(~ factor - 1) gives you a complete set of dummy regressors; you could then drop whichever column you wanted to.

More generally, if you haven't already done so, you might read up on how linear-model formulas are implemented in R. All of the introductions to R cover this topic. I think that this is one of the strengths of the S language, by the way.

I hope that this helps,
John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
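A minimal sketch of the two calls described in this reply, assuming R's default treatment contrasts. The factor `f` and its levels are made up for illustration and do not come from the thread.

```r
# Hypothetical three-level factor, made up for illustration
f <- factor(c("a", "b", "c", "a", "b", "c"))

# Treatment contrasts: an intercept column plus 0/1 dummies for levels "b" and "c"
mm <- model.matrix(~ f)

# Drop the intercept column to keep only the dummy regressors
d_reduced <- mm[, -1]

# Removing the intercept from the formula gives one dummy per level
d_full <- model.matrix(~ f - 1)

# Then drop any one column by position, here the second (level "b")
d_drop_b <- d_full[, -2]

colnames(d_reduced)
colnames(d_full)
```

In the first form the baseline level is the one absorbed by the intercept; in the second form you choose which column to drop yourself.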
"Francisco J. Bido" <bido at mac.com> writes:> Hi There, > > While looking through the mailing list archive, I did not come across > a simple minded example regarding the creation of dummy variables. > The Gauss language provides the command "y = dummydn(x,v,p)" for > creating dummy variables. > > Here: > > x = Nx1 vector of data to be broken up into dummy variables. > v = Kx1 vector specifying the K-1 breakpoints > p = positive integer in the range [1,K], specifying which column > should be dropped in the matrix of dummy variables. > > y = Nx(K-1) matrix containing the K-1 dummy variables. > > My recent mailing list archive inquiry has led me to examine R's > "model.matrix" but it has so many options that I'm not seeing the > forest because of the trees. Is that really the easiest way? or is > there something similar to the dummydn command described above? > > > To provide a concrete scenario, please consider the following. Using > the above notation, say, I had: > > > x <- c(1:10) #data to be broken up into dummy variables > v <- c(3,5,7) #breakpoints > p = 1 #drop this column to avoid dummy variable trap > > How can I get a matrix "y" that has the associated dummy variables for > columns?Don't. Consider why you want the dummy variables. You probably want to use them in the specification of a statistical model and R's model specification language automatically expands a factor variable into a set of contrasts. Try data(PlantGrowth) fm = lm(weight ~ group, data = PlantGrowth) summary(fm) and you will see that the `group' factor has been expanded to two of the three indicator variables (if you use the default setting for contrasts - other possibilities exist). You can check explicitly how the model matrix is created with model.matrix(fm) The model specification facilities in R are much more flexible than most other languages and you almost never need to create indicators explicitly.
On 5 Sep 2003 at 8:31, Francisco J. Bido wrote:

Yes, model.matrix is the answer, and if it has many arguments, it also has many reasonable defaults. When I am trying out a new function, I just accept the defaults for a start.

> x <- c(1:10) #data to be broken up into dummy variables
> v <- c(3,5,7) #breakpoints
> p = 1 #drop this column to avoid dummy variable trap

What about

f <- cut(x, breaks=c(0,3,5,7,10))
y <- model.matrix( ~ f)

(model.matrix will drop the first column for you, and make a column for the intercept.) If you want all the columns, and no intercept, replace the second line with

y <- model.matrix( ~ f - 1)

or even

y <- model.matrix( ~ f + 0)

Kjetil Halvorsen
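Putting the cut() and model.matrix() approach together for the concrete scenario in the question, a sketch might look like the following. Two assumptions are mine rather than the thread's: the outer breaks are -Inf and Inf so that no value falls outside the intervals, and the intervals are right-closed (the cut() default), which may or may not match the convention Gauss's dummydn uses.

```r
x <- 1:10          # data from the original question
v <- c(3, 5, 7)    # breakpoints from the original question
p <- 1             # column to drop, as in the question

# cut() is right-closed by default: (-Inf,3], (3,5], (5,7], (7,Inf]
f <- cut(x, breaks = c(-Inf, v, Inf))

# One 0/1 column per interval, no intercept
full <- model.matrix(~ f - 1)

# Drop column p to avoid the dummy variable trap; keep the matrix shape
y <- full[, -p, drop = FALSE]

dim(y)
```

Using `drop = FALSE` keeps `y` a matrix even if only one column remains after dropping.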
In response to a question from Francisco J. Bido about how to create dummy variables, Doug Bates and others essentially said "Don't." Which is good advice, but ....

Recently I encountered a problem involving a linear model with a three-level factor (levels low, medium, and high) crossed with linear and quadratic terms in a continuous variate. The client wanted (for some reason --- perhaps I should have discouraged him more forcefully) to compare the full model with a model in which there were linear and quadratic terms for the high level of the factor, but only linear terms for the low and medium levels.

The only way I could see of specifying the reduced model was through using dummy variables explicitly. I.e. I could see no way of specifying such a model in the standard general linear model syntax. (The client was actually working in SAS, but the same considerations apply whether one is speaking SAS or R/Splus, it seems to me.)

Did I miss something obvious (or even not-so-obvious)?

cheers,

Rolf Turner
rolf at math.unb.ca
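A sketch of the explicit-dummy approach described here, written in R. The data are simulated purely for illustration, and the names `f`, `x`, `y`, and `high` are hypothetical; nothing below comes from the client's actual problem.

```r
set.seed(1)  # simulated data, for illustration only
n <- 90
f <- factor(rep(c("low", "medium", "high"), each = n / 3),
            levels = c("low", "medium", "high"))
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n)

# Explicit 0/1 indicator for the "high" level
high <- as.numeric(f == "high")

# Full model: separate linear and quadratic terms for every level
full <- lm(y ~ f * (x + I(x^2)))

# Reduced model: linear terms for every level, quadratic term only for "high"
reduced <- lm(y ~ f * x + I(high * x^2))

# F test of the reduced model against the full model
anova(reduced, full)
```

The reduced model is the full model with the quadratic coefficients for low and medium set to zero, so the comparison has two numerator degrees of freedom. The indicator can also be written inline as `I((f == "high") * x^2)`, which avoids creating a separate variable but is still an explicit dummy.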