Dear all,

I would like to ask one question related to statistics, specifically on defining dummy variables. So far I have come across 3 different kinds of dummy variables (assuming I am working with seasonal dummies, and the number of seasons is 4):

> dummy1 <- diag(4)
> for(i in 1:3) dummy1 <- rbind(dummy1, diag(4))
> dummy1 <- dummy1[,-4]
>
> dummy2 <- dummy1
> dummy2[dummy2 == 0] = -1/(4-1)
>
> dummy3 <- dummy1 - 1/4
>
> head(dummy1)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
[4,]    0    0    0
[5,]    1    0    0
[6,]    0    1    0
> head(dummy2)
           [,1]       [,2]       [,3]
[1,]  1.0000000 -0.3333333 -0.3333333
[2,] -0.3333333  1.0000000 -0.3333333
[3,] -0.3333333 -0.3333333  1.0000000
[4,] -0.3333333 -0.3333333 -0.3333333
[5,]  1.0000000 -0.3333333 -0.3333333
[6,] -0.3333333  1.0000000 -0.3333333
> head(dummy3)
      [,1]  [,2]  [,3]
[1,]  0.75 -0.25 -0.25
[2,] -0.25  0.75 -0.25
[3,] -0.25 -0.25  0.75
[4,] -0.25 -0.25 -0.25
[5,]  0.75 -0.25 -0.25
[6,] -0.25  0.75 -0.25

Now I want to know which type of dummy definition is called a centered dummy, and why it is called so. Is it equivalent to use any of the above definitions (at least the 2nd and 3rd)? It would really be very helpful if somebody could offer any suggestion or clarification.

Thanks and regards,
R does not use dummy variables. Models and contrasts are specified in more natural, model formula based ways. See ?arima and/or CRAN's Time Series task view for numerous packages that fit time series.

-- Bert

On Tue, Jan 11, 2011 at 12:18 PM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
> [...]
> Now I want to know which type of dummy definition is called Centered dummy
> and why it is called so?

--
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
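To make the formula-based approach mentioned in this reply concrete, here is a minimal sketch. It assumes a made-up quarterly series (the values in y and the names y, quarter, fit and X are illustrative, not from the thread) and only shows that R builds the indicator columns itself when a factor appears in a model formula.

```r
# Hypothetical quarterly series; the numbers are invented for illustration.
y <- ts(c(12, 15, 11, 18, 13, 16, 12, 19, 14, 17, 13, 20),
        start = c(2005, 1), frequency = 4)

# cycle() returns the position within the year (1..4) of each observation.
quarter <- factor(as.integer(cycle(y)))

# The factor is expanded into indicator columns when the model is fitted.
fit <- lm(y ~ quarter)
X <- model.matrix(fit)

# With the default treatment contrasts the first quarter is the base level.
colnames(X)  # "(Intercept)" "quarter2" "quarter3" "quarter4"
```

The coding used for the expansion is controlled by the contrasts setting rather than by hand-built matrices, which is the point of the reply.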
You are not offering examples of real codings but are rather showing something1 that you think looks like something2 (in R) that looks like something3 (in a textbook?). My guess is that something2 might be contrast matrices or model matrices. If you want a contrast matrix whose columns sum to zero (which is one possible situation that some people might call "centered"), then look at the documentation for sum and poly contrasts (?contr.sum, ?contr.poly). If you want to see a situation where model matrices are constructed which compare to the overall mean (another possible interpretation of "centered"), then look at the documentation for model.matrix and run the examples on that page with a -1 in the formula.

-- David.

On Jan 11, 2011, at 3:18 PM, Christofer Bogaso wrote:
> [...]
> Now I want to know which type of dummy definition is called Centered
> dummy and why it is called so?

David Winsemius, MD
West Hartford, CT
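As a rough illustration of the two readings of "centered" suggested in this reply, the sketch below uses only base R. The object names (sums_treatment, sums_sum, sums_poly, f, X_no_intercept) are mine and purely illustrative.

```r
# Contrast matrices for a 4-level factor: one row per level, 3 columns.
sums_treatment <- colSums(contr.treatment(4))  # 0/1 coding: sums are not zero
sums_sum       <- colSums(contr.sum(4))        # sum-to-zero coding
sums_poly      <- colSums(contr.poly(4))       # zero up to rounding error

sums_treatment
sums_sum
sums_poly

# Removing the intercept with -1 gives one indicator column per level.
f <- gl(4, 1, 8)
X_no_intercept <- model.matrix(~ f - 1)
X_no_intercept
```

Note the distinction in shape: the contrast matrices have one row per level, whereas the model matrix has one row per observation.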
On Tue, Jan 11, 2011 at 3:18 PM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
> [...]
> Now I want to know which type of dummy definition is called Centered dummy
> and why it is called so? Is it equivalent to use any of the above
> definitions (atleast 2nd and 3rd?)

The contrasts of your dummy1 matrix are contr.SAS contrasts in R. (The default contrasts in R are contr.treatment, which are the same as contr.SAS except that contr.SAS uses the last level as the base whereas treatment contrasts use the first level as the base.)

    options(contrasts = c("contr.SAS", "contr.poly"))
    f <- gl(4, 1, 16)
    M <- model.matrix( ~ f )
    all( M[, -1] == dummy1) # TRUE

Centered contrasts are ones which have been centered -- i.e. the mean of each column has been subtracted from that column. This is equivalent to saying that the column sums are zero. The means of the three columns of dummy1 are c(1/4, 1/4, 1/4), so if we subtract 1/4 from dummy1 we get a centered contrasts matrix. That is precisely what you did to get dummy3. We can check that dummy3 is centered:

    colSums(dummy3) # 0 0 0

dummy2 is just a scaled version of dummy3. In fact dummy2 equals dummy3 / .75, so it's not fundamentally different. Its columns still sum to zero, so it's still centered.

    all( dummy2 == dummy3 / .75) # TRUE
    colSums(dummy2) # 0 0 0 except for floating point error

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
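The centering step described in this reply can also be sketched with base R's scale(), which subtracts column means. This is only an illustration: dummy1 is rebuilt here by row indexing instead of the rbind loop in the original post, and the name centered is mine.

```r
# Rebuild the 16 x 3 matrix from the original post (four years of quarters,
# last quarter dropped as the base level).
dummy1 <- diag(4)[rep(1:4, times = 4), -4]

# Subtract each column's mean; no rescaling.
centered <- scale(dummy1, center = TRUE, scale = FALSE)

# In this balanced case every column mean is 1/4, so centering
# reproduces dummy3 from the original post.
dummy3 <- dummy1 - 1/4
all.equal(unclass(centered), dummy3, check.attributes = FALSE)

colSums(centered)
```

Subtracting the constant 1/4 only coincides with subtracting the column mean when each season occurs equally often, as it does in these 16 rows.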
Christofer,

I am not sure I understand how you are using your dummy variables. Generally, if you have n categories you need n-1 dummy variables. Thus if you have three categories (low, medium, high) and want to compare two of the levels to a reference level (a coding scheme sometimes called reference cell coding), you could use the following coding, which compares medium and high to the reference level, low:

level    dummy1  dummy2
low        0       0
medium     0       1
high       1       0

You will notice that for three categories, my dummy variables form a 3 by 2 matrix. In general the dummy variable matrix for n categories will be an n by n-1 matrix. You say you have four seasons. I would expect your dummy variable matrix to be of size 4 by 3. Your matrices are 6 by 3. Am I not understanding what you are trying to do?

John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
jsorkin at grecc.umaryland.edu

>>> Christofer Bogaso <bogaso.christofer at gmail.com> 1/11/2011 3:18 PM >>>
> [...]
> Now I want to know which type of dummy definition is called Centered dummy
> and why it is called so?
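The size question raised in this reply comes down to the difference between a coding matrix (one row per level) and a model matrix (one row per observation). The sketch below is illustrative only; the observations in obs are invented. The six rows printed in the original post come from head(), which shows the first six rows of a 16-row matrix.

```r
lev <- c("low", "medium", "high")

# Coding (contrast) matrix: one row per level, n - 1 columns, "low" as base.
C <- contr.treatment(lev)
dim(C)  # 3 2

# Hypothetical data: six observations of the three-level factor.
obs <- factor(c("low", "high", "medium", "medium", "low", "high"), levels = lev)

# Model matrix without the intercept column: one row per observation.
X <- model.matrix(~ obs)[, -1]
dim(X)  # 6 2

# Each observation's row is the coding-matrix row of its level.
all(X == C[as.character(obs), ])
```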
Thanks Gabor and others for your input. I admit that I should have posted some reproducible code showing what I wanted. I did have it in mind, but I held back because this was not an R-related query but rather a general statistics question.

Here I am using dummy variables in a ***time series context***. Please assume the following artificial time series along with the quarterly dummies:

    library(zoo)

    # my time series
    MyTimeSeries <- zooreg(101:126, start = as.yearqtr(as.Date("2005-01-01")), frequency = 4)

    # creation of quarterly dummy
    ### dummy1
    dummy1 <- zooreg(Reduce("rbind", rep(list(diag(4)), 7)),
                     start = as.yearqtr(as.Date("2005-01-01")), frequency = 4)
    dummy1 <- merge(dummy1, MyTimeSeries, all = FALSE)[, 1:4]
    colnames(dummy1) <- paste("dummy", 1:4, sep = "")

    ### dummy2
    dummy2 <- dummy1 - 1/4

    ### dummy3
    dummy3 <- dummy1
    dummy3[dummy3 == 0] = -1/(4-1)

    # Time series with quarterly dummy
    TS_with_dummy1 <- cbind(MyTimeSeries, dummy1[, -4])
    TS_with_dummy2 <- cbind(MyTimeSeries, dummy2[, -4])
    TS_with_dummy3 <- cbind(MyTimeSeries, dummy3[, -4])

    TS_with_dummy1
    TS_with_dummy2
    TS_with_dummy3

As in my previous post, there are 3 types of quarterly dummies here: dummy1, dummy2, and dummy3. I have been using the dummy1 definition for all my time series analysis. However, I later noticed the 2nd type of definition in the "vars" package. The 3rd definition I came across somewhere on the net (I cannot recall where at the moment).

My question was: which is the centred dummy variable (according to the help page of the vars package, the 2nd one is the centred dummy)? I am searching for the definition of centred dummy variables in the context of time series analysis. Therefore I would like to know why the 2nd one is called a centred dummy, and why people prefer it to the standard dummy definition (i.e. dummy1). Can you please explain?
Thanks and regards,
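The follow-up question (why use the centred coding instead of the standard one) can be explored with a small sketch. Everything here is an assumption for illustration: the simulated series, the seasonal effects and all object names are invented, and base R is used in place of zoo. Note that in the follow-up the centred coding (dummy1 - 1/4) is numbered 2nd, whereas it was numbered 3rd in the first post.

```r
set.seed(1)  # hypothetical data, 26 quarters as in the follow-up post
n <- 26
quarter <- rep(1:4, length.out = n)

D  <- diag(4)[quarter, -4]   # standard 0/1 dummies, Q4 as base
colnames(D) <- paste0("dummy", 1:3)
Dc <- D - 1/4                # centred coding
Ds <- Dc / 0.75              # the 1 and -1/3 coding

y <- 100 + c(2, -1, 3, 0)[quarter] + rnorm(n)

fit_std <- lm(y ~ D)
fit_ctr <- lm(y ~ Dc)
fit_scl <- lm(y ~ Ds)

# All three codings span the same column space, so the fits are identical.
all.equal(fitted(fit_std), fitted(fit_ctr))
all.equal(fitted(fit_std), fitted(fit_scl))

# Centring leaves the seasonal coefficients unchanged and moves only the
# intercept: a_ctr = a_std + sum(b) / 4, the average of the four season means.
b <- unname(coef(fit_std)[-1])
all.equal(unname(coef(fit_ctr)[-1]), b)
all.equal(unname(coef(fit_ctr)[1]), unname(coef(fit_std)[1]) + sum(b) / 4)

# 26 quarters is not a whole number of years, so the columns of Dc
# do not sum exactly to zero in this sample.
colSums(Dc)
```

Under these assumptions the choice of coding does not change the fitted model, only what the intercept refers to: the base quarter with the standard dummies, and the average over the four quarters with the centred ones.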