Hello,

I'm sorry if my question is really basic, but I'm having some trouble with the statistics for my thesis, especially the chi-square test and contingency tables.

From what I understand, there are two "kinds" of chi-square test, which are quite similar:
- Homogeneity, when we have one variable and we want to compare it with a theoretical distribution
- Independence, when we have two variables and we want to see if they are linked

I'm working on color transitions, with 3 possible levels: "High", "Medium" and "Low". I want to know whether an individual will preferably go from a "High" color to another "High" color, rather than from a "High" color to a "Medium" color (for example).

I have this table:

    trans1 <- c(51, 17, 27, 12, 21, 13, 37, 15, 60)
    transitions1 <- matrix(trans1, nrow = 3, ncol = 3, byrow = TRUE)
    rownames(transitions1) <- c("High", "Medium", "Low")
    colnames(transitions1) <- c("High", "Medium", "Low")

The first column is showing the first color, and the second is showing the second color of the transition.

It looks like I'm in the case of an independence test, in order to see if the variable "second color" is linked to the "first color". So I'm running the test:

    chisq.test(transitions1)

(If I understood well, the test on the matrix is the independence test, and the test on the vector trans1 is the homogeneity test?)

The result is significant, which means that some transitions are preferred.

My problem is that I have other transition tables like this one (with other individuals or other conditions). For example, I also have this one:

    trans2 <- c(13, 7, 8, 5, 16, 18, 11, 8, 17)
    transitions2 <- matrix(trans2, nrow = 3, ncol = 3, byrow = TRUE)
    rownames(transitions2) <- c("High", "Low", "Stick")
    colnames(transitions2) <- c("High", "Low", "Stick")

I want to know if the "preferred" transitions in table 1 are the same in table 2. But if I try a chi-square test on those two matrices, R only takes the first one.

How can I compare those tables? Maybe with another test?

Thanks in advance!

Kind regards,
Lucie S.
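On the matrix-versus-vector question, a minimal sketch may help. As far as the `chisq.test()` documentation goes, a matrix with more than one row and column gets a test of independence, while a plain vector gets a goodness-of-fit test against equal probabilities unless `p` is supplied. The counts are the poster's; the object names `ind` and `gof` are made up for this illustration.

```r
# Counts from the post, arranged as a 3x3 table of transitions
trans1 <- c(51, 17, 27, 12, 21, 13, 37, 15, 60)
transitions1 <- matrix(trans1, nrow = 3, ncol = 3, byrow = TRUE,
                       dimnames = list(c("High", "Medium", "Low"),
                                       c("High", "Medium", "Low")))

# Matrix input: test of independence, df = (3 - 1) * (3 - 1) = 4
ind <- chisq.test(transitions1)

# Vector input: goodness-of-fit of the 9 counts against equal
# probabilities (the default when p is not given), df = 9 - 1 = 8
gof <- chisq.test(trans1)

ind$method; ind$parameter
gof$method; gof$parameter
```

The vector form therefore asks a different question (are all nine cells equally likely?) and ignores the row/column structure of the table.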
> The first column is showing the first color, and the second is showing the
> second color of the transition

Are you sure? transitions1 is a 3x3 matrix; it has three columns, not two. Could it be that the columns are colour 2 following the initial condition given by the row, or vice versa?

[Not that that will help _me_ answer your question, but it may help someone else.]

S Ellison
You should consult with your adviser or someone at your institution who has more experience in statistical analysis than you do. You want to compare the matrices, but the row/column labels are different, so you may be comparing completely different categories.

Technically, you need to convert the two matrices into a single matrix. You can do that by converting each into a vector with the c() function. BUT this will compare High with High, Medium with Low, and Low with Stick, which seems inadvisable.

> rbind(c(transitions1), c(transitions2))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]   51   12   37   17   21   15   27   13   60
[2,]   13    5   11    7   16    8    8   18   17
> chisq.test(rbind(c(transitions1), c(transitions2)))

        Pearson's Chi-squared test

data:  rbind(c(transitions1), c(transitions2))
X-squared = 22.411, df = 8, p-value = 0.004208

Warning message:
In chisq.test(rbind(c(transitions1), c(transitions2))) :
  Chi-squared approximation may be incorrect

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Lucie Dupond
Sent: Sunday, June 19, 2016 9:10 PM
To: r-help at r-project.org
Subject: [R] R help contingency table
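The warning in the output above ("Chi-squared approximation may be incorrect") is raised by `chisq.test()` when some expected counts fall below 5. A hedged sketch of two follow-ups, neither of which comes from the thread itself: inspect the expected counts, and ask for a Monte Carlo p-value through the `simulate.p.value` and `B` arguments. The names `both`, `res` and `sim`, and the seed, are arbitrary choices for this illustration.

```r
# Same counts as in the thread, stacked cell by cell into a 2 x 9 table
transitions1 <- matrix(c(51, 17, 27, 12, 21, 13, 37, 15, 60), nrow = 3, byrow = TRUE)
transitions2 <- matrix(c(13, 7, 8, 5, 16, 18, 11, 8, 17), nrow = 3, byrow = TRUE)
both <- rbind(c(transitions1), c(transitions2))

# Expected counts under the null hypothesis; values below 5 trigger the warning
res <- suppressWarnings(chisq.test(both))
round(res$expected, 1)

# Monte Carlo p-value, which does not rely on the chi-squared approximation
set.seed(1)  # arbitrary seed, only for reproducibility
sim <- chisq.test(both, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

This only addresses the approximation warning; the concern about mismatched row and column labels still applies to the comparison itself.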
Thank you for your answer!

I'm sorry, I made a mistake in the second matrix: both tables should have the same row/column labels; I just used another label vector by mistake.

My supervisor doesn't have a solution for this, and neither does anyone I asked around me.

Thanks for your solution, but I'm afraid that I will lose the interaction between the variables "first color" and "second color" if I convert each matrix into a vector.

Thank you for your help

________________________________
From: David L Carlson <dcarlson at tamu.edu>
Sent: Monday, 20 June 2016 21:06
To: Lucie Dupond; r-help at r-project.org
Subject: RE: R help contingency table
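One possible way to keep the "first color" by "second color" structure while comparing the two tables is a log-linear model on the three-way table (first color, second color, table), testing whether the association between the two colors differs between tables. This is only a sketch and was not proposed in the thread. It assumes both tables use the labels High/Medium/Low, as corrected above, and that the transitions can be treated as independent observations, which may not hold if one individual contributes many transitions. The names `tab3`, `long`, `fit_common`, `fit_sat` and `cmp` are made up for this illustration.

```r
labs <- c("High", "Medium", "Low")
m1 <- matrix(c(51, 17, 27, 12, 21, 13, 37, 15, 60), nrow = 3, byrow = TRUE)
m2 <- matrix(c(13, 7, 8, 5, 16, 18, 11, 8, 17), nrow = 3, byrow = TRUE)

# Stack the two 3x3 tables into one 3 x 3 x 2 array
tab3 <- array(c(m1, m2), dim = c(3, 3, 2),
              dimnames = list(first = labs, second = labs, group = c("t1", "t2")))

# One row per cell, with the count in column Freq
long <- as.data.frame(as.table(tab3))

# All two-way interactions: the first/second association is the same in both tables
fit_common <- glm(Freq ~ (first + second + group)^2, family = poisson, data = long)

# Saturated model: the first/second association may differ between tables
fit_sat <- glm(Freq ~ first * second * group, family = poisson, data = long)

# Likelihood-ratio test of the three-way interaction, df = (3-1)*(3-1)*(2-1) = 4
cmp <- anova(fit_common, fit_sat, test = "Chisq")
cmp
```

A small p-value in `cmp` would suggest that the pattern of preferred transitions differs between the two tables; a large one gives no evidence of a difference.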
Hi Lucie,
You can visualize this using the sizetree function (plotrix). You supply a data frame of the individual choice sequences.

    library(plotrix)

    # form a data frame of "random" choices
    coltrans <- data.frame(
      choice1 = sample(c("High", "Medium", "Low"), 100, TRUE),
      choice2 = sample(c("High", "Medium", "Low"), 100, TRUE))
    sizetree(coltrans, main = "Random color choice transitions")
    # test the two way table of transitions for independence
    chisq.test(table(coltrans))

    # now try a data frame of "habitual" choices
    coltrans2 <- data.frame(
      choice1 = rep(c("High", "Medium", "Low"), c(33, 33, 34)),
      choice2 = c(sample(c("High", "Medium", "Low"), 33, TRUE, prob = c(0.6, 0.2, 0.2)),
                  sample(c("High", "Medium", "Low"), 33, TRUE, prob = c(0.2, 0.6, 0.2)),
                  sample(c("High", "Medium", "Low"), 34, TRUE, prob = c(0.2, 0.2, 0.6))))
    sizetree(coltrans2, main = "Habitual color choice transitions")
    # test the table again
    chisq.test(table(coltrans2))

This may be what you want.

Jim
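Since sizetree takes a data frame of individual choice sequences while the tables in this thread are already counts, one way to bridge the two is to expand each cell into as many rows as its count. The sketch below assumes that rows are the first color and columns the second (a reading that was questioned earlier in the thread); the names `counts` and `seqs` and the dimnames `choice1`/`choice2` are chosen for illustration. The plot is only drawn if plotrix is installed.

```r
labs <- c("High", "Medium", "Low")
transitions1 <- matrix(c(51, 17, 27, 12, 21, 13, 37, 15, 60), nrow = 3, byrow = TRUE,
                       dimnames = list(choice1 = labs, choice2 = labs))

# One row per cell with its count, then repeat each row Freq times
counts <- as.data.frame(as.table(transitions1))
seqs <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("choice1", "choice2")]
rownames(seqs) <- NULL

# Cross-tabulating the expanded rows should give back the original counts
stopifnot(all(table(seqs) == transitions1))

if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::sizetree(seqs, main = "Observed color transitions, table 1")
}
```

The same expansion can be applied to the second table so that the two plots can be compared side by side.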