Hello,

I routinely use aov and the Error term to perform analyses of variance of experiments with 'within-subject' factors. I wonder whether a notion like 'multistratum models' exists for glm models when performing a logit analysis (without being 100% sure whether this would make sense).

I have data from an experiment where the outcome is a categorical variable: 20 individuals listened to 80 synthetic utterances (distributed in 4 types) and were asked to classify them into four categories. (The variables in the data.frame are 'subject', 'sentence', 'type', and 'response'.)

Here is the table of counts, table(type,response):

          response
    type    a    b    c    d
       a  181  166   42   11
       b   69  170   72   89
       c   90  174   75   61
       d   14  125   53  208

There are several questions of interest, such as, for example:

- are responses distributed in the same way for the different types?
- are the numbers of 'a' responses for the 'b' and 'c' types significantly different?
- is the proportion of 'd' over 'a' responses different for the 'b' and 'c' categories?
...

(I want to make inferences for the population of potential subjects on the one hand, and for the population of potential sentences on the other hand.)

If the responses were continuous, I would just run two one-way anovas: one with the factor type over the means by subject*type, and the other with the factor type over the means by sentence (within type). I would then use t.test to compare different pairs of types.

Now, as the answers are categorical, I am not sure about the correct approach and how to use R to perform such an analysis.

I could treat response as a factor, use the percentages of responses per subject in each cell of response*type, and run an anova on that... [aov(percentage ~ response*type + Error(subject/(response*type)))] But it seems incorrect to me to use the response of the subject as an independent variable (though I do not have a forceful argument).
Simple Chi-square tests are not the answer either, as a given subject contributed several times (80) to the counts in the table above.

My reading of MASS and of several other books suggests the use of logit/multinomial models when the response is categorical. But in all the examples provided, the units of analysis contribute only one measurement. Should I include the subject and sentence factors in the formula? But then they would be treated as fixed factors in the analysis, would they not?

Any suggestion is welcome.

Christophe Pallier
pallier.org
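As a small aid for readers, the table of counts quoted in the message can be rebuilt in R directly from the printed numbers. This is only a sketch: the object names (counts, row_totals, row_props) are made up for illustration, and the check of 400 responses per type assumes the 80 sentences are split evenly into 20 per type, which the message does not state explicitly but which the row totals are consistent with.

```r
# Counts copied from the table in the post: rows are 'type', columns are 'response'.
counts <- matrix(
  c(181, 166, 42,  11,
     69, 170, 72,  89,
     90, 174, 75,  61,
     14, 125, 53, 208),
  nrow = 4, byrow = TRUE,
  dimnames = list(type = c("a", "b", "c", "d"),
                  response = c("a", "b", "c", "d"))
)

# Assumed design: 20 subjects x 20 sentences per type = 400 responses per row.
row_totals <- rowSums(counts)
print(row_totals)

# Response proportions within each type (each row sums to 1).
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 3))
```

These proportions are descriptive only. As the message notes, each subject contributes 80 responses, so the cells are not independent observations.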
On 18 Apr 2004 at 13:47, Christophe Pallier wrote:

> I wonder whether a notion like 'multistratum models' exists for glm
> models when performing a logit analysis [...]

You should probably look into glmmPQL (package MASS) or GLMM (package lme4).

Kjetil Halvorsen
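A minimal sketch of what a glmmPQL call might look like for this kind of design, under several loudly stated assumptions. The data are simulated, not the poster's. The four-category response is reduced to a hypothetical binary indicator resp_a ("response is 'a'" versus anything else), because glmmPQL fits binomial-type models rather than multinomial ones. Only a random intercept for subject is included, so the sentence factor is not modelled as random here. The effect sizes used in the simulation are arbitrary.

```r
library(MASS)   # provides glmmPQL; it needs the nlme package to be installed

set.seed(1)

# Hypothetical stand-in for the real data.frame: 20 subjects x 80 sentences,
# with an assumed 20 sentences per type.
d <- expand.grid(subject = factor(1:20), sentence = factor(1:80))
d$type <- factor(c("a", "b", "c", "d")[(as.integer(d$sentence) - 1) %/% 20 + 1])

# Simulate the binary outcome with a subject-specific shift on the logit scale.
# All numeric values below are arbitrary illustration values.
subj_shift <- rnorm(20, sd = 0.5)
base_logit <- c(a = 0, b = -1.5, c = -1.2, d = -3)
eta <- base_logit[as.character(d$type)] + subj_shift[as.integer(d$subject)]
d$resp_a <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Logit model with type as a fixed effect and a random intercept per subject,
# fitted by penalized quasi-likelihood.
fit <- glmmPQL(resp_a ~ type, random = ~ 1 | subject,
               family = binomial, data = d, verbose = FALSE)

# Fixed-effect estimates on the logit scale, relative to type 'a'.
print(summary(fit)$tTable)
```

Treating subject as a random effect is what addresses the concern in the original question about subjects being handled as fixed factors. Handling all four response categories at once, or subjects and sentences jointly, would need a different model than this sketch.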