I'm grappling with a problem and would appreciate any thoughts on it.

I'm revising a paper for resubmission to a journal. For the paper, I've coded each "turn" in a series of conversations with several binary codes. (A turn is one package of statements made by one speaker, starting when the speaker begins to speak and ending when the speaker stops or is interrupted.) The reviewers want me to justify my decision to code each turn individually, ignoring (for this analysis) the turns that surround it.

My thought is to run a logistic regression predicting the presence/absence of a code in a given turn, with the independent variables being the number of turns that have elapsed since each code was last used in the conversation. No problem so far.

The problem involves treating what are essentially missing data: until a code has been used for the first time in a conversation, the number of turns since its last use is undefined. If I simply omit cases in which one or more variables are missing, it's a very conservative test, since it includes only turns for which all codes have already occurred at least once in the conversation. An alternative is to set the number of turns that have elapsed since the last use of a code to a suitably high number (probably 1 + the total number of turns elapsed in the conversation). That would let me include all turns, including those that introduce codes into a conversation, but it would also inflate the influence of prior use on current use by postulating a nonexistent use "just before" the conversation.

I hope this is clear enough to be informative. I'd be interested in any thoughts folks might have.

Thanks,
Andy Perrin

----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists at perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
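
P.S. In case it helps to make the two options concrete, here is a minimal Python sketch of how the predictors might be built. Everything in it is an assumption for illustration: the toy conversation, the code labels "A" and "B", and the function names are hypothetical, and I read "1 + the total number of turns elapsed" as 1 + the number of turns before the current one. It only constructs the lag variables; it does not fit the logistic regression.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[int]]


def turns_since_last_use(conversation: List[Set[str]], codes: List[str]) -> List[Row]:
    """For each turn, count the turns elapsed since each code was last used.

    The value is None when the code has not yet appeared in the conversation.
    """
    last_seen: Dict[str, int] = {}
    rows: List[Row] = []
    for t, turn_codes in enumerate(conversation):
        # Predictors use only turns strictly before turn t.
        rows.append({c: (t - last_seen[c]) if c in last_seen else None for c in codes})
        # Record the codes used in turn t for later turns.
        for c in turn_codes:
            last_seen[c] = t
    return rows


def complete_cases(rows: List[Row]) -> List[Tuple[int, Row]]:
    """Option 1: keep only turns where every code has already occurred."""
    return [(t, r) for t, r in enumerate(rows) if all(v is not None for v in r.values())]


def fill_with_pseudo_use(rows: List[Row]) -> List[Dict[str, int]]:
    """Option 2: replace missing lags with 1 + turns elapsed so far.

    This is the same as pretending each code was used once just before
    the conversation began (at position -1).
    """
    return [
        {c: (v if v is not None else t + 1) for c, v in r.items()}
        for t, r in enumerate(rows)
    ]


# Hypothetical five-turn conversation; each set holds the codes present in that turn.
conversation = [{"A"}, set(), {"B"}, {"A", "B"}, set()]
codes = ["A", "B"]

lags = turns_since_last_use(conversation, codes)
kept = complete_cases(lags)
filled = fill_with_pseudo_use(lags)

print("turns kept under option 1:", [t for t, _ in kept])
print("rows under option 2:", filled)
```

In this toy case option 1 keeps only the last two of five turns, while option 2 keeps all five but assigns the opening turn a lag of 1 for both codes, which is exactly the phantom prior use I'm worried about.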