Sacha Viquerat
2012-Aug-04 14:12 UTC
[R] Questionnaire Analysis virtually without continuous Variables
Hello!

I am doing an analysis on a questionnaire of hunters taken in 4 different districts of some mysterious foreign country. The aim of the study was to gather info on the factors that determine the hunting success of a peculiarly beautiful bird in that area. All variables are factors, i.e. they are variables such as "Use of Guns - yes / no", "Use of Dogs - yes / no" and the like. The response is supposed to be "number of Birds caught", which was designed to be the only continuous variable. However, in reality the number of caught birds is 0 or 1; sometimes hunters answered with 2. Unfortunately, it is not the questioner who is burdened with the analysis, but me.

I am struggling to find an appropriate approach to the analysis. I don't really consider this count data, since it would be very vulnerable to zero inflation (with a steep decline for counts above 0). I can't really suggest binomial models either, since the lack of continuous explanatory data renders such an approach quite vague. I also struggle with the random design of the survey (households nested within villages nested within districts). Adding to that, hunters don't even target the bird as their prime objective. The bird is essentially a by-catch, most often used for instant consumption on the hunting trip. I therefore doubt that any analysis makes more than a little sense, but I will not yet succumb to failure. Any ideas?

Thanks in advance!

PS: I just realized that this is not a question related to R but to statistics in general. Apologies for that!
R. Michael Weylandt
2012-Aug-04 15:51 UTC
[R] Questionnaire Analysis virtually without continuous Variables
Hi Sacha,

This sounds a good deal like homework to me ("some mysterious foreign country") and this list has a "no homework" policy, so unfortunately I don't think you'll be able to get much help here. Best of luck with your analysis, however!

Michael
Joshua Wiley
2012-Aug-04 17:57 UTC
[R] Questionnaire Analysis virtually without continuous Variables
Hi Sacha,

You're right that this is not really an R-related question (it would be better somewhere like crossvalidated.com).

If basically everyone catches 0/1 birds, then I would consider dichotomizing:

    yourdata$Y <- as.integer(yourdata$caught >= 1)

then check cross tabs to make sure there are no zero cells between predictors and outcome:

    xtabs(~ Y + dogs + guns, data = yourdata)

then use the glmer() function (from the lme4 package) to model the nested random effects:

    library(lme4)
    # assumes household and village IDs are unique across villages and districts
    m <- glmer(Y ~ dogs + guns + (1 | household) + (1 | village) + (1 | district),
               data = yourdata, family = binomial)
    summary(m)

Cheers,

Josh
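As a rough illustration of the dichotomize-then-cross-tabulate step suggested above, here is a minimal sketch using base R only. The data are simulated, and every column name and value (`district`, `village`, `dogs`, `guns`, `caught`, `village_id`) is an assumption made for the example, not something taken from the actual survey. It also builds a village ID that is unique within district, which matters if village labels repeat across districts and the IDs are later used as grouping factors.

```r
# Simulated stand-in for the questionnaire; all names and values are
# hypothetical and not taken from the original survey.
set.seed(1)
n <- 200
yourdata <- data.frame(
  district = factor(sample(paste0("D", 1:4), n, replace = TRUE)),
  village  = factor(sample(paste0("V", 1:3), n, replace = TRUE)),  # labels reused across districts
  dogs     = factor(sample(c("no", "yes"), n, replace = TRUE)),
  guns     = factor(sample(c("no", "yes"), n, replace = TRUE)),
  caught   = sample(0:2, n, replace = TRUE, prob = c(0.70, 0.25, 0.05))
)

# Dichotomize: 1 if at least one bird was caught, 0 otherwise.
yourdata$Y <- as.integer(yourdata$caught >= 1)

# Village labels repeat across districts here, so build an ID that is unique
# per district/village combination before using it as a grouping factor.
yourdata$village_id <- interaction(yourdata$district, yourdata$village, drop = TRUE)

# Cross-tabulate the outcome against the predictors and flag empty cells.
tab <- xtabs(~ Y + dogs + guns, data = yourdata)
print(ftable(tab))
has_zero_cells <- any(tab == 0)
cat("Any zero cells:", has_zero_cells, "\n")
```

If `has_zero_cells` comes back `TRUE` on real data, the corresponding fixed-effect estimates in a binomial model are likely to be unstable, so it is worth checking before fitting anything.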
Joshua Wiley
2012-Aug-04 22:39 UTC
[R] Questionnaire Analysis virtually without continuous Variables
You may be able to get around zero cells using an MCMC approach, such as with the MCMCglmm package.

On Aug 4, 2012, at 15:30, Sacha Viquerat <dawa.ya.moto at googlemail.com> wrote:

> I did exactly what you proposed already (since the binomial model seemed obvious to me); however, of course there are zero cells. I was thinking someone more accustomed to doing questionnaire analysis could unveil some mysterious approach common to sociologists but occluded from the naturalist's eyes (hardened after years of dealing with exact science ;)
> I think I will expand the binomial approach and just try to find fancy graphics that make up for the low value of the actual results (maybe with colours). :D
> Thank you for the reply (do they really give such tasks for homework these days? These kids must be awesome statisticians!)
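To make the zero-cell problem mentioned in this exchange concrete, here is a small sketch with invented toy data (the `toy` data frame and its values are purely hypothetical). It fits an ordinary logistic regression with base R's glm(), not the mixed model or the MCMC approach discussed above, and shows that when one predictor level has no successes at all, the maximum-likelihood estimate drifts toward minus infinity and its standard error becomes enormous.

```r
# Hypothetical toy data: nobody who used dogs caught a bird (a zero cell).
toy <- data.frame(
  dogs = factor(rep(c("no", "yes"), each = 20)),
  Y    = c(rep(c(0, 1), each = 10), rep(0, 20))
)
print(xtabs(~ Y + dogs, data = toy))

# Ordinary logistic regression. glm() may warn about fitted probabilities
# of 0 or 1 in situations like this, so any warning is suppressed here.
fit <- suppressWarnings(glm(Y ~ dogs, data = toy, family = binomial))

# The coefficient for dogs = "yes" is very large and negative, with a
# standard error far larger than the estimate itself.
est <- coef(summary(fit))["dogsyes", c("Estimate", "Std. Error")]
print(est)
```

A Bayesian fit with a proper prior on the fixed effects is one way to keep such estimates finite, which is presumably the motivation behind the MCMC suggestion; that reading is an interpretation, not something stated in the thread.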