thr3ads.net - R help - [R] A goodness of fit test for two discrete distributions with unequal variance? [Aug 2019]

Serena De Stefani

2019-Aug-23 21:52 UTC

[R] A goodness of fit test for two discrete distributions with unequal variance?

I have a computer simulation in which a virtual agent ends up in different
areas of a layout based on several factors. There are 18 conditions in
total.
If I collapse the data points into bins, where each bin is one of the areas,
the data would look like this:

    x0 <- c(3,3,5,5,2) # computer simulation

Now I would like to validate this model by having human subjects go through
the same conditions, but I run into two sets of issues:

 1. The first issue is that the dataset is discrete and small (there may be
fewer than 5 counts in a bin, which is a problem for a Chi-Square Goodness
of Fit test); there may also be ties. After some online digging I found two
options:
- a permutation test
- a Cramer-von Mises test of goodness-of-fit (see this paper:
  https://journal.r-project.org/archive/2011/RJ-2011-016/RJ-2011-016.pdf)
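
The permutation-test option can be sketched in base R as follows. This is only an illustration under stated assumptions: it reads x0 and x1 as bin counts (18 trials spread over 5 areas), treats the trials as exchangeable between simulation and subject, and uses a Pearson chi-square statistic. The helper name chisq_stat is made up for this example and is not from the post or from any package.

```r
# Bin counts from the post: 18 trials spread over 5 areas.
x0 <- c(3, 3, 5, 5, 2)  # computer simulation
x1 <- c(4, 2, 5, 4, 3)  # subject 1

# Expand the counts into one bin label per trial, plus a source label.
bins   <- c(rep(seq_along(x0), times = x0), rep(seq_along(x1), times = x1))
source <- rep(c("sim", "human"), times = c(sum(x0), sum(x1)))

# Hypothetical helper: Pearson chi-square statistic of the source-by-bin table.
chisq_stat <- function(src, bin) {
  tab      <- table(src, factor(bin, levels = seq_along(x0)))
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

set.seed(1)  # only so that the sketch is reproducible
observed <- chisq_stat(source, bins)

# Shuffle the source labels to build the null distribution of the statistic.
perm <- replicate(2000, chisq_stat(sample(source), bins))

# Permutation p-value, counting the observed arrangement as one permutation.
p_value <- (sum(perm >= observed) + 1) / (length(perm) + 1)
p_value
```

Because only the source labels are shuffled, the bin totals stay fixed, so small or tied counts do not invalidate the reference distribution the way they can for the asymptotic chi-square approximation.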

I thought the Cramer-von Mises goodness-of-fit test could work, so
I ran it with made-up data for *one human subject* and got the following
result:

    x0 <- c(3,3,5,5,2) # computer simulation
    x1 <- c(4,2,5,4,3) # subject 1

    library(goftest)

    cvm.test(x0, ecdf(x1))

    >Cramer-von Mises test of goodness-of-fit
    >Null hypothesis: distribution 'ecdf(x1)'
    >data:  x0
    >omega2 = 0.14667, p-value = 0.4106
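
For comparison, a Monte Carlo version of the chi-square goodness-of-fit test is available in base R and does not rely on the large-count approximation. The sketch below assumes that x0 and x1 are bin counts rather than raw observations (the call above passes x0 to cvm.test as a vector of data values), and that the simulation's proportions can be treated as a fixed reference distribution. Both are assumptions made for illustration, not something the post states.

```r
x0 <- c(3, 3, 5, 5, 2)  # computer simulation (bin counts)
x1 <- c(4, 2, 5, 4, 3)  # subject 1 (bin counts)

set.seed(1)  # only so that the sketch is reproducible

# Goodness of fit of the subject's counts to the simulation's proportions.
# simulate.p.value = TRUE replaces the asymptotic chi-square approximation
# with a p-value computed from B Monte Carlo replicates.
res <- chisq.test(x1, p = x0 / sum(x0), simulate.p.value = TRUE, B = 10000)
res$statistic
res$p.value
```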

So far so good. But now let's say I would like to have more than one human
subject, say four of them. These are the results from the additional
subjects:

    x2 <- c(3,3,5,2,5) # subject 2
    x3 <- c(2,2,5,6,3) # subject 3
    x4 <- c(3,2,5,6,2) # subject 4

Now I run into the second set of issues:

2. On the one side I have a single computer simulation; on the other side I
have data from four subjects. Should I take the mean of the results for the
human subjects? Then would my data still be "discrete"? Or should I run my
simulation four times? But I would always get the same results, so the
variance between the two datasets would be different.
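
One way to keep the human data discrete is to sum the bin counts across subjects instead of averaging them. The sketch below is a hedged illustration, not a recommendation: it assumes the vectors are bin counts, that subjects can reasonably be pooled (the first test is a rough check of that), and that the deterministic simulation serves as a fixed reference distribution rather than as a second sample.

```r
x0 <- c(3, 3, 5, 5, 2)  # computer simulation (bin counts)

# One row per subject, one column per area.
subjects <- rbind(
  s1 = c(4, 2, 5, 4, 3),
  s2 = c(3, 3, 5, 2, 5),
  s3 = c(2, 2, 5, 6, 3),
  s4 = c(3, 2, 5, 6, 2)
)

set.seed(1)  # only so that the sketch is reproducible

# Rough check that the subjects resemble one another before pooling them
# (Monte Carlo test of homogeneity on the subject-by-area table).
homogeneity <- chisq.test(subjects, simulate.p.value = TRUE, B = 10000)

# Summing keeps the pooled data as integer counts; averaging would not.
pooled <- colSums(subjects)

# Goodness of fit of the pooled counts to the simulation's proportions.
fit <- chisq.test(pooled, p = x0 / sum(x0), simulate.p.value = TRUE, B = 10000)

homogeneity$p.value
fit$statistic
fit$p.value
```

Treating the simulation as a fixed reference sidesteps the unequal-variance worry, since only the human counts are considered random.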

Any ideas? Maybe I should change the design and have more levels for my
factors, so that I have more trials and the bins get bigger?

	[[alternative HTML version deleted]]

David Winsemius

2019-Aug-23 22:03 UTC

On 8/23/19 2:52 PM, Serena De Stefani wrote:
> [...]

Statistics questions, especially those from people who have failed to
heed the advice of the Posting Guide to post in plain text, are
off-topic on R-help and should be posted to a forum where statistics
questions are welcomed. (My suspicion is that this question will be
greeted with further requests for clarification of goals, since asking
what you "should" do requires a careful explanation of what your
standards of evidence are and what you are attempting to demonstrate.)


-- 

David.
