Peter Sebastian Masny
2004-Jun-08 17:23 UTC
[R] Comparing two pairs of non-normal datasets in R?
Hi all,

I'm using R to analyze some research and I'm not sure which test would be appropriate for my data. I was hoping someone here might be able to help.

Short version:
Evaluate null hypothesis that change A1->A2 is similar to change C1->C2, for continuous, non-normal datasets.

Long version:

I have two populations A and C. I take a measurement on samples of these populations before and after a process. So basically I have:
A1 - sample of A before process
A2 - sample of A after process
C1 - sample of C (control) before process
C2 - sample of C (control) after process

The data is continuous and I have about 100 measurements in each dataset. Also, the data is not normally distributed (more like a Poisson).

By Wilcoxon Rank Sum, A1 is significantly different than A2 and C1 is different than C2.

Here is the problem:
C1 is only slightly different than C2 (Wilcoxon, p<.02), while A1 is more noticeably different than A2 (p<1E-22). What I would like to do is assume that the changes seen in C are typical, and evaluate the changes in A relative to the changes in C (i.e. are the changes greater?).

Any thoughts?

Thanks,
Peter Masny
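For readers following along, the two separate rank-sum tests described above can be reproduced roughly as follows. This is only a sketch: the four vectors are simulated stand-ins (skewed, continuous, unequal sizes), not the poster's data, and the rates are arbitrary assumptions.

```r
set.seed(1)

# Simulated stand-ins for the four samples (assumption, not the real data)
A1 <- rexp(100, rate = 1)
A2 <- rexp(110, rate = 1 / 3)
C1 <- rexp(95,  rate = 1)
C2 <- rexp(105, rate = 1 / 1.2)

# Unpaired Wilcoxon rank-sum tests, one per population.
# conf.int = TRUE also returns an estimate of the location shift (x minus y).
wa <- wilcox.test(A1, A2, conf.int = TRUE)
wc <- wilcox.test(C1, C2, conf.int = TRUE)

print(c(A = wa$p.value, C = wc$p.value))
print(rbind(A = wa$conf.int, C = wc$conf.int))
```

Each test only compares "before" with "after" inside one population. Neither p-value, nor their ratio, tests whether the change in A exceeds the change in C, which is the actual question.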
Have you considered "qqplot(A1, A2)" and "qqplot(C1, C2)"? If A1, A2, C1, C2 are "more like Poisson", I might try "qqplot(sqrt(A1), sqrt(A2))", etc. Without the "sqrt", the image might be excessively distorted by the largest values, at least in my experience.

Hope this helps.
Spencer Graves
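A minimal sketch of the suggested plots, assuming simulated stand-in data (the vectors below are not the poster's measurements). The dashed identity line is an addition for orientation: points far from it indicate that the "before" and "after" distributions differ.

```r
set.seed(1)

# Simulated stand-ins for the four samples (assumption, not the real data)
A1 <- rexp(100, rate = 1)
A2 <- rexp(110, rate = 1 / 3)
C1 <- rexp(95,  rate = 1)
C2 <- rexp(105, rate = 1 / 1.2)

op <- par(mfrow = c(1, 2))

# Square-root scale keeps the largest values from dominating the picture
qa <- qqplot(sqrt(A1), sqrt(A2), main = "A: before vs. after")
abline(0, 1, lty = 2)   # identity line: no change

qc <- qqplot(sqrt(C1), sqrt(C2), main = "C: before vs. after")
abline(0, 1, lty = 2)

par(op)
```

qqplot() accepts samples of different lengths; it interpolates the longer one down to the length of the shorter.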
Peter Sebastian Masny
2004-Jun-08 19:14 UTC
[R] Comparing two pairs of non-normal datasets in R?
On Tuesday 08 June 2004 10:43 am, you wrote:
> If I understand you correctly, you have two set of ***paired***
> data, one set from the A population, and one from the C population.
>
> Form the pairwise differences:
>
> A.diff <- A1 - A2
> C.diff <- C1 - C2

Alas, they are not paired. A1 and A2 are samples from the same population, but of different members. Also, the number of measurements is different for each dataset.

> Boxplots and histograms of A.diff and C.diff will tell you
> (much more than a test ever would) what's ***really*** going on.

The boxplots I have clearly show the difference, but I need a p value to go with it. Here are the boxplots if that helps:
http://www.ps.masny.dk/guests/misc/A1.png
http://www.ps.masny.dk/guests/misc/A2.png
http://www.ps.masny.dk/guests/misc/C1.png
http://www.ps.masny.dk/guests/misc/C2.png

> P.S. BTW --- you say that your data are continuous, but that their
> distributions are ``more like a Poisson''. The Poisson distribution
> is DISCRETE!!!

Hence the "like". The data is indeed continuous, but a distribution graph increases towards one extreme...

Visually, the results are convincing, but I really need a test of significance.

Thank you very much for the help,
Peter
> Here are the boxplots if that helps:
> http://www.ps.masny.dk/guests/misc/A1.png
> http://www.ps.masny.dk/guests/misc/A2.png
> http://www.ps.masny.dk/guests/misc/C1.png
> http://www.ps.masny.dk/guests/misc/C2.png

Here is how I would do it:

It looks like your distributions can be characterized by just a single parameter. Then your question is: Is the change in the parameter value from A1 to A2 larger than that from C1 to C2? (Larger meaning what? Difference? Ratio?)

So I suggest
- you estimate your four parameters
- compute your two differences or ratios
- compute the difference of those
- and repeat all this for many resamples (bootstrapping)

Then you get a bootstrap distribution of the value that you hope is significantly non-zero. From that distribution you can read your p-value (preferably based on a BCa confidence interval).

library(boot)

Does that make sense?

Lutz
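The steps above could be sketched with the boot package roughly as follows. Several things here are assumptions rather than part of the suggestion: the data are simulated stand-ins, the "single parameter" is taken to be the group mean, "larger" is taken to mean a larger difference (not ratio), and the helper name change_diff is made up for illustration. Because the samples are unpaired and of unequal size, each group is resampled separately via the strata argument.

```r
library(boot)

set.seed(1)

# Simulated stand-ins for the four unpaired samples (assumption, not real data)
dat <- data.frame(
  value = c(rexp(100, 1), rexp(110, 1 / 3), rexp(95, 1), rexp(105, 1 / 1.2)),
  group = factor(rep(c("A1", "A2", "C1", "C2"), times = c(100, 110, 95, 105)))
)

# Hypothetical statistic: (change in A) minus (change in C), using group means.
# Swap mean for median or a fitted parameter if that describes the data better.
change_diff <- function(d, i) {
  m <- tapply(d$value[i], d$group[i], mean)
  as.numeric((m["A2"] - m["A1"]) - (m["C2"] - m["C1"]))
}

# Stratified bootstrap: indices are resampled within each group
b  <- boot(dat, change_diff, R = 2000, strata = dat$group)
ci <- boot.ci(b, conf = 0.95, type = "bca")
print(ci)

# Rough two-sided bootstrap p-value (an approximation, limited to about 1/R)
p_boot <- 2 * min(mean(b$t <= 0), mean(b$t >= 0))
print(p_boot)
```

If the BCa interval excludes zero, the change in A differs from the change in C at the chosen level. The p-value line is only a crude reading of the bootstrap distribution; inverting the BCa interval, as suggested above, would be the more careful route.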