Peter Sebastian Masny
2004-Jun-08 17:23 UTC
[R] Comparing two pairs of non-normal datasets in R?
Hi all,

I'm using R to analyze some research and I'm not sure which test would be appropriate for my data. I was hoping someone here might be able to help.

Short version:
Evaluate null hypothesis that change A1->A2 is similar to change C1->C2, for continuous, non-normal datasets.

Long version:

I have two populations A and C. I take a measurement on samples of these populations before and after a process. So basically I have:
A1 - sample of A before process
A2 - sample of A after process
C1 - sample of C (control) before process
C2 - sample of C (control) after process

The data is continuous and I have about 100 measurements in each dataset. Also, the data is not normally distributed (more like a Poisson).

By Wilcoxon Rank Sum, A1 is significantly different than A2 and C1 is different than C2.

Here is the problem:
C1 is only slightly different than C2 (Wilcoxon, p<.02), while A1 is more noticeably different than A2 (p<1E-22). What I would like to do is assume that the changes seen in C are typical, and evaluate the changes in A relative to the changes in C (i.e. are the changes greater?).

Any thoughts?

Thanks,
Peter Masny
Have you considered "qqplot(A1, A2)" and "qqplot(C1, C2)"? If A1, A2, C1, C2 are "more like Poisson", I might try "qqplot(sqrt(A1), sqrt(A2))", etc. Without the "sqrt", the image might be excessively distorted by the largest values, at least in my experience.

Hope this helps.
spencer graves
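A minimal sketch of the Q-Q comparison suggested above, on the raw and on the square-root scale. The data here are simulated stand-ins (exponential draws of unequal size), since the real measurements are not posted; the sample sizes, rates and object names `qq_raw` and `qq_sqrt` are assumptions for illustration only.

```r
# Simulated stand-in data; the original measurements are not available.
set.seed(1)
A1 <- rexp(100, rate = 1)    # hypothetical "before" sample
A2 <- rexp(110, rate = 0.5)  # hypothetical "after" sample, different size

# qqplot() accepts samples of unequal length; it interpolates the longer
# one and (invisibly) returns the plotted quantile pairs as a list(x, y).
op <- par(mfrow = c(1, 2))
qq_raw <- qqplot(A1, A2, main = "raw scale")
abline(0, 1, lty = 2)        # points on this line = identical distributions
qq_sqrt <- qqplot(sqrt(A1), sqrt(A2), main = "sqrt scale")
abline(0, 1, lty = 2)
par(op)
```

The same two calls with C1 and C2 would give the control comparison; putting the A and C plots side by side shows whether the shift in A is visibly larger than the shift in C.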
Peter Sebastian Masny
2004-Jun-08 19:14 UTC
[R] Comparing two pairs of non-normal datasets in R?
On Tuesday 08 June 2004 10:43 am, you wrote:
> If I understand you correctly, you have two set of ***paired***
> data, one set from the A population, and one from the C population.
>
> Form the pairwise differences:
>
> A.diff <- A1 - A2
> C.diff <- C1 - C2

Alas, they are not paired. A1 and A2 are samples from the same population, but of different members. Also, the number of measurements is different for each dataset.

> Boxplots and histograms of A.diff and C.diff will tell you
> (much more than a test ever would) what's ***really*** going on.

The boxplots I have clearly show the difference, but I need a p value to go with it.

Here are the boxplots if that helps:
http://www.ps.masny.dk/guests/misc/A1.png
http://www.ps.masny.dk/guests/misc/A2.png
http://www.ps.masny.dk/guests/misc/C1.png
http://www.ps.masny.dk/guests/misc/C2.png

> P.S. BTW --- you say that your data are continuous, but that their
> distributions are ``more like a Poisson''. The Poisson distribution
> is DISCRETE!!!

Hence the "like". The data is indeed continuous, but a distribution graph increases towards one extreme...

Visually, the results are convincing, but I really need a test of significance.

Thank you very much for the help,
Peter
> Here are the boxplots if that helps:
> http://www.ps.masny.dk/guests/misc/A1.png
> http://www.ps.masny.dk/guests/misc/A2.png
> http://www.ps.masny.dk/guests/misc/C1.png
> http://www.ps.masny.dk/guests/misc/C2.png

Here is how I would do it:

It looks like your distributions can be characterized by just a single parameter. Then your question is: Is the change in the parameter value from A1 to A2 larger than that from C1 to C2? (Larger meaning what? Difference? Ratio?)

So I suggest
- you estimate your four parameters
- compute your two differences or ratios
- compute the difference of those
- and repeat all this for many resamples (bootstrapping)

Then you get a bootstrap distribution of the value that you hope is significantly non-zero. From that distribution you can read your p-value. (Preferably based on a BCa confidence interval)

library(boot)

Does that make sense?

Lutz
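The steps above could be sketched roughly as follows with the boot package. Several things here are assumptions rather than anything stated in the thread: the data are simulated stand-ins, the median is used as the single summary parameter, the change is measured as a difference (not a ratio), and the names `dat`, `diff_of_changes` and `p_value` are made up for illustration. Because the four samples are unpaired and of different sizes, resampling is done within each sample via `strata`.

```r
library(boot)

# Simulated stand-in data: four independent, unpaired, skewed samples of
# unequal size. Replace with the real measurements.
set.seed(42)
dat <- data.frame(
  value = c(rexp(100, 1), rexp(95, 1 / 3), rexp(105, 1), rexp(98, 1 / 1.2)),
  group = factor(rep(c("A1", "A2", "C1", "C2"), times = c(100, 95, 105, 98)))
)

# Statistic: (change in A) minus (change in C), with the median as the
# summary parameter. Both choices are assumptions, not the thread's.
diff_of_changes <- function(d, idx) {
  m <- tapply(d$value[idx], d$group[idx], median)
  unname((m["A2"] - m["A1"]) - (m["C2"] - m["C1"]))
}

# strata = resample within each of the four samples separately, so every
# bootstrap replicate keeps the original group sizes.
b <- boot(dat, diff_of_changes, R = 2000, strata = dat$group)

# Percentile interval for the difference of changes.
ci <- boot.ci(b, conf = 0.95, type = "perc")

# Rough two-sided bootstrap p-value: twice the smaller tail proportion of
# replicates on either side of zero. One common convention among several.
p_value <- min(1, 2 * min(mean(b$t <= 0), mean(b$t >= 0)))
```

`boot.ci()` also accepts `type = "bca"` for the BCa interval mentioned above; the percentile interval is used here only to keep the sketch simple. With R = 2000 replicates the smallest non-zero p-value this convention can report is 0.001, so a larger R is needed to resolve smaller values.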