thr3ads.net - R help - [R] KS Test question (2) [Aug 2010]

Ralf B

2010-Aug-04 21:49 UTC

[R] KS Test question (2)

Hi R Users,

I have two vectors, x and y, of equal length representing two types of
data from two studies. I would like to test if they are similar enough
to use them interchangeably. No assumptions about distributions can be
made (initial tests clearly show that they are not normal).
Here is a result:

Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.1091, p-value < 2.2e-16
alternative hypothesis: two-sided

Warning message:
In ks.test(x[1:nx], y[1:nx], exact = FALSE) :
  cannot compute correct p-values with ties

Here are some questions:

a) What does the error message mean and what does it imply?
b) The data is very noisy and the initial result shows that there is
no relation between x and y. Is there a way to calculate an effect
size?
c) Can the p-value be used, when running tests over a large number of
different data sets, as a metric for ranking similarity between x and
y data sets?

Best
R.

David Winsemius

2010-Aug-04 22:29 UTC

[R] KS Test question (2)

On Aug 4, 2010, at 5:49 PM, Ralf B wrote:
> Hi R Users,
>
> I have two vectors, x and y, of equal length representing two types of
> data from two studies. I would like to test if they are similar enough
> to use them interchangeably. No assumptions about distributions can be
> made (initial tests clearly show that they are not normal).
> Here is a result:
>
> Two-sample Kolmogorov-Smirnov test
>
> data:  x and y
> D = 0.1091, p-value < 2.2e-16
> alternative hypothesis: two-sided
>
> Warning message:
> In ks.test(x[1:nx], y[1:nx], exact = FALSE) :
>  cannot compute correct p-values with ties
>
> Here are some questions:
>
> a) What does the error message mean and what does it imply?
a) It is not an error message.
b) It does seem rather self-explanatory.
> b) The data is very noisy and the initial result
What "initial result"?
> shows that there is
> no relation between x and y. Is there a way to calculate an effect
> size?
An "effect size" implies some sort of statistical model. You have not
offered one yet.

> c) Can the p-value be used, when running tests over a large number of
> different data sets, as a metric for ranking similarity between x and
> y data sets?
Not in a useful way. The p-value from ks.test on large datasets will
nearly always be small, but that information does not characterize the
differences in distribution in any meaningful way. Many similar
questions have been posted and answered over the years on r-help.
>
> Best
> R.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
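
A minimal sketch of why the p-value is a poor ranking metric: for a fixed, small difference between two populations, the p-value from ks.test keeps shrinking as the sample size grows, while the statistic D (the largest gap between the two empirical CDFs) settles near a stable value. The simulated normal data, the 0.1 mean shift, and the helper name ks_at_n are assumptions made for illustration; they are not from the original post.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: draw two samples of size n from populations that
# differ only by a small mean shift, then report D and the p-value.
ks_at_n <- function(n) {
  x <- rnorm(n, mean = 0)
  y <- rnorm(n, mean = 0.1)
  res <- ks.test(x, y)
  c(n = n, D = unname(res$statistic), p = res$p.value)
}

# Same populations, increasing sample sizes.
results <- t(sapply(c(100, 1000, 10000, 100000), ks_at_n))
print(results)
```

If a descriptive distance between two samples is wanted, D itself (which lies between 0 and 1) is one candidate to report alongside a plot, since it does not keep shrinking or growing merely because more data were collected.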

Glen Barnett

2010-Aug-05 01:23 UTC

[R] KS Test question (2)

It looks like the test is indicating a far bigger difference than
could be explained by random variation.

Since the sample sizes are equal, have you considered plotting the
ordered values of one against the ordered values of the other
(essentially an empirical QQplot), with a 45 degree line drawn in, to
examine the way(s) in which the two samples differ?
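
A minimal sketch of the empirical QQ plot described above, assuming two equal-length numeric vectors. The exponential samples are placeholders standing in for the two studies, not the poster's data.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Placeholder data standing in for the two studies (equal length).
x <- rexp(500, rate = 1)
y <- rexp(500, rate = 1.2)

# Ordered values of one sample against ordered values of the other.
qx <- sort(x)
qy <- sort(y)

plot(qx, qy, xlab = "ordered x", ylab = "ordered y",
     main = "Empirical QQ plot")
abline(0, 1, lty = 2)  # 45 degree reference line: points on it mean x and y agree
```

Base R's qqplot(x, y) produces the same pairs and also handles samples of unequal length by interpolating quantiles.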



Ralf B

2010-Aug-05 08:10 UTC

[R] offlist comment Re: KS Test question (2)

This is unbelievable. Now people like yourself start doing background
searches on one and accusing one of not being professional, plus
posting cheeky R code. The reason why I submitted the questions I have
submitted was that these answers did not satisfy my particular problem
(or perhaps I mistakenly thought so). The point here is that the forum
should be a forum where one should be allowed to ask questions without
first studying the history of the entire forum in fear that
someone might have asked it before. I was hoping that I could find
clearer answers than what I was able to read. I do know how to search
in Google. But I am not an expert in statistics, as you already found
in your background check. If I were fluent in statistics and R
and if past answers had exactly satisfied my problem, I would
not post here and I certainly would not have occupied your expensive
attention.





On Wed, Aug 4, 2010 at 6:16 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Aug 4, 2010, at 5:49 PM, Ralf B wrote:
>
>> Hi R Users,
>>
>> I have two vectors, x and y, of equal length representing two types of
>> data from two studies. I would like to test if they are similar enough
>> to use them interchangeably. No assumptions about distributions can be
>> made (initial tests clearly show that they are not normal).
>> Here is a result:
>>
>> Two-sample Kolmogorov-Smirnov test
>>
>> data:  x and y
>> D = 0.1091, p-value < 2.2e-16
>> alternative hypothesis: two-sided
>>
>> Warning message:
>> In ks.test(x[1:nx], y[1:nx], exact = FALSE) :
>>   cannot compute correct p-values with ties
>>
>> Here are some questions:
>>
>> a) What does the error message mean and what does it imply?
>> b) The data is very noisy and the initial result shows that there is
>> no relation between x and y. Is there a way to calculate an effect
>> size?
>> c) Can the p-value be used, when running tests over a large number of
>> different data sets, as a metric for ranking similarity between x and
>> y data sets?
>
> There has been quite a bit of discussion on this list over the years about
> why the KS test is not good in this situation. If I read the results of a
> search on your name correctly, you are in a department of Information
> Sciences. I would have thought that the first reaction of someone in that
> field would be to do a search on a question. Why are you filling up the
> archives with questions that have been repeatedly asked and answered?
>
> Do you need help in this area?
>
> rhelpSearch <- function(string,
>                  restrict = c("Rhelp10", "Rhelp08", "Rhelp02", "functions"),
>                  matchesPerPage = 100, ...)
>         RSiteSearch(string = string, restrict = restrict,
>                     matchesPerPage = matchesPerPage, ...)
>
>
> rhelpSearch("KS.test ties p-value")
>
>>
>> Best
>> R.
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>

Greg Snow

2010-Aug-05 17:05 UTC

[R] KS Test question (2)

The warning (with an error you would not see any results) means that there are
ties in your data. The theory behind the KS test says that the probability of
seeing ties is 0, so your data and the theory do not match; therefore the
p-value is suspect (though an OK approximation for some uses).
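
A small sketch of how ties arise and how to check for them, assuming simulated data rounded to one decimal (not the poster's data). Whether ks.test emits a warning, and its exact wording, depends on the R version, so the code collects any warnings rather than assuming one appears.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Hypothetical data: rounding to one decimal forces repeated values (ties).
x <- round(rnorm(200), 1)
y <- round(rnorm(200), 1)

# Number of pooled values that repeat an earlier value.
n_ties <- sum(duplicated(c(x, y)))

# Collect warnings instead of printing them; the test result is still returned.
warnings_seen <- character(0)
res <- withCallingHandlers(
  ks.test(x, y, exact = FALSE),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(n_ties)
print(res$statistic)
print(warnings_seen)
```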

These types of tests are useful for showing differences (often in a
non-meaningful way), not similarities.  You really need to decide what you mean
by similar.

Consider two population distributions: the first is the standard uniform, with
density height equal to 1 between 0 and 1 (0 elsewhere); the 2nd distribution
has height 1 from 0 to 0.99 and from 99.99 to 100 (0 elsewhere). Are these 2
populations similar?  By some measures they are (the KS statistic for one); by
other measures they are not (comparing mean and variance as an example).
Whether they are similar or not really depends on what you want to do with them.
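
The two populations described above can be checked numerically. This is a sketch under the stated densities; the function names F1 and F2, the evaluation grid, and the simulation size are choices made here for illustration.

```r
# Population CDFs: standard uniform, and the mixture with 1% of its mass
# moved from (0.99, 1) out to (99.99, 100).
F1 <- function(t) punif(t, 0, 1)
F2 <- function(t) 0.99 * punif(t, 0, 0.99) + 0.01 * punif(t, 99.99, 100)

# Largest vertical gap between the two CDFs (the population KS distance),
# evaluated on a grid covering both pieces of support.
grid <- c(seq(0, 1, by = 0.001), seq(99.99, 100, by = 0.001))
D_pop <- max(abs(F1(grid) - F2(grid)))

# Simulated samples from each population to compare mean and variance.
set.seed(123)  # arbitrary seed for reproducibility
n <- 100000
x <- runif(n)
in_tail <- runif(n) < 0.01
y <- ifelse(in_tail, runif(n, 99.99, 100), runif(n, 0, 0.99))

print(c(D_pop = D_pop,
        mean_x = mean(x), mean_y = mean(y),
        var_x = var(x), var_y = var(y)))
```

The CDFs never differ by more than about 0.01, yet the second population's mean is roughly three times the first's and its variance is larger by orders of magnitude, which is the contrast the example is making.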

One additional "test" you might consider is to use the vis.test function in the
TeachingDemos package: write a function that will either draw a standard qqplot
of your 2 datasets, or pool them together, split them randomly, and create the
qqplot.  Use this with vis.test; if you cannot pick out the real dataset then
it is less likely to matter if you interchange them.  (This assumes 2 random
samples from the respective populations; if there is something more going on
then you will need to come up with a different comparison that accounts for
any structure.)
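
A rough sketch of the plotting function described above, using only base R. The helper name qq_real_or_null, its orig argument, the placeholder data, and the manual 3 x 3 lineup are assumptions made for illustration; check ?vis.test in TeachingDemos for the exact interface it expects before passing a function like this to it.

```r
# Hypothetical helper: draws a QQ plot of the real samples (orig = TRUE) or of
# a random re-split of the pooled data (orig = FALSE).
qq_real_or_null <- function(x, y, orig = TRUE) {
  if (!orig) {
    nx <- length(x)
    pooled <- sample(c(x, y))        # shuffle the pooled values
    x <- pooled[seq_len(nx)]         # first nx values become the new "x"
    y <- pooled[-seq_len(nx)]        # the remainder becomes the new "y"
  }
  qqplot(x, y, xlab = "", ylab = "", main = "")
  abline(0, 1, lty = 2)              # 45 degree reference line
  invisible(list(x = x, y = y))
}

set.seed(2010)  # arbitrary seed for reproducibility
x <- rexp(300)              # placeholder for the first study
y <- rexp(300, rate = 1.5)  # placeholder for the second study

# Manual lineup: one panel shows the real split, the other eight show
# random re-splits. Try to spot the real one before looking at real_panel.
real_panel <- sample(9, 1)
op <- par(mfrow = c(3, 3), mar = c(2, 2, 1, 1))
for (i in 1:9) qq_real_or_null(x, y, orig = (i == real_panel))
par(op)
```

If the real panel cannot be told apart from the re-split panels, the visible difference between the samples is within what random assignment alone would produce.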

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

