thr3ads.net - R help - [R] Kolmogorov-Smirnov test [Apr 2011]

m.marcinmichal

2011-Apr-27 21:22 UTC

[R] Kolmogorov-Smirnov test

Hi,
I have a problem with a Kolmogorov-Smirnov goodness-of-fit test. I am trying
to fit a distribution to my data, and I have set up two tests:
- # First Kolmogorov-Smirnov Tests fit
- # Second Kolmogorov-Smirnov Tests fit
(see below). The two tests return different results and I don't know which
one is correct. The first test returns the lower D = 0.0234 and a low
p-value = 0.00304. The low D indicates that the empirical and theoretical
distribution functions coincide, but the low p-value indicates that I can
reject the hypothesis H0. On the other hand, this p-value is much higher than
the p-value from the second test (< 2.2e-16). Which result, and which test, is
correct?

matr = rbind(c(1,2))
layout(matr) 

# length vectorSentence = 11999
vectorSentence <- c(....)
vectorLength <- length(vectorSentence)

# assume that we have a table(vectorSentence)
#  1    2    3    4    5    6    7    8    9 
# 512 1878 2400 2572 1875 1206  721  520  315 

# Poisson parameter (fitdistr() comes from the MASS package)
library(MASS)
param <- fitdistr(vectorSentence, "poisson")

# Expected density
density.exp <- dpois(1:9, lambda=param[[1]][1])

# Expected frequ.
frequ.exp <- dpois(1:9, lambda=param[[1]][1])*vectorLength

# Construct numeric vector of data values (y = vFrequ for the Kolmogorov-Smirnov test)
vFrequ <- c()
for(i in 1:length(frequ.exp)) {
	vFrequ <- append(vFrequ, rep(i, times=frequ.exp[i]))
}

# Check transformation: plot(density.exp, ylim=c(0,0.20)) => plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))
plot(table(vectorSentence)/vectorLength)
plot(density.exp, ylim=c(0,0.20))
par(new=TRUE)
plot(table(vFrequ)/vectorLength, ylim=c(0,0.20))

# First Kolmogorov-Smirnov Tests fit
ks.test(vectorSentence, vFrequ)

# Second Kolmogorov-Smirnov Tests fit
ks.test(vectorSentence, "dpois", lambda=param[[1]][1])

# Output of the first Kolmogorov-Smirnov test

Two-sample Kolmogorov-Smirnov test

data:  vectorSentence and vFrequ 
D = 0.0234, p-value = 0.00304
alternative hypothesis: two-sided 

Warning message:
In ks.test(vectorSentence, vFrequ) :
  cannot compute correct p-values with ties


# Output of the second Kolmogorov-Smirnov test

One-sample Kolmogorov-Smirnov test

data:  vectorSentence 
D = 0.9832, p-value < 2.2e-16
alternative hypothesis: two-sided 

Warning message:
In ks.test(vectorSentence, "dpois", lambda = param[[1]][1]) :
  cannot compute correct p-values with ties



Best

Marcin M.

--
View this message in context:
http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3479506.html
Sent from the R help mailing list archive at Nabble.com.

Greg Snow

2011-Apr-28 20:40 UTC

[R] Kolmogorov-Smirnov test

A couple of things to consider:

The Kolmogorov-Smirnov test is designed for distributions on continuous
variables, not discrete ones like the Poisson.  That is why you are getting
some of your warnings.
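
A rough sketch of where the very large D in the second test comes from, assuming the posted frequency table is the complete data set (its counts sum to 11999, the stated length). ks.test() treats its second argument as a cumulative distribution function, so passing "dpois" (a probability mass function) compares the empirical CDF with point probabilities; "ppois" is the CDF. The names counts, lambda.hat, d.pmf, d.cdf and d.by.hand are made up for this illustration, and mean(x) is used because it is the maximum-likelihood estimate of the Poisson mean.

```r
# Frequency table posted in the question (sentence lengths 1..9)
counts <- c(512, 1878, 2400, 2572, 1875, 1206, 721, 520, 315)
x <- rep(1:9, times = counts)   # rebuilt sample, length 11999
lambda.hat <- mean(x)           # ML estimate of the Poisson mean

# "dpois" is a probability mass function, not a CDF (as used in the question)
d.pmf <- suppressWarnings(ks.test(x, "dpois", lambda = lambda.hat))$statistic

# "ppois" is the cumulative distribution function that ks.test() expects
d.cdf <- suppressWarnings(ks.test(x, "ppois", lambda = lambda.hat))$statistic

# Where the large D comes from: the empirical CDF reaches 1 at max(x),
# while dpois() there is only a small point probability
d.by.hand <- 1 - dpois(max(x), lambda.hat)

print(c(dpois = unname(d.pmf), ppois = unname(d.cdf), by.hand = d.by.hand))
```

With "dpois" the statistic comes out close to 1, in line with the reported D = 0.9832. Even with "ppois" the statistic and p-value should be read with caution, because the ties in discrete data violate the assumptions behind the test.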

With a sample size over 10,000 you will have power to detect differences that
are not practically meaningful.  You might as well use
SnowsPenultimateNormalityTest (at least read the help page).
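
The sample-size point can be illustrated with simulated data; this is only a sketch, and the seed, the sample size and the 0.05 shift are arbitrary choices for the example, not values from the thread. A continuous distribution is used so that no ties occur.

```r
set.seed(42)                          # arbitrary seed, for reproducibility
n <- 100000                           # large hypothetical sample size
x <- rnorm(n, mean = 0.05, sd = 1)    # true mean is off by only 0.05

# Test against a standard normal, which is "almost" the true distribution
res <- ks.test(x, "pnorm", mean = 0, sd = 1)

res$statistic   # the distance between the two CDFs is small ...
res$p.value     # ... yet the p-value is very small at this sample size
```

A shift of this size would rarely matter in practice, but with this many observations the test still rejects the null hypothesis.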

What are you trying to accomplish?  We may be able to give you a better
approach.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

Khanvilkar, Shashank

2011-Sep-26 20:16 UTC

[R] Reading two-column CSV file into a hash table

Sending it again, with correct subject line.


Hello All, 
Thanks in advance for all help,

I am trying to read a two column csv file in R, which looks like:
X,1
Y,2
Z,3

I am using the R command:
tmp = read.csv("test.csv", colClasses=c("character", "character"))

How can I make this into a hash table, so that I can access tmp["X"]
and have it return "1"?

Shank

Gabor Grothendieck

2011-Sep-26 20:48 UTC

[R] Reading two-column CSV file into a hash table

Try this:

Lines <- "letters,numbers
X,1
Y,2
Z,3"
cat(Lines, "\n", file = "data.txt")
DF <- read.csv("data.txt")
v <- setNames(DF[,2], DF[,1])

so:
> v[["Y"]]
[1] 2
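
A variant closer to the file in the question (no header line, both columns read as character) might look like the sketch below. The data are supplied inline through the text argument instead of "test.csv" so the example is self-contained, and the names lookup and h are made up for the illustration. An environment is shown as a second option, since environments are the usual hash-table-like structure in base R.

```r
# Same three rows as in the question, read without a header line
tmp <- read.csv(text = "X,1\nY,2\nZ,3", header = FALSE,
                colClasses = c("character", "character"))

# Option 1: a named character vector, where the names act as keys
lookup <- setNames(tmp[[2]], tmp[[1]])
lookup[["X"]]                              # value stored under key "X"

# Option 2: an environment used as a hash table
h <- list2env(as.list(lookup), envir = new.env(hash = TRUE))
h[["Y"]]                                   # lookup by key
exists("W", envir = h, inherits = FALSE)   # test whether a key is present
```

The named vector is usually enough for a small table; an environment may be preferable when keys are added or removed often.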


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
