Luigi Marongiu
2020-May-10 08:17 UTC
[R] How to determine whether a value belongs to a cumulative distribution?
Hello,

I am trying to translate a mathematical formula into R. The formula (or rather a set of formulas) is meant to determine the first outlier in a sequence of measurements. To do this, a parameter r is calculated; this is essentially the ratio between the variance of the value x and the sum of the variances of the x-1 elements of the series. x follows a certain distribution (namely, sigmoid), whereas r follows a cumulative empirical one.

The text says:

"Each r is distributed as t under the model. Therefore, we can test the hypothesis whether a single observation deviates from the model by comparing r with the t distribution, where F(·) is the cumulative distribution function of the t distribution: P-value = 2 * [1 - F(1 - |r|)]"

I generated a cumulative function with
```
cum_fun = ecdf(abs(x[1:n]))
```
which gives me:
```
> n=3
Empirical CDF
Call: ecdf(abs(x[1:n]))
 x[1:3] = 5.5568, 6.5737, 7.2471
```
But now how can I determine if x belongs to the distribution? If I do, as in the formula:
```
> p = 2 * (1-cum_fun)
Error in 1 - cum_fun : non-numeric argument to binary operator
```
Can I get a p-value associated with this comparison?

Thank you

--
Best regards,
Luigi
Ivan Krylov
2020-May-10 12:02 UTC
[R] How to determine whether a value belongs to a cumulative distribution?
On Sun, 10 May 2020 10:17:47 +0200 Luigi Marongiu <marongiu.luigi at gmail.com> wrote:

> If I do, as in the formula:
> ```
> > p = 2 * (1-cum_fun)
> Error in 1 - cum_fun : non-numeric argument to binary operator
> ```

The ecdf function returns another function that calculates the ECDF value for an arbitrary input. For example,

    e <- ecdf(1:10)
    e
    # Empirical CDF
    # Call: ecdf(1:10)
    # x[1:10] = 1, 2, 3, ..., 9, 10
    e(c(-1, 5, 100)) # call the returned value as a function
    # [1] 0.0 0.5 1.0

If you want to see the empirical distribution function values for the points of the dataset itself, call the function returned by ecdf with the same data again:

    x <- 1:10
    ecdf(x)(x)
    # [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

If you want to calculate the CDF for a given value of 1 - |r|, pass this value as an argument to the function returned by ecdf:

    cum_fun <- ecdf(abs(x[1:n]))
    p <- 2 * (1 - cum_fun(1 - abs(r)))

On the other hand, given the quotes from the text, I think that you might need to use the theoretical t distribution function (its CDF is available as `pt` in R) in the formula instead of the ECDF:

    df <- ... # degrees of freedom for Student t distribution
    p <- 2 * (1 - pt(1 - abs(r), df))

I am not sure about that, though.

--
Best regards,
Ivan
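
A minimal sketch of both suggestions above, under stated assumptions: the sample reuses the three values from the ECDF printout in the question, while `r = 2.5` and `df = 10` are made-up illustrative values that do not come from the thread. In R, `pt` is the cumulative distribution function of Student's t and `dt` is its density, so the F in the quoted formula corresponds to `pt`.

```r
# Sample taken from the ECDF printout in the question
x <- c(5.5568, 6.5737, 7.2471)
n <- 3

# ecdf() returns a function, not a number, so "1 - cum_fun" fails;
# call the function on a value to get a number back
cum_fun <- ecdf(abs(x[1:n]))
is.function(cum_fun)
cum_fun(6.6)  # fraction of the sample that is <= 6.6

# Hypothetical statistic and degrees of freedom (assumptions for illustration)
r  <- 2.5
df <- 10

# Formula exactly as quoted in the thread, with the t CDF (pt) as F
p_quoted <- 2 * (1 - pt(1 - abs(r), df))

# Conventional two-sided p-value for a t statistic, shown for comparison only
p_two_sided <- 2 * pt(abs(r), df, lower.tail = FALSE)

p_quoted
p_two_sided
```

With the formula exactly as transcribed, the argument `1 - abs(r)` is negative whenever |r| > 1, so F is below 0.5 and the result exceeds 1. That suggests the quoted formula may have been garbled somewhere along the way; the conventional two-sided form, 2 * [1 - F(|r|)], is included only as a point of comparison and should be checked against the original text.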