thr3ads.net - R help - [R] quantile from quantile table calculation without original data [Mar 2021]

Abby Spurdle

2021-Mar-06 09:02 UTC

[R] quantile from quantile table calculation without original data

I came up with a solution.
But not necessarily the best solution.

I used a spline to approximate the quantile function.
Then use that to generate a large sample.
(I don't see any need for the sample to be random, as such).
Then compute the sample mean and sd, on a log scale.
Finally, plug everything into the plnorm function:

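# evenly spaced probabilities (the fourth argument is length.out)
p <- seq (0.01, 0.99,, 1e6)
# spline approximation of the quantile function
Fht <- splinefun (temp$percent, temp$size)
# deterministic "sample" on the log scale
x <- log (Fht (p) )
# upper-tail probability at size 0.1
psolution <- plnorm (0.1, mean (x), sd (x), lower.tail = FALSE)
psolution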

The value of the solution is very close to one.
Which is not a surprise.

Here's a plot of everything:

u <- seq (0.000001, 1.65,, 200)
v <- plnorm (u, mean (x), sd (x), FALSE)
plot (u, v, type="l", ylim = c (0, 1) )
points (temp$size, temp$percent, pch=16)
points (0.1, psolution, pch=16, col="blue")
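
As a cross-check on the spline-based estimate above, here is a minimal sketch of a different technique that was not posted in the thread. It assumes the table really is lognormal and that "percent" holds upper-tail probabilities. Under that assumption log(size) = meanlog + sdlog * qnorm(1 - percent), so a straight-line fit of log(size) against the normal quantiles gives rough parameter estimates. The names z, fit_line, mu_hat, sigma_hat and p_above are illustrative only.

```r
# Quantile table from the original question.
temp <- data.frame(
  size    = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069, 0.3781, 0.3047, 0.2681, 0.1907),
  percent = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)
)

# Assumption: "percent" is an upper-tail probability, so flip it
# before converting to standard normal quantiles.
temp$z <- qnorm(1 - temp$percent)

# For a lognormal, log(quantile) is linear in z:
# intercept estimates meanlog, slope estimates sdlog.
fit_line  <- lm(log(size) ~ z, data = temp)
mu_hat    <- unname(coef(fit_line)[1])
sigma_hat <- unname(coef(fit_line)[2])

# Upper-tail probability at size 0.1 under the fitted lognormal.
p_above <- plnorm(0.1, mu_hat, sigma_hat, lower.tail = FALSE)
p_above
```

This is ordinary least squares on the log scale, not maximum likelihood, so it treats all nine table rows as equally reliable.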


On Sat, Mar 6, 2021 at 8:09 PM Abby Spurdle <spurdle.a at gmail.com> wrote:
>
> I'm sorry.
> I misread your example, this morning.
> (I didn't read the code after the line that calls plot).
>
> After looking at this problem again, interpolation doesn't apply, and
> extrapolation would be a last resort.
> If you can assume your data comes from a particular type of
> distribution, such as a lognormal distribution, then a better approach
> would be to find the most likely parameters.
>
> i.e.
> This falls within the broader scope of maximum likelihood.
> (Except that you're dealing with a table of quantile-probability
> pairs, rather than raw observational data).
>
> I suspect that there's a relatively easy way of finding the parameters.
>
> I'll think about it...
> But someone else may come back with an answer first...
>
>
> On Sat, Mar 6, 2021 at 8:17 AM Abby Spurdle <spurdle.a at gmail.com> wrote:
> >
> > I note three problems with your data:
> > (1) The name "percent" is misleading; perhaps you want "probability"?
> > (2) There are straight (or near-straight) regions, each of which is
> > equally (or near-equally) spaced, which is not what I would expect in
> > problems involving "quantiles".
> > (3) Your plot (approximating the distribution function) is
> > back to front (relative to what is customary).
> >
> >
> > > On Fri, Mar 5, 2021 at 10:14 PM PIKAL Petr <petr.pikal at precheza.cz> wrote:
> > >
> > > Dear all
> > >
> > > I have a table of quantiles, probably from a lognormal distribution
> > >
> > >  dput(temp)
> > > temp <- structure(list(size = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069,
> > > 0.3781, 0.3047, 0.2681, 0.1907), percent = c(0.01, 0.05, 0.1,
> > > 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)), .Names = c("size", "percent"
> > > ), row.names = c(NA, -9L), class = "data.frame")
> > >
> > > and I need to calculate the quantile for size 0.1
> > >
> > > plot(temp$size, temp$percent, pch=19, xlim=c(0,2))
> > > ss <- approxfun(temp$size, temp$percent)
> > > points((0:100)/50, ss((0:100)/50))
> > > abline(v=.1)
> > >
> > > If I had the original data it would be quite easy with the ecdf/quantile
> > > functions, but without it I am lost as to what function I could use for such a task.
> > >
> > > Please, give me some hint where to look.
> > >
> > >
> > > Best regards
> > >
> > > Petr
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.

David Winsemius

2021-Mar-07 00:32 UTC

[R] quantile from quantile table calculation without original data

On 3/6/21 1:02 AM, Abby Spurdle wrote:
> I came up with a solution.
> But not necessarily the best solution.
>
> I used a spline to approximate the quantile function.
> Then use that to generate a large sample.
> (I don't see any need for the sample to be random, as such).
> Then compute the sample mean and sd, on a log scale.
> Finally, plug everything into the plnorm function:
>
> p <- seq (0.01, 0.99,, 1e6)
> Fht <- splinefun (temp$percent, temp$size)
> x <- log (Fht (p) )
> psolution <- plnorm (0.1, mean (x), sd (x), FALSE)
> psolution
>
> The value of the solution is very close to one.
> Which is not a surprise.
>
> Here's a plot of everything:
>
> u <- seq (0.000001, 1.65,, 200)
> v <- plnorm (u, mean (x), sd (x), FALSE)
> plot (u, v, type="l", ylim = c (0, 1) )
> points (temp$size, temp$percent, pch=16)
> points (0.1, psolution, pch=16, col="blue")

Here's another approach, which uses minimization of the squared error to
get the parameters for a lognormal distribution.

temp <- structure(list(size = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069,
0.3781, 0.3047, 0.2681, 0.1907), percent = c(0.01, 0.05, 0.1,
0.25, 0.5, 0.75, 0.9, 0.95, 0.99)), .Names = c("size",
"percent"
), row.names = c(NA, -9L), class = "data.frame")

obj <- function(x) {sum( qlnorm(1-temp$percent, x[[1]], x[[2]])-temp$size )^2}

# Note the inversion of the poorly named and flipped "percent" column.

optim( list(a=-0.65, b=0.42), obj)

#--------------------

$par
         a          b
-0.7020649  0.4678656

$value
[1] 3.110316e-12

$counts
function gradient
      51       NA

$convergence
[1] 0

$message
NULL


I'm not sure how principled this might be. There's no consideration in 
this approach for expected sampling error at the right tail where the 
magnitudes of the observed values will create much larger contributions 
to the sum of squares.

-- 

David.
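
One detail of the objective above is worth spelling out: as posted, `sum( ... )^2` squares the sum of the residuals, which is not the same as summing the squared residuals. The printed $par and $value belong to the posted expression. Below is a minimal sketch of a sum-of-squares variant, offered as an assumption about the intent rather than as David's method. The names sse, start, fit_sse, meanlog_hat, sdlog_hat and p_above_sse are illustrative, and optimizing over log(sdlog) is a choice made here to keep the standard deviation positive.

```r
# Quantile table from the thread; "percent" holds upper-tail probabilities.
temp <- data.frame(
  size    = c(1.6, 0.9466, 0.8062, 0.6477, 0.5069, 0.3781, 0.3047, 0.2681, 0.1907),
  percent = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)
)

# Sum of squared differences between fitted and tabulated quantiles.
# par[1] is meanlog; par[2] is log(sdlog), so sdlog stays positive.
sse <- function(par) {
  fitted <- qlnorm(1 - temp$percent, meanlog = par[1], sdlog = exp(par[2]))
  sum((fitted - temp$size)^2)
}

# Same starting point as in the post, with the second value on the log scale.
start   <- c(-0.65, log(0.42))
fit_sse <- optim(start, sse)          # default Nelder-Mead search

meanlog_hat <- fit_sse$par[1]
sdlog_hat   <- exp(fit_sse$par[2])

# Upper-tail probability at size 0.1 under the fitted lognormal.
p_above_sse <- plnorm(0.1, meanlog_hat, sdlog_hat, lower.tail = FALSE)
p_above_sse
```

Because the residuals are on the original scale, the largest sizes still dominate this fit, which is the concern raised above about the right tail.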
