Dear Sir,
Thank you for your valuable guidance. It made me realize that I need to think outside the box.
As regards the low losses, the Basel guidelines do recommend discarding such small
losses, which create noise when analysing the losses caused by operational loss events.
It is the right-tail events that matter, as they represent low-frequency,
high-magnitude losses.
However, my client is adamant: although we have shown them research
papers on the threshold limits that need to be applied to arrive at a meaningful
analysis, he insists that we include these low losses as well and fit a
distribution.
Lastly, I simulate losses using the command
rsnorm(10000, mean = m, sd = s, xi = x)
where m, s and x are the parameters estimated from the (log-transformed) loss
data. The usual procedure is to sort these simulated values in descending order
and select the observation corresponding to a given percentile (say 99.9%);
this is the Value at Risk (VaR), say 'p'.
My understanding is that I then need to apply the transformation 10^p to this
value 'p' to arrive at a figure which is on the scale of my original
loss data. Am I right?
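A minimal sketch of that last step, assuming the skew-normal was fitted to the
log10 losses with fGarch as in the code further down (i.e. ss <-
snormFit(log10(mydat))); quantile() gives the percentile directly, so no manual
sorting is needed:

library(fGarch)
## assumed: 'mydat' holds the raw loss amounts and 'ss' the snormFit result
sim <- do.call(rsnorm, c(list(n = 10000), as.list(ss$par)))
p   <- quantile(sim, 0.999)          # 99.9% quantile on the log10 scale
VaR <- 10^p                          # back-transform to the original loss scale
VaR
## the same quantile is available without simulation:
## do.call(qsnorm, c(list(p = 0.999), as.list(ss$par)))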
Thanks again, sir, for your great help. I have something to look forward to now.
Regards
Amelia
_____________________________________________________________________________
On Thursday, 23 July 2015 2:20 AM, Boris Steipe <boris.steipe at
utoronto.ca> wrote:
So - as you can see, your data can be modelled.
Now the interesting question is: what do you do with that knowledge? I know
nearly nothing about your domain, but given that the data looks log-normal, I am
curious about the following:
- Most of the events are in the small-loss category, but most of the damage is
done by the rare large losses. Is it even meaningful to guard against a single
1/1000 event? Shouldn't you rather be saying: my contingency funds need to be
large enough to allow survival of, say, a fiscal year with 99.9 % probability?
This is a very different question (a rough sketch of that calculation follows
after these questions).
- If a loss occurs, in what time do the funds need to be replenished? Do you
need to take a series of events into account?
- The model assumes that the data are independent. This is probably a poor (and
dangerous) assumption.
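A rough sketch of that fiscal-year calculation, under assumptions that are not
from this thread: the annual event count is taken as Poisson with a hypothetical
rate lambda, and severities are drawn from the skew-normal fitted to the log10
losses ('ss' as in the code further down).

library(fGarch)
## assumptions: 'lambda' is a placeholder event rate per year, and 'ss' is
## the skew-normal fit to log10 losses (ss <- snormFit(log10(mydat)))
lambda <- 120                        # hypothetical events per year
nyears <- 10000                      # number of simulated years
annual <- replicate(nyears, {
  n <- rpois(1, lambda)              # events in one simulated year
  if (n == 0) 0
  else sum(10^do.call(rsnorm, c(list(n = n), as.list(ss$par))))
})
quantile(annual, 0.999)              # funds to survive a year with 99.9 % probability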
Cheers,
B.
On Jul 22, 2015, at 3:56 PM, Ben Bolker <bbolker at gmail.com> wrote:
> Amelia Marsh <amelia_marsh08 <at> yahoo.com> writes:
>
>
>> Hello! (I don't know if I can raise this query here on this forum,
>> but I had already raised it on the finance forum and have not received
>> any suggestion, so I am now raising it on this list. Sorry for the same.
>> The query is about what to do if no statistical distribution fits the
>> data.)
>
>> I am into risk management and deal with operational risk. As part
>> of the Basel II guidelines, we need to arrive at the capital charge
>> that banks must set aside to cover operational risk losses, should
>> they occur. As part of the Loss Distribution Approach (LDA), we need
>> to collate past loss events and use these loss amounts. The usual
>> process as practised in the industry is as follows -
>
>> Using these historical loss amounts and various statistical tools
>> such as the KS test, AD test, PP plot, QQ plot etc., we try to
>> identify the best-fitting (continuous) statistical distribution for
>> this historical loss data. Then, using the parameters estimated for
>> that distribution, we simulate say 1 million loss amounts and,
>> taking an appropriate percentile (say 99.9%), we arrive at the
>> capital charge.
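A small illustration of this fitting-and-testing step (an addition, not from
the original posts): it assumes the fitdistrplus package and that 'mydat' holds
the historical losses; the lognormal and Weibull candidates are only examples.

library(fitdistrplus)
## assumed: 'mydat' is the vector of historical loss amounts (all > 0)
fit.ln <- fitdist(mydat, "lnorm")                 # candidate: lognormal
fit.wb <- fitdist(mydat, "weibull")               # candidate: Weibull
gofstat(list(fit.ln, fit.wb),
        fitnames = c("lognormal", "weibull"))     # KS, AD, CvM, AIC, BIC
par(mfrow = c(1, 2))
denscomp(list(fit.ln, fit.wb))                    # fitted densities vs. histogram
qqcomp(list(fit.ln, fit.wb))                      # QQ plots of both candidates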
>
>> However, many times the loss data are such that fitting a
>> distribution to them is not possible. Maybe the loss data are
>> multimodal or have significant variability, making the fitting of a
>> distribution impossible. Can someone guide me on how to deal with
>> such data and what can be done to simulate losses from this
>> historical loss data in R?
>
> A skew-(log)-normal fit doesn't look too bad ... (whenever you
> have positive data that are this strongly skewed, log-transforming
> is a good step)
>
> hist(log10(mydat),col="gray",breaks="FD",freq=FALSE)
> ## default breaks are much coarser:
> ## hist(log10(mydat),col="gray",breaks="Sturges",freq=FALSE)
> lines(density(log10(mydat)),col=2,lwd=2)
> library(fGarch)
> ss <- snormFit(log10(mydat))
> xvec <- seq(2,6.5,length=101)
> lines(xvec,do.call(dsnorm,c(list(x=xvec),as.list(ss$par))),
> col="blue",lwd=2)
> ## or try a skew-Student-t: not very different:
> ss2 <- sstdFit(log10(mydat))
> lines(xvec,do.call(dsstd,c(list(x=xvec),as.list(ss2$estimate))),
> col="purple",lwd=2)
>
> There are more flexible distributional families (Johnson,
> log-spline ...)
>
> Multimodal data are a different can of worms -- consider
> fitting a finite mixture model ...
>
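A minimal sketch of the finite-mixture suggestion (an addition, not from the
original reply), assuming the mixtools package and a two-component normal
mixture on the log10 scale; the number of components is a guess that should be
checked, e.g. by BIC.

library(mixtools)
## assumed: 'mydat' holds the raw losses, as above; k = 2 is only a guess
lx  <- log10(mydat)
mix <- normalmixEM(lx, k = 2)        # two-component normal mixture
summary(mix)                         # mixing weights, means, sds
## simulate from the fitted mixture and back-transform to the loss scale
n   <- 10000
cmp <- sample(1:2, n, replace = TRUE, prob = mix$lambda)
sim <- 10^rnorm(n, mean = mix$mu[cmp], sd = mix$sigma[cmp])
quantile(sim, 0.999)                 # 99.9% quantile of simulated losses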