pappnase
2010-Feb-08 16:09 UTC
[R] confidence interval for negatively skewed, leptokurtic sample
Hello,

I've got a statistical problem that I hope you can help me with. It isn't directly related to R, so if another forum would suit it better, please tell me!

Here's the problem: I want to derive confidence intervals for a variable X which, judging by the descriptive statistics, is clearly negatively skewed and leptokurtic (i.e. peaked). My aim is to make a statement like this one: given certain values of the two explanatory variables (see below), X will range from value A to value B when drawing again from the same parent population.

The dataset I'm using is pretty large: it contains some 150,000 cases, covers an entire year, and can be seen as a sample from the parent population. The variable I'm interested in is the prediction error X of a given (daily computed) wind power forecast, which I compute as the difference between the predicted value and the corresponding realised value. To make things clearer: the predictions that yield my data are calculated once a day and cover three days, so there are three prediction values for each realisation value.

Unfortunately, there is autocorrelation in the dataset because there is data for every quarter of an hour. That's why I have to select some cases at random (at least I think so). Second, and more importantly, I want to classify the data in order to use the available information about how X depends on the two explanatory variables "prediction horizon" and "prediction level", i.e. the level of the predicted power output relative to the maximum power output, the latter also called nominal power or rated power. That's why the sample I want to analyse is reduced to about 300 cases. As the mean of X is unsurprisingly always close to zero, I want to gather information about the dispersion of X as a function of the explanatory variables.
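To make the intended classification concrete, here is a minimal Python sketch of the idea: draw a random subsample, assign each case to a class defined by forecast day and prediction-level bin, and compute the standard deviation of X per class. Everything in it is an assumption for illustration only: the function names, the four equal-width level bins, the subsample size, and above all the synthetic data, which merely stand in for the real forecast errors. Whether random subsampling removes enough of the serial dependence is not checked here.

```python
import random
import statistics
from collections import defaultdict


def classify(horizon_h, level, n_level_bins=4):
    """Map a case to a (forecast day, prediction-level bin) class.

    horizon_h: forecast horizon in hours, 0 <= horizon_h <= 72
    level: predicted power / nominal power, in [0, 1]
    """
    day = min(int(horizon_h // 24) + 1, 3)                        # 1, 2 or 3
    level_bin = min(int(level * n_level_bins), n_level_bins - 1)  # 0 .. n_level_bins - 1
    return day, level_bin


def dispersion_by_class(cases, n_keep, seed=0):
    """Randomly subsample n_keep cases, then return {class: (n, sd of X)}."""
    rng = random.Random(seed)
    subsample = rng.sample(cases, n_keep)
    groups = defaultdict(list)
    for horizon_h, level, error in subsample:
        groups[classify(horizon_h, level)].append(error)
    return {
        key: (len(errs), statistics.stdev(errs))
        for key, errs in sorted(groups.items())
        if len(errs) >= 2  # a standard deviation needs at least two cases
    }


# Synthetic stand-in data (NOT real forecast errors): the spread of the
# error is made to grow with the horizon purely for demonstration.
data_rng = random.Random(42)
cases = []
for _ in range(20000):
    horizon_h = data_rng.uniform(0, 72)
    level = data_rng.random()
    error = data_rng.gauss(0, 0.02 + 0.001 * horizon_h)
    cases.append((horizon_h, level, error))

result = dispersion_by_class(cases, n_keep=5000)
for (day, level_bin), (n, sd) in result.items():
    print(f"day {day}, level bin {level_bin}: n={n}, sd={sd:.4f}")
```

With three forecast days and four level bins this gives twelve classes; finer bins describe the dependence in more detail but leave fewer cases per class.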
A regression, however, doesn't seem appropriate to me, because the resulting confidence intervals of X conditional on the explanatory variables would blur a lot of the information hidden in the dataset (e.g. a stronger dispersion for daytime predictions). That's why I thought a classification would meet my needs best.

My first aim is now to get some information about the standard deviation or the variance of the parent population of X. I thought about bootstrapping: drawing various samples from the same parent population would enable me to calculate a confidence interval for the parameter of interest, i.e. the standard deviation. Do you think that's a suitable approach? I'm currently using PASW (formerly SPSS), which is obviously not very powerful software, but I have access to computers with Stata, too.

Assuming that I obtain a confidence interval for, say, the standard deviation, the next problem arises: the distribution of X is still negatively skewed and leptokurtic, so how can I derive a confidence interval for X at all? Adding and subtracting 1.96 times the standard deviation would result in a symmetric confidence interval, which is probably wrong.

It would be great if someone could help me with this. I'm not making any progress at the moment...

Best,
Andreas
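For illustration, the two steps asked about above could be sketched in Python roughly as follows: a percentile bootstrap interval for the standard deviation within one class, and an asymmetric interval for X itself taken from the empirical 2.5% and 97.5% quantiles instead of mean ± 1.96 standard deviations. This is only a sketch under stated assumptions: the helper names are hypothetical, the percentile bootstrap is one of several bootstrap interval types, the resampling treats the cases as independent, and the 300 values are synthetic (a shifted, mirrored exponential chosen only because it is negatively skewed and leptokurtic). Strictly speaking, the quantile interval describes the range of X rather than a confidence interval for a parameter.

```python
import random
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_sd_ci(sample, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the standard deviation."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement from the observed sample, n_boot times.
    sds = sorted(
        statistics.stdev(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    return quantile(sds, alpha / 2), quantile(sds, 1 - alpha / 2)


def empirical_interval(sample, alpha=0.05):
    """Central (1 - alpha) range of the sample from its empirical quantiles."""
    s = sorted(sample)
    return quantile(s, alpha / 2), quantile(s, 1 - alpha / 2)


# Synthetic stand-in for one class of about 300 prediction errors
# (NOT real data): mean near zero with a long left tail.
data_rng = random.Random(1)
sample = [0.05 - data_rng.expovariate(20.0) for _ in range(300)]

sd_lo, sd_hi = bootstrap_sd_ci(sample, n_boot=1000)
x_lo, x_hi = empirical_interval(sample)
mean_x = statistics.mean(sample)

print(f"sample sd = {statistics.stdev(sample):.4f}")
print(f"95% bootstrap CI for sd: [{sd_lo:.4f}, {sd_hi:.4f}]")
print(f"central 95% range of X: [{x_lo:.4f}, {x_hi:.4f}] (mean {mean_x:.4f})")
```

Because the quantiles are read directly from the data, the resulting interval is allowed to extend further below the mean than above it. With only about 300 cases, each tail quantile rests on roughly seven or eight observations, so the bounds themselves are fairly uncertain.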