Ann Huxtable
2004-Nov-13 14:27 UTC
[R] determining the distribution of a sample data set etc.
Hello,

I have only recently started using R. I have two data samples on which I want to carry out some initial exploratory data analysis to:

i) determine the distribution of the data;
ii) determine whether both data sets come from the same distribution.

I have created unit probability histograms and qqplots for the data, and I have attached one of the qqplots. It is clear that the data is not from a normal distribution (the points form a convex curve underneath the straight line). The shape of the curve suggests the data come from either a chi-square or an F distribution (if you think otherwise, I would appreciate your help in correcting my analysis).

The point of this mail, however, is how to use R to:

1) test whether the data come from another distribution (F, chi-square, etc.);
2) check whether the two samples are drawn from the same distribution.

Many thanks in advance for your help.

Ann
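The attached plot did not survive, so purely as an illustration, here is a minimal R sketch of the shape described above. It assumes a simulated chi-square sample (df = 3, an arbitrary choice) in place of the real data, which were not posted. On a normal QQ plot a right-skewed sample roughly crosses the `qqline()` reference line near the quartiles, dips below it in between, and rises above it in the tails, which is the convex curve being described.

```r
# Hypothetical stand-in for one of the samples: the real data were not posted.
set.seed(42)
x <- rchisq(5000, df = 3)   # right-skewed, strictly positive

# Normal QQ plot plus qqline(), which by default passes through the
# first and third quartiles of the sample and of the standard normal
qqnorm(x, main = "Normal QQ plot of a simulated chi-square(3) sample")
qqline(x, col = 2)

# Recompute that reference line by hand to check the picture numerically
q_samp <- quantile(x, c(0.25, 0.75), names = FALSE)
q_norm <- qnorm(c(0.25, 0.75))
slope <- diff(q_samp) / diff(q_norm)
intercept <- q_samp[1] - slope * q_norm[1]

# Middle of the distribution: the sample median lies below the line's
# value at theoretical quantile 0 (which is the intercept)
middle_below <- median(x) < intercept

# Upper tail: the largest observation lies above the line at the largest
# theoretical quantile that qqnorm() uses
z_max <- qnorm(ppoints(length(x)))[length(x)]
upper_above <- max(x) > intercept + slope * z_max

c(middle_below = middle_below, upper_above = upper_above)
```

The same check on real data would simply replace the simulated `x` with the observed sample.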
Prof Brian Ripley
2004-Nov-13 15:10 UTC
On Sat, 13 Nov 2004, Ann Huxtable wrote:

> Hello,
>
> I have only recently started using R. I have two data samples that I want to
> carry out some initial explorative data analysis to:
>
> i). Determine the distribution of the data
> ii). Determine whether both datasets are from the same distribution.
>
> I have managed to create unit probability histograms and created qqplots for
> the data. I have attached one of the qqplots. It is clear that the data is

No plot made it to the list: see the posting guide for what attachments are allowed.

> not from a normal distribution (it forms a convex curve underneath the
> straight line). the nature of the curve suggest the data is from either
> Chi-square or F distribution (if you think otherwise, I would appreciate your
> help in correcting my analysis).
>
> The point of this mail however, is how do I use R to:
>
> 1). Test if the data is from another distribution (F, Ch-Square etc.. )
> 2). How can I check if the samples are drawn from the same distribution?

I would use qqplots for both purposes. qqplot will plot one dataset against another: see its examples. It will also plot against another distribution: continuing that example

    qqplot(y, qt(ppoints(200), df=5))

You could also compare two samples via the ecdfs and the Kolmogorov-Smirnov test (examples in the MASS ch05.R script).

But formal testing is not much help unless you know what sort of differences are interesting _a priori_ -- you would need enormous samples to distinguish a t_5 from a t_4, for example.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
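A minimal R sketch of the comparisons suggested in the reply, assuming two simulated chi-square samples (`x` and `y`, both df = 3) as hypothetical stand-ins for the real data sets, and chi-square(3) as an assumed candidate distribution. It is not the MASS ch05.R script; it only uses base R functions (`qqplot`, `ppoints`, `qchisq`, `ecdf`, `ks.test`).

```r
# Hypothetical stand-ins for the two data sets
set.seed(1)
x <- rchisq(300, df = 3)
y <- rchisq(250, df = 3)

# 1. One sample against a candidate distribution, in the same form as
#    qqplot(y, qt(ppoints(200), df=5)); chi-square(3) is an assumed
#    candidate. Points close to the line y = x suggest a good match.
qqplot(x, qchisq(ppoints(length(x)), df = 3),
       xlab = "sample x", ylab = "chi-square(3) quantiles")
abline(0, 1, col = 2)

# 2. The two samples against each other; sizes may differ, in which case
#    qqplot interpolates the larger sample down to the size of the smaller
qqplot(x, y, xlab = "sample x", ylab = "sample y")
abline(0, 1, col = 2)

# 3. Empirical CDFs overlaid on one plot
plot(ecdf(x), main = "Empirical CDFs: x (black), y (red)")
plot(ecdf(y), add = TRUE, col = 2)

# 4. Two-sample Kolmogorov-Smirnov test of H0: same continuous distribution
ks_xy <- ks.test(x, y)
ks_xy

# Power caveat from the reply: samples of this size from t_5 and t_4
# typically do not lead to a rejection, even though the distributions differ
ks_t <- ks.test(rt(200, df = 5), rt(200, df = 4))
ks_t$p.value
```

The plots in steps 1 to 3 carry most of the information; the p-values from step 4 are only meaningful relative to the kind of difference one cares about in advance.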