Hello,

I have a question about using extreme value distribution in R.

I have two variables, X and Y, and have pairs of points (X1,Y1), (X2,Y2), (X3,Y3) etc. When I plot X against Y, it looks like the maximum value of Y (for a particular X) is correlated with X.

Indeed, when I bin the data by X-value into equally sized bins, and test whether the maximum value of Y for a bin is correlated with the mean X for the bin, there is a significant correlation between max(Y) and X. However, I am not very happy with this because there is not an equal number of data points in each bin.

I imagine that there is a better statistical test that I could use, if I could fit an extreme value distribution to the Y data. However, I'm not sure how to do this. I am wondering is there a way to use the extreme value distribution functions in R to test the hypothesis that the maximum of Y (for a particular X) is correlated with X?

I would appreciate advice very much.

regards
Avril Coghlan

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
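The binning procedure described in the question can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data are simulated (the real X and Y are not shown in the post), "equally sized" is taken to mean equal-width bins, and the names bins, bin_n, bin_xbar and bin_ymax are made up for the example.

```r
set.seed(1)                        # simulated data, for illustration only
n <- 200
x <- runif(n, 0, 10)
y <- rexp(n, rate = 1 / (1 + x))   # hypothetical: spread of y grows with x

# Cut x into 10 equal-width bins (assumed meaning of "equally sized")
bins <- cut(x, breaks = 10)

bin_n    <- tapply(y, bins, length)  # number of points per bin
bin_xbar <- tapply(x, bins, mean)    # mean x per bin
bin_ymax <- tapply(y, bins, max)     # maximum y per bin

print(bin_n)                         # the counts generally differ between bins
ct <- cor.test(bin_xbar, bin_ymax)   # test described in the question
print(ct)
```

Printing bin_n alongside the test makes the concern in the question visible: the bins do not hold equal numbers of points, and a bin with more points has more chances to produce a large maximum.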
As an alternative you could try quantile regression (find a regression line for the 95th percentile). If the relationship between max(y) and x is only due to more points (and therefore more opportunities for large values), then the slopes of the estimated quantile lines should not differ significantly from 0.

Try the following (where the truth is no relationship):

set.seed(2)

# group x = k contains k points, so larger x means more points
x <- rep(1:10, 1:10)
# y is drawn independently of x
y <- rexp(55, 1)

plot(x,y)

# correlation between x and the per-group maximum of y
cor.test( 1:10, tapply( y, x, max ) )

# quantile regression for the median and the 95th percentile
library(quantreg)
fit <- rq( y ~ x, tau=c(0.5, 0.95) )
summary(fit)

Notice that the cor.test is significant, but that the confidence intervals for the slopes both include 0.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Avril Coghlan
> Sent: Thursday, November 22, 2007 5:46 AM
> To: r-help at stat.math.ethz.ch
> Cc: alc at sanger.ac.uk
> Subject: [R] question about extreme value distribution
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
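The point about "more points, more opportunities for large values" can be checked with a small base R simulation. This is a sketch, not part of the original reply: it assumes Exp(1) data as in the example, and the names n_values, sim_mean_max and exact_mean_max are made up for illustration. It relies on the standard result that the expected maximum of n independent Exp(1) draws is 1 + 1/2 + ... + 1/n.

```r
set.seed(2)
n_values <- 1:10   # group sizes, matching x <- rep(1:10, 1:10)

# Simulated mean of the maximum of n independent Exp(1) draws
sim_mean_max <- sapply(n_values, function(n) {
  mean(replicate(5000, max(rexp(n, rate = 1))))
})

# Exact expectation: the n-th harmonic number 1 + 1/2 + ... + 1/n
exact_mean_max <- cumsum(1 / n_values)

print(round(cbind(n = n_values,
                  simulated = sim_mean_max,
                  exact = exact_mean_max), 3))
```

The expected maximum grows steadily with the group size even though every draw comes from the same distribution, which is why a correlation between the group maximum and x is not, on its own, evidence that the upper tail of y depends on x.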
Dear Greg,

thank you for your reply - that is very helpful!

regards
Avril
Hello,

I have a question about R, and will be very grateful for any help.

I have two variables X and Y, and think that Y is related to X by a function of the form: Y = X^Z, where Z is < 1. However, I'm not sure how to find the best-fit equation to fit my data to a curve of this form using R. Have you any ideas?

regards
Avril Coghlan
Wellcome Trust Sanger Institute, Cambridge, UK
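Two common ways to fit a curve of the form Y = X^Z in base R are sketched below. This is a sketch under assumptions rather than a definitive answer: the data are simulated, the exponent z_true = 0.6 and the multiplicative noise are hypothetical, and which approach is more appropriate depends on how the errors in the real data behave.

```r
set.seed(3)                                  # simulated data, for illustration only
x <- runif(100, 1, 50)
z_true <- 0.6                                # hypothetical exponent, < 1
y <- x^z_true * exp(rnorm(100, sd = 0.1))    # multiplicative noise (an assumption)

# Option 1: nonlinear least squares on the original scale
fit_nls <- nls(y ~ x^z, start = list(z = 0.5))

# Option 2: Y = X^Z implies log(Y) = Z * log(X), so fit a
# straight line through the origin on the log scale
fit_log <- lm(log(y) ~ 0 + log(x))

print(coef(fit_nls))   # estimate of Z from nls
print(coef(fit_log))   # estimate of Z from the log-log fit
```

The log-log fit needs all X and Y to be positive and treats the errors as multiplicative, while nls treats them as additive on the original scale; summary(fit_nls) or summary(fit_log) gives a standard error for the estimated Z.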