protoplast
2012-Feb-16 16:43 UTC
[R] how to get r-squared for a predefined curve or function with "other" data points
hello mailing list! i still consider myself an R beginner, so please bear with me if my questions seems strange. i'm in the field of biology, and have done consecutive hydraulic conductivity measurements in three parallels ("Sample"), resulting in three sets of conductivity values ("PLC" for percent loss of conductivity, relative to 100%) at multiple pressures ("MPa"). --- Sample MPa PLC 1 -0.3498324 0.000000 1 -1.2414770 15.207821 1 -1.7993249 23.819995 1 -3.0162866 33.598570 1 -3.5184321 46.376933 1 -3.9899791 67.532226 1 -4.2731145 89.735541 1 -4.7597244 99.836239 2 -0.2754036 0.000000 2 -1.2912619 12.476132 2 -1.5128974 13.543273 ... --- since each sample is a statistical unit, i have fitted each sample-subset to a sigmoid curve: --- plot( NA, NA, main="", xlim=c(-20,0), ylim=c(0,100), xlab = "water potential [MPa]", ylab = "percent loss of conductivity [%]", xaxp = c(0,-20,4), yaxp = c(0,100,5), tck = 0.02, mgp = c(1.5,0.1,0), ) for(i in 1:3){ x <- subset(curvedata,Group == i)$MPa y <- subset(curvedata,Group == i)$PLC name <- subset(curvedata,Group == i)$Sample points(x,y) vlc <- nls(y ~ 100/(1+exp(a*(x-b))), start=c(a=1, b=-3), data=list(x,y)) curve(100/(1+exp(coef(vlc)[1]*(x-coef(vlc)[2]))), col=1, add = TRUE) Rsquared <- 1 - var(residuals(vlc))/var(y) summarizeall[i ,"Run"] <- i summarizeall[i ,"Sample"] <- name[1] summarizeall[i ,"a"] <- coef(vlc)[1] summarizeall[i ,"b"] <- coef(vlc)[2] summarizeall[i ,"R2"] <- Rsquared listnow <- data.frame(list(Run = c(i),Sample = c(name[1]), a c(coef(vlc)[1]), b = c(coef(vlc)[2]), R2 = c(Rsquared))) print(listnow) i <- i+1 } --- ...and get three slightly different curves with three different estimatinos of fit (r?, Rsquared). ---> summarizeallSample a b R2 1 1 1.388352 -3.277755 0.9379886 2 2 1.800007 -3.363075 0.9327164 3 3 1.736857 -2.743972 0.9882998> averageVar n a b R2 1 Mean 3 1.6417389 -3.1282673 NA 2 SE . 0.1279981 0.1937197 NA --- by averaging parameters a and b of the curve, i create a "mean curve" that is added to the plot (red curve in the attached image). http://r.789695.n4.nabble.com/file/n4394609/conductivity-curve.gif --- meana <- average[1,"a"] meanb <- average[1,"b"] curve(100/(1+exp(meana*(x-meanb))), col=2, lwd=2, add = TRUE) --- and now here's my problem: i'd like to calculate R squared for all points on that mean curve. since i have to average the curve parameters, i loose the curve's residuals that are stored in my variable vlc (the result of the nls function) for every sample. just fitting one curve to all the data points is not good enough. an extensive google search over several days hasn't gotten me anywhere, but maybe someone here can help me? is there an efficient way to calculate r squared for a predefined function with "unrelated" data points? (unrelated as in "not used directly for fitting") thanks in advance markus -- View this message in context: http://r.789695.n4.nabble.com/how-to-get-r-squared-for-a-predefined-curve-or-function-with-other-data-points-tp4394609p4394609.html Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2012-Feb-16 22:16 UTC
[R] how to get r-squared for a predefined curve or function with "other" data points
On Feb 16, 2012, at 11:43 AM, protoplast wrote:> hello mailing list! > i still consider myself an R beginner, so please bear with me if my > questions seems strange. > > i'm in the field of biology, and have done consecutive hydraulic > conductivity measurements in three parallels ("Sample"), resulting > in three > sets of conductivity values ("PLC" for percent loss of conductivity, > relative to 100%) at multiple pressures ("MPa"). > > --- > Sample MPa PLC > 1 -0.3498324 0.000000 > 1 -1.2414770 15.207821 > 1 -1.7993249 23.819995 > 1 -3.0162866 33.598570 > 1 -3.5184321 46.376933 > 1 -3.9899791 67.532226 > 1 -4.2731145 89.735541 > 1 -4.7597244 99.836239 > 2 -0.2754036 0.000000 > 2 -1.2912619 12.476132 > 2 -1.5128974 13.543273 > ... > --- > > since each sample is a statistical unit, i have fitted each sample- > subset to > a sigmoid curve: > > --- > plot( > NA, > NA, > main="", > xlim=c(-20,0), > ylim=c(0,100), > xlab = "water potential [MPa]", > ylab = "percent loss of conductivity [%]", > xaxp = c(0,-20,4), > yaxp = c(0,100,5), > tck = 0.02, > mgp = c(1.5,0.1,0), > ) > > for(i in 1:3){ > x <- subset(curvedata,Group == i)$MPa > y <- subset(curvedata,Group == i)$PLC > name <- subset(curvedata,Group == i)$Sample > points(x,y) > > vlc <- nls(y ~ 100/(1+exp(a*(x-b))), start=c(a=1, b=-3), > data=list(x,y)) > > curve(100/(1+exp(coef(vlc)[1]*(x-coef(vlc)[2]))), col=1, add = TRUE) > > Rsquared <- 1 - var(residuals(vlc))/var(y) > > summarizeall[i ,"Run"] <- i > summarizeall[i ,"Sample"] <- name[1] > summarizeall[i ,"a"] <- coef(vlc)[1] > summarizeall[i ,"b"] <- coef(vlc)[2] > summarizeall[i ,"R2"] <- Rsquared > > listnow <- data.frame(list(Run = c(i),Sample = c(name[1]), a > c(coef(vlc)[1]), b = c(coef(vlc)[2]), R2 = c(Rsquared))) > print(listnow) > > i <- i+1 > } > --- > > ...and get three slightly different curves with three different > estimatinos > of fit (r?, Rsquared). > > --- >> summarizeall > Sample a b R2 > 1 1 1.388352 -3.277755 0.9379886 > 2 2 1.800007 -3.363075 0.9327164 > 3 3 1.736857 -2.743972 0.9882998 > >> average > Var n a b R2 > 1 Mean 3 1.6417389 -3.1282673 NA > 2 SE . 0.1279981 0.1937197 NA > --- > > by averaging parameters a and b of the curve, i create a "mean > curve" that > is added to the plot (red curve in the attached image). > > http://r.789695.n4.nabble.com/file/n4394609/conductivity-curve.gif > > --- > meana <- average[1,"a"] > meanb <- average[1,"b"] > curve(), col=2, lwd=2, add = TRUE) > --- > > and now here's my problem: > i'd like to calculate R squared for all points on that mean curve. > since i have to average the curve parameters, i loose the curve's > residuals > that are stored in my variable vlc (the result of the nls function) > for > every sample. > just fitting one curve to all the data points is not good enough.So just calculate them? # pseudo-code: residual= actual - predicted gresid <- curvedata$PLC - 100/(1+exp(meana*(curvedata$MPa-meanb)) If you are convinced that your formula for R^2 makes sense and this practice is generally accepted in your domain, then you can apply it across the whole dataset. I would have thought that a single regression model built with nlmer might have been more statistically sound. (But this is a bit outside my domain of comfort for giving advice.)> > an extensive google search over several days hasn't gotten me > anywhere, but > maybe someone here can help me? > > is there an efficient way to calculate r squared for a predefined > function > with "unrelated" data points? > (unrelated as in "not used directly for fitting") > > > thanks in advance > markusDavid Winsemius, MD West Hartford, CT