Hello all, How do I actually use the output of predict.lm(..., type="terms") to predict new term values from new response values? I'm a chromatographer trying to use R (2.15.1) for one of the most common calculations in that business: - Given several chromatographic peak areas measured for control samples containing a molecule at known (increasing) concentrations, first derive a linear regression model relating the known concentration (predictor) to the observed peak area (response) - Then, given peak areas from new (real) samples containing unknown amounts of the molecule, use the model to predict concentrations of the molecule in the unknowns. In other words, given y = mx +b, I need to solve x' = (y'-b)/m for new data y' and in R, I'm trying something like this require(stats) data <- data.frame(area = c(4875, 8172, 18065, 34555), concn = c(25, 50, 125, 250)) new <- data.frame(area = c(8172, 10220, 11570, 24150)) model <- lm(area ~ concn, data) pred <- predict(model, type = "terms") #predicts from original data pred <- predict(model, type = "terms", newdata = new) #error pred <- predict(model, type = "terms", newdata = new, se.fit = TRUE) #error pred <- predict(model, type = "terms", newdata = new, interval "prediction") #error new2 <- data.frame(area = c(8172, 10220, 11570, 24150), concn = 0) new2 pred <- predict(model, type = "terms", newdata = new2) #wrong results Can someone please show me what I'm doing wrong?
Hello, You seem to be misreading the help pages for lm and predict.lm, argument 'terms'. A much simpler way of solving your problem should be to invert the fitted model using lm(): model <- lm(area ~ concn, data) # Your original model inv.model <- lm(concn ~ area, data = data) # Your problem's model. # predicts from original data pred1 <- predict(inv.model) # predict from new data pred2 <- predict(inv.model, newdata = new) # Let's see it. plot(concn ~ area, data = data) abline(inv.model) points(data$area, pred1, col="blue", pch="+") points(new$area, pred2, col="red", pch=16) Also, 'data' is a really bad variable name, it's already an R function. Hope this helps, Rui Barradas Em 28-08-2012 23:30, John Thaden escreveu:> Hello all, > > How do I actually use the output of predict.lm(..., type="terms") to > predict new term values from new response values? > > I'm a chromatographer trying to use R (2.15.1) for one of the most > common calculations in that business: > > - Given several chromatographic peak areas measured for control > samples containing a molecule at known (increasing) concentrations, > first derive a linear regression model relating the known > concentration (predictor) to the observed peak area (response) > - Then, given peak areas from new (real) samples containing > unknown amounts of the molecule, use the model to predict > concentrations of the > molecule in the unknowns. > > In other words, given y = mx +b, I need to solve x' = (y'-b)/m for new data y' > > and in R, I'm trying something like this > > require(stats) > data <- data.frame(area = c(4875, 8172, 18065, 34555), concn = c(25, > 50, 125, 250)) > new <- data.frame(area = c(8172, 10220, 11570, 24150)) > model <- lm(area ~ concn, data) > pred <- predict(model, type = "terms") > #predicts from original data > pred <- predict(model, type = "terms", newdata = new) > #error > pred <- predict(model, type = "terms", newdata = new, se.fit = TRUE) > #error > pred <- predict(model, type = "terms", newdata = new, interval > "prediction") #error > new2 <- data.frame(area = c(8172, 10220, 11570, 24150), concn = 0) > new2 > pred <- predict(model, type = "terms", newdata = new2) > #wrong results > > Can someone please show me what I'm doing wrong? > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Draper & Smith sections (3.2, 9.6) address prediction interval issues, but I'm also concerned with the linear fit itself. The Model II regression literature makes it abundantly clear that OLS regression of x on y frequently yields a different line than of y on x. The example below is not so extreme, but those given e.g. by Ludbrook, J. (2012) certainly are. Rui notes the logical problem of imputing an unknown x using a calibration curve where the x values are without error. Regression x on y doesn't help that. But on a practical level, I definitely recall (years ago) using predict.lm and newdata to predict x terms. I wish I remembered how. require(stats) #Make an illustrative data set set.seed(seed = 1111) dta <- data.frame( area = c( rnorm(6, mean = 4875, sd = 400), rnorm(6, mean = 8172, sd = 800), rnorm(6, mean = 18065, sd = 1200), rnorm(6, mean = 34555, sd = 2000)), concn = rep(c(25, 50, 125, 250), each = 6)) model <- lm(area ~ concn, data = dta) inv.model <- lm(concn ~ area, data = dta) plot(area ~ concn, data = dta) abline(model) inv.new = cbind.data.frame(area = c(1600, 34000)) inv.pred <- predict(inv.model, newdata = inv.new) lines(x = inv.pred, y = unlist(inv.new), col = "red") _____________________________ Ludbrook, J. (2012). "A primer for biomedical scientists on how to execute Model II linear regression analysis." Clinical and Experimental Pharmacology and Physiology 39(4): 329-335. [[alternative HTML version deleted]]
On Aug 29, 2012, at 4:08 PM, John Thaden wrote:> Draper & Smith sections (3.2, 9.6) address prediction interval > issues, but > I'm also concerned with the linear fit itself. The Model II regression > literatureCitations?> makes it abundantly clear that OLS regression of x on y > frequently yields a different line than of y on x. The example below > is not > so extreme, but those given e.g. by Ludbrook, J. (2012)... to a source that does not require a fee of US$30?> certainly are. Rui > notes the logical problem of imputing an unknown x using a calibration > curve where the x values are without error. Regression x on y > doesn't help > that. But on a practical level, I definitely recall (years ago) using > predict.lm and newdata to predict x terms. I wish I remembered how.You may be thinking of "orthogonal regression" or or "Deming regression" or one of its many other names. There is abundant code in the (free) archives regarding this topic. Please do be clear about what you really do desire. If you think there is a correct answer to your problem in a particular case, then by all means ... post that. Just saying predict)lm)...)) is not right, is not sufficient. -- David> > > require(stats) > #Make an illustrative data set > set.seed(seed = 1111) > dta <- data.frame( > area = c( > rnorm(6, mean = 4875, sd = 400), > rnorm(6, mean = 8172, sd = 800), > rnorm(6, mean = 18065, sd = 1200), > rnorm(6, mean = 34555, sd = 2000)), > concn = rep(c(25, 50, 125, 250), each = 6)) > model <- lm(area ~ concn, data = dta) > inv.model <- lm(concn ~ area, data = dta) > plot(area ~ concn, data = dta) > abline(model) > inv.new = cbind.data.frame(area = c(1600, 34000)) > inv.pred <- predict(inv.model, newdata = inv.new) > lines(x = inv.pred, y = unlist(inv.new), col = "red") > > _____________________________ > Ludbrook, J. (2012). "A primer for biomedical scientists on how to > execute > Model II linear regression analysis." Clinical and Experimental > Pharmacology and Physiology 39(4): 329-335. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD Alameda, CA, USA
Hello, Instead of reversing the regression, that, like you say, may have problems, it's very easy to wrap the formula x' <- (y' - beta0)/beta1 in a function and use the direct regression to get new 'x' values from new 'y' ones. This function assumes a first order ols model. invpredict <- function(model, newdata){ if(class(newdata) %in% c("data.frame", "list")) newdata <- unlist(newdata) cc <- coef(model) x <- (newdata - cc[1])/cc[2] x } # Using your last data example inv2 <- invpredict(model, c(1600, 34000)) cbind(new=c(1600, 34000), inv.pred, inv2) new inv.pred inv2 1 1600 0.5091916 -0.5980076 2 34000 244.4645607 245.7692308 I wonder what would be the CI's for the predicted 'concn'. Rui Barradas Em 30-08-2012 00:08, John Thaden escreveu:> Draper & Smith sections (3.2, 9.6) address prediction interval issues, but > I'm also concerned with the linear fit itself. The Model II regression > literature makes it abundantly clear that OLS regression of x on y > frequently yields a different line than of y on x. The example below is not > so extreme, but those given e.g. by Ludbrook, J. (2012) certainly are. Rui > notes the logical problem of imputing an unknown x using a calibration > curve where the x values are without error. Regression x on y doesn't help > that. But on a practical level, I definitely recall (years ago) using > predict.lm and newdata to predict x terms. I wish I remembered how. > > > require(stats) > #Make an illustrative data set > set.seed(seed = 1111) > dta <- data.frame( > area = c( > rnorm(6, mean = 4875, sd = 400), > rnorm(6, mean = 8172, sd = 800), > rnorm(6, mean = 18065, sd = 1200), > rnorm(6, mean = 34555, sd = 2000)), > concn = rep(c(25, 50, 125, 250), each = 6)) > model <- lm(area ~ concn, data = dta) > inv.model <- lm(concn ~ area, data = dta) > plot(area ~ concn, data = dta) > abline(model) > inv.new = cbind.data.frame(area = c(1600, 34000)) > inv.pred <- predict(inv.model, newdata = inv.new) > lines(x = inv.pred, y = unlist(inv.new), col = "red") > > _____________________________ > Ludbrook, J. (2012). "A primer for biomedical scientists on how to execute > Model II linear regression analysis." Clinical and Experimental > Pharmacology and Physiology 39(4): 329-335. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.