Marc Burgess
2019-Jul-01 11:24 UTC
[R] Using rms Package's Predict function with transformed variables
Hello, I would like to use the rms Package's Glm and Predict functions in order to use some of the additional details that they provide. I'm struggling to get them to work in cases where the formula contains multiple transformed versions of the same variable however. For example Y ~ x + x^2 + log(x+1). I have tried to do a fair bit of experimentation and research on this, but closest I have found is a stackoverflow question for which no complete answer was ever posted (https://stackoverflow.com/questions/42168371/rms-package-glm-how-to-add-indicators-in-the-formula). As the post indicated, it's possible to do the fit by creating explicit transformed variables, but then I can't come up with a way to get them to vary together correctly when using the Predict function. Any help would be appreciated --- Marc Burgess Example code: # Generate test data: library(rms) set.seed(100) testdata <- rnorm(10000, 50, 10) ltestdata <- log(testdata+1) testdata2 <- testdata^2 yvals <- rpois(10000, 10+ltestdata+testdata+testdata2) # Can fit using standard glm as follows: testfit1 <- glm(yvals ~ I(log(testdata+1)) + testdata + I(testdata^2), family=poisson()) # Trying different methods fail for rms Glm that don't work: ddist = datadist(testdata, ltestdata, testdata2) options(datadist="ddist") testfit2 <- Glm(yvals ~ I(log(testdata+1)) + testdata + I(testdata^2), family=poisson()) testfit3 <- Glm(yvals ~ asis(log(testdata+1)) + testdata + asis(testdata^2), family=poisson()) testfit3 <- Glm(yvals ~ asis(log(testdata+1)) + pol(testdata,2), family=poisson()) # It does work to use the explicit transformed variables: testfit4 <- Glm(yvals ~ ltestdata + testdata + testdata2, family=poisson()) # We can't then use Predict with testfit4 because it doesn't make sense to independently vary the 3 predictors plot(Predict(testfit4, testdata=30:50, fun="exp")) # ltestdata and testdata2 are being held constant plot(Predict(testfit4, testdata=30:50, ltestdata=log(30:50+1), fun="exp")) # Creates predictions for all combinations, most of which are impossible.