Paul Warren Simonin
2009-Feb-06 21:25 UTC
[R] Log transformation and -Inf values for use in glm()
Hello,

I am writing regarding log transformation of data in a single matrix column, and subsequent use of these data in a glm model fit. I have a data matrix in which I am using the log function to transform the values. This transformation results in -Inf values in some places, though. I then receive an error when this matrix is used in the glm function, and would like to know how this can be avoided.

I have attempted several methods already, including the use of na.exclude in the glm statement:

> DistributionT<-glm(EarlyLn$yoyras~EarlyLn$temp,family=gaussian(link = "identity"),na.exclude)

I have also attempted to use the is.finite command:

EarlyLn$yoyras<-EarlyLn[is.finite(EarlyLn$yoyras)==T,]

I know another option would be to use a type of find-and-replace command to remove entire rows of the matrix that contain 0's (before log transformation) or -Inf (after transformation), but I do not know how this is done.

Thank you for any advice or tips regarding conducting this transformation and feeding the data matrix into glm.

Sincerely,
Paul S.
Stephen Weigand
2009-Feb-08 00:59 UTC
[R] Log transformation and -Inf values for use in glm()
Paul,

On Fri, Feb 6, 2009 at 3:25 PM, Paul Warren Simonin <Paul.Simonin at uvm.edu> wrote:
> Hello,
> I am writing regarding log transformation of data in a single matrix
> column, and subsequent use of these data in a glm model fit. I have a data
> matrix in which I am using the log function to transform the values. This
> transformation results in -Inf values in some places, though. I then receive
> an error when this matrix is used in the glm function, and would like to
> know how this can be avoided.
> I have attempted several methods already, including the use of na.exclude
> in the glm statement:
>
>> DistributionT<-glm(EarlyLn$yoyras~EarlyLn$temp,family=gaussian(link = "identity"),na.exclude)
>
> I have also attempted to use the is.finite command:
>
> EarlyLn$yoyras<-EarlyLn[is.finite(EarlyLn$yoyras)==T,]
>
> I know another option would be to use a type of find-and-replace command to
> remove entire rows of the matrix that contain 0's (before log
> transformation) or -Inf (after transformation), but I do not know how this
> is done.
>
> Thank you for any advice or tips regarding conducting this transformation
> and feeding the data matrix into glm.
>
> Sincerely,
> Paul S.

In general, use syntax like this:

  glm(yoyras ~ log(temp), data = EarlyLn, subset = temp > 0)

However, it's bad statistical practice to use a transformation that causes you to lose data. One approach is to add a constant to temp, which keeps the zero rows in the fit, via:

  glm(yoyras ~ log(temp + 1), data = EarlyLn)

with the disadvantage being that the constant you choose is arbitrary but affects your inferences.

Stephen
Rochester, MN USA
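
The options discussed in this thread can be compared side by side in a minimal sketch. The data frame below is a made-up stand-in for EarlyLn (its values are not from the original post), and, following the reply, the log is applied to temp; the same pattern should carry over if the logged column is yoyras instead. The names fit_subset, fit_rows, fit_shift and EarlyPos are illustrative only.

```r
# Hypothetical stand-in for the poster's data; values are invented for illustration.
EarlyLn <- data.frame(
  temp   = c(0, 2, 5, 9, 12, 15),
  yoyras = c(1.2, 1.9, 2.4, 3.1, 3.3, 3.8)
)

# log(0) is -Inf. It is not NA, so na.exclude/na.omit leave it in place
# and glm() then stops with an error about non-finite values.
log(EarlyLn$temp)

# Option 1: keep the raw data frame and let glm() drop the zero rows.
fit_subset <- glm(yoyras ~ log(temp), data = EarlyLn, subset = temp > 0)

# Option 2: remove the rows explicitly. The result is a whole data frame,
# so it is stored under a new name rather than assigned into one column.
EarlyPos <- EarlyLn[is.finite(log(EarlyLn$temp)), ]
fit_rows <- glm(yoyras ~ log(temp), data = EarlyPos)

# Option 3: shift by a constant so no rows are lost; log1p(x) computes log(1 + x).
fit_shift <- glm(yoyras ~ log1p(temp), data = EarlyLn)

# Number of observations each fit actually used.
nobs(fit_subset)
nobs(fit_rows)
nobs(fit_shift)
```

Options 1 and 2 fit the same model on the same rows; option 3 keeps every row, at the cost of the arbitrary constant mentioned above.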