Johannes Radinger
2011-Sep-07 10:17 UTC
[R] linear regression, log-transformation and plotting
Hello,

I have some questions concerning log-transformations and plotting of regression lines. As far as I know, it is a problem to log-transform values smaller than 1 (0-1). In my statistics lecture I was told to do a log(x+1) transformation in such cases. So I provide here a small example to explain my questions:

# Some example data for testing
a1 <- c(0.2,1.9,0.1,0.2,0.8,22,111.3,19.9,23.9,138,42.3,54.2,0.9)
b1 <- c(1.8,28.2,0.3,12.4,3.2,81.1,122.1,2.9,37.2,98.9,21,28.7,1.8)
data1 <- data.frame(a1,b1)

model <- lm(log(a1+1)~log(b1+1))

Because of the values less than one, I did the log(x+1) transformation before running the lm. Is that correct so far? (Just to mention: these are example data, so I haven't checked whether they need a transformation at all.)

Then some questions arise when it comes to plotting the data. As usual, I'd like to plot the original data (not log-transformed) but on a log scale.

I tried two approaches: the standard plot function and ggplot.

# Plot with ggplot
ggplot()+
  geom_point(aes(b1,a1,data=data1))+
  geom_abline(aes(intercept=coef(model)[1],slope=coef(model)[2]))+
  scale_y_log()+
  scale_x_log()

# Plot with standard plot
plot(b1,a1,log="xy")
abline(model,untf=T)
abline(model,untf=F)

1) The regression lines are different for plot vs. ggplot (transformed or untransformed). So which is actually the correct line?

2) The regression line was calculated on the basis of log(x+1), but the log scale on my axes is just a simple log (without +1). So how are such cases usually treated? I thought about subtracting the value 1 from the intercept?

So my simple question: What is the best way to display such data with a regression line?

Thank you
/Johannes
--
I think that you have not understood your lecturer. You can log transform any positive number. You cannot log transform a negative number. Adding a constant to a negative number to make it positive before log transformation is sometimes suggested by those who do not understand what they are doing. This practice is not appropriate except perhaps in some extraordinary circumstances.

John

--
John C Frain
Economics Department
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:frainj@tcd.ie
mailto:frainj@gmail.com
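The reply's point that any positive number can be log-transformed can be checked on the example data from the original post. The following is only a minimal sketch, assuming that all values are strictly positive and that a plain log-log model is what is wanted; the names `fit`, `grid` and `a1_hat` are made up for illustration and do not appear in the thread. The fitted values are back-transformed with exp() and drawn with lines(), so the line does not depend on how abline() treats log axes.

```r
# Example data from the original post
a1 <- c(0.2, 1.9, 0.1, 0.2, 0.8, 22, 111.3, 19.9, 23.9, 138, 42.3, 54.2, 0.9)
b1 <- c(1.8, 28.2, 0.3, 12.4, 3.2, 81.1, 122.1, 2.9, 37.2, 98.9, 21, 28.7, 1.8)
data1 <- data.frame(a1, b1)

# Values between 0 and 1 have finite (negative) logarithms
log(c(0.1, 0.2, 0.9))

# Plain log-log fit with no constant added (assumes every value is > 0)
fit <- lm(log(a1) ~ log(b1), data = data1)

# Grid of b1 values, evenly spaced on the log scale (hypothetical helper object)
grid <- data.frame(
  b1 = exp(seq(log(min(data1$b1)), log(max(data1$b1)), length.out = 100))
)

# Back-transform the fitted values to the original scale
grid$a1_hat <- exp(predict(fit, newdata = grid))

# Original data on log-log axes, with the back-transformed fit on top
plot(a1 ~ b1, data = data1, log = "xy")
lines(grid$b1, grid$a1_hat)
```

Because the model is linear in log(b1), the back-transformed curve appears as a straight line on the log-log axes. If the data contained zeros or negative values, this sketch would not apply as written.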