Hello,

I have two data.frames (data1 and data4); the files use dec="." and sep=";".
http://r.789695.n4.nabble.com/file/n4199964/data1.txt data1.txt
http://r.789695.n4.nabble.com/file/n4199964/data4.txt data4.txt

When I do

plot(data1$nx,data1$ny, col="red")
points(data4$nx,data4$ny, col="blue")

the results seem very similar (at least to me), but the R-squared values from

summary(lm(data1$ny ~ data1$nx))

and

summary(lm(data4$ny ~ data4$nx))

are very different (0.48 against 0.89).

Could someone explain the reason to me?

To be complete, I am looking for a simple indicator telling me whether it is worthwhile to keep the values provided by lm. I thought that R-squared could do the job. To me, if R-squared is far from 1, the data are not good enough to perform a linear fit.
It seems that I'm wrong.

Thanks for your explanations.
Ptit Bleu.

--
View this message in context: http://r.789695.n4.nabble.com/lm-and-R-squared-newbie-tp4199964p4199964.html
Sent from the R help mailing list archive at Nabble.com.

On Thu, Dec 15, 2011 at 8:35 AM, PtitBleu <ptit_bleu at yahoo.fr> wrote:
> Hello,
>
> I have two data.frames (data1 and data4); the files use dec="." and sep=";".
> http://r.789695.n4.nabble.com/file/n4199964/data1.txt data1.txt
> http://r.789695.n4.nabble.com/file/n4199964/data4.txt data4.txt
>
> When I do
> plot(data1$nx,data1$ny, col="red")
> points(data4$nx,data4$ny, col="blue")
> the results seem very similar (at least to me), but the R-squared values from
> summary(lm(data1$ny ~ data1$nx))
> and
> summary(lm(data4$ny ~ data4$nx))
> are very different (0.48 against 0.89).
>
> Could someone explain the reason to me?
>
> To be complete, I am looking for a simple indicator telling me whether it is
> worthwhile to keep the values provided by lm. I thought that R-squared could
> do the job. To me, if R-squared is far from 1, the data are not good enough
> to perform a linear fit.
> It seems that I'm wrong.

The problem is the outliers. Try using a robust measure instead. If we replace the Pearson correlations with Spearman (rank) correlations, the two values are much closer:

> # R^2 based on Pearson correlations
> cor(fitted(lm(ny ~ nx, data4)), data4$ny)^2
[1] 0.8916924
> cor(fitted(lm(ny ~ nx, data1)), data1$ny)^2
[1] 0.4868575
>
> # R^2 based on Spearman (rank) correlations
> cor(fitted(lm(ny ~ nx, data4)), data4$ny, method = "spearman")^2
[1] 0.8104026
> cor(fitted(lm(ny ~ nx, data1)), data1$ny, method = "spearman")^2
[1] 0.7266705

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
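
The Pearson versus Spearman comparison above can be tried on made-up data. The sketch below is only an illustration under stated assumptions: the data frames `core` and `extended` and the helper `r2()` are hypothetical names, the numbers are simulated, and nothing here uses the poster's data1.txt or data4.txt. It adds two far-away points that lie close to the fitted line and compares how much each squared correlation moves.

```r
# Simulated stand-in data; NOT the poster's data1/data4.
set.seed(1)
nx <- runif(50, 0, 1)
ny <- 2 * nx + rnorm(50, sd = 0.6)
core <- data.frame(nx = nx, ny = ny)

# Same cloud plus two far-away points close to the same line (hypothetical).
extended <- rbind(core, data.frame(nx = c(8, 10), ny = c(16, 20)))

# Squared correlation between fitted and observed values, as in the reply above.
r2 <- function(d, method = "pearson") {
  fit <- lm(ny ~ nx, data = d)
  cor(fitted(fit), d$ny, method = method)^2
}

r2_pearson  <- c(core = r2(core), extended = r2(extended))
r2_spearman <- c(core = r2(core, "spearman"),
                 extended = r2(extended, "spearman"))

# One row per method, one column per data set.
print(round(rbind(r2_pearson, r2_spearman), 3))
```

In this simulated case the Pearson-based value rises sharply once the two distant points are added, while the rank-based value changes far less, because ranks ignore how far away the extreme points are. Whether the same mechanism explains the 0.48 versus 0.89 gap in the real files can only be checked on those files.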

On Dec 15, 2011, at 8:35 AM, PtitBleu wrote:
> Hello,
>
> I have two data.frames (data1 and data4); the files use dec="." and sep=";".
> http://r.789695.n4.nabble.com/file/n4199964/data1.txt data1.txt
> http://r.789695.n4.nabble.com/file/n4199964/data4.txt data4.txt
>
> When I do
> plot(data1$nx,data1$ny, col="red")
> points(data4$nx,data4$ny, col="blue")
> the results seem very similar (at least to me), but the R-squared values from
> summary(lm(data1$ny ~ data1$nx))
> and
> summary(lm(data4$ny ~ data4$nx))
> are very different (0.48 against 0.89).
>
> Could someone explain the reason to me?

Because you failed to do an adequate assessment of your data. Try this plotting exercise and I think you will see the reason for the differences:

plot(data1$nx, data1$ny, col="red",
     xlim=range(c(data1$nx, data4$nx)),
     ylim=range(c(data1$ny, data4$ny)) )

--
David.

> To be complete, I am looking for a simple indicator telling me whether
> it is worthwhile to keep the values provided by lm. I thought that
> R-squared could do the job. To me, if R-squared is far from 1, the data
> are not good enough to perform a linear fit.
> It seems that I'm wrong.
>
> Thanks for your explanations.
> Ptit Bleu.

David Winsemius, MD
West Hartford, CT
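
The plotting exercise above can be extended to overlay both data sets and their fitted lines on common axes. This is a minimal sketch under assumptions: `data1` and `data4` are simulated stand-ins with columns `nx` and `ny` rather than the posted files, and `pdf(NULL)` is included only so the code runs without a display (drop it and `dev.off()` in an interactive session).

```r
# Hypothetical stand-ins for the poster's data1 and data4 (simulated).
set.seed(2)
data1 <- data.frame(nx = runif(40))
data1$ny <- data1$nx + rnorm(40, sd = 0.3)
# data4: part of the same cloud plus two distant points (an assumption).
data4 <- rbind(data1[1:30, ], data.frame(nx = c(5, 6), ny = c(5.2, 5.9)))

# Axis limits that cover BOTH data sets, so no point falls outside the frame.
xlim <- range(c(data1$nx, data4$nx))
ylim <- range(c(data1$ny, data4$ny))

pdf(NULL)  # null graphics device, so the sketch runs non-interactively
plot(data1$nx, data1$ny, col = "red", xlim = xlim, ylim = ylim,
     xlab = "nx", ylab = "ny")
points(data4$nx, data4$ny, col = "blue")
abline(lm(ny ~ nx, data = data1), col = "red")   # fit to data1
abline(lm(ny ~ nx, data = data4), col = "blue")  # fit to data4
invisible(dev.off())
```

With the original call, `plot()` takes its axis limits from data1 alone, so any data4 points outside that range are silently clipped by `points()`. Computing the limits from both data sets makes such points visible.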