Tal Galili
2013-Nov-06 16:40 UTC
[R] Basic question: why does a scatter plot of a variable against itself works like this?
Hello all,

I just noticed the following behavior of plot:

x <- c(1,2,9)
plot(x ~ x) # this is just like doing:
plot(x)
# when maybe we would like it to give this:
plot(x ~ c(x))
# the same as:
plot(x ~ I(x))

I was wondering if there is some reason for this behavior.

Thanks,
Tal
Marc Schwartz
2013-Nov-06 16:52 UTC
[R] Basic question: why does a scatter plot of a variable against itself works like this?
On Nov 6, 2013, at 10:40 AM, Tal Galili <tal.galili at gmail.com> wrote:

> Hello all,
>
> I just noticed the following behavior of plot:
> x <- c(1,2,9)
> plot(x ~ x) # this is just like doing:
> plot(x)
> # when maybe we would like it to give this:
> plot(x ~ c(x))
> # the same as:
> plot(x ~ I(x))
>
> I was wondering if there is some reason for this behavior.
>
> Thanks,
> Tal

Hi Tal,

In your example:

  plot(x ~ x)

the formula method of plot() is called, which essentially does the following internally:

  > model.frame(x ~ x)
    x
  1 1
  2 2
  3 9

Note that there is only a single column in the result. Thus, the plot is based upon 'y' = c(1, 2, 9), while 'x' = 1:3, which is NOT the row names for the resultant data frame, but the indices of the vector elements in the 'x' column.

This is just like:

  plot(c(1, 2, 9))

On the other hand:

  > model.frame(x ~ c(x))
    x c(x)
  1 1    1
  2 2    2
  3 9    9

  > model.frame(x ~ I(x))
    x I(x)
  1 1    1
  2 2    2
  3 9    9

In both of the above cases, you get two columns of data back, thus the result is essentially:

  plot(c(1, 2, 9), c(1, 2, 9))

Regards,

Marc Schwartz
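The column counts described above can be checked without drawing anything. This is a minimal sketch using only base R; the object names mf_same, mf_c and mf_I are made up for illustration, and the printed values in the comments are the ones shown in the model.frame() output above.

```r
x <- c(1, 2, 9)

# Response and predictor are the same symbol, so only one column is built.
mf_same <- model.frame(x ~ x)

# c(x) and I(x) are different expressions from x, so each adds its own column.
mf_c <- model.frame(x ~ c(x))
mf_I <- model.frame(x ~ I(x))

ncol(mf_same)   # 1
ncol(mf_c)      # 2
ncol(mf_I)      # 2
names(mf_c)     # "x" "c(x)"
```

With a single column there is nothing to put on the horizontal axis except the element indices, which is why plot(x ~ x) looks like plot(x).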
William Dunlap
2013-Nov-06 16:59 UTC
[R] Basic question: why does a scatter plot of a variable against itself works like this?
It probably happens because plot(formula) makes one call to terms(formula) to analyze the formula. terms() says there is one variable in the formula, the response, so plot(x~x) is the same as plot(seq_along(x), x).

If you give it plot(~x), terms() also says there is one variable, but no response, so you get the same plot as plot(x, rep(1,length(x))).

This is also the reason that plot(y1+y2 ~ x1+x2) makes one plot of the sum of y1 and y2 for each term on the right side instead of 4 plots, plot(x1,y1), plot(x1,y2), plot(x2,y1), and plot(x2,y2).

One could write a plot function that called terms separately on the left and right sides of the formula.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Tal Galili
> Sent: Wednesday, November 06, 2013 8:40 AM
> To: r-help at r-project.org
> Subject: [R] Basic question: why does a scatter plot of a variable against itself works like
> this?
>
> Hello all,
>
> I just noticed the following behavior of plot:
> x <- c(1,2,9)
> plot(x ~ x) # this is just like doing:
> plot(x)
> # when maybe we would like it to give this:
> plot(x ~ c(x))
> # the same as:
> plot(x ~ I(x))
>
> I was wondering if there is some reason for this behavior.
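What terms() reports for these formulas can be inspected directly. The sketch below uses only base R. The helper side_pairs() is hypothetical (it is not part of R), and it uses all.vars() on each side of the formula rather than two separate terms() calls, purely to keep the illustration short.

```r
x <- c(1, 2, 9)

tt_two_sided <- terms(x ~ x)
tt_one_sided <- terms(~ x)

attr(tt_two_sided, "response")   # 1: a response is present
attr(tt_one_sided, "response")   # 0: no response
all.vars(x ~ x)                  # "x": a single variable overall

# The left side is kept as one response; the right side is split into terms.
attr(terms(y1 + y2 ~ x1 + x2), "term.labels")   # "x1" "x2"

# Hypothetical helper: treat the two sides of a two-sided formula
# independently and list every (x, y) pairing that could be plotted.
side_pairs <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3L)
  expand.grid(x = all.vars(f[[3]]), y = all.vars(f[[2]]),
              stringsAsFactors = FALSE)
}

side_pairs(y1 + y2 ~ x1 + x2)   # four rows, one per pairing
side_pairs(x ~ x)               # one row: x against x
```

A plotting function built on such a split would draw x against itself for x ~ x, at the cost of no longer treating y1 + y2 as a single summed response.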
Barry Rowlingson
2013-Nov-06 17:38 UTC
[R] Basic question: why does a scatter plot of a variable against itself works like this?
Interestingly, fitting an LM with x on both sides gives a warning, and then drops it from the RHS, leaving you with just an intercept:

  > lm(x~x,data=d)

  Call:
  lm(formula = x ~ x, data = d)

  Coefficients:
  (Intercept)
            4

  Warning messages:
  1: In model.matrix.default(mt, mf, contrasts) :
    the response appeared on the right-hand side and was dropped
  2: In model.matrix.default(mt, mf, contrasts) :
    problem with term 1 in model.matrix: no columns are assigned

There's no numerical problem fitting a line through the points:

  > d$xx=d$x
  > lm(x~xx,data=d)

  Call:
  lm(formula = x ~ xx, data = d)

  Coefficients:
  (Intercept)           xx
    5.128e-16    1.000e+00

It seems to be R saying "Ummm did you really mean to do this? It's kinda dumb".

I suppose this could occur if you had a nested loop over all columns in a data frame, fitting an LM with every column, and didn't skip if i==j.

Except of course it doesn't:

- fit with two indexes set to one:

  > i=1;j=1
  > lm(d[,i]~d[,j])

  Call:
  lm(formula = d[, i] ~ d[, j])

  Coefficients:
  (Intercept)       d[, j]
    5.128e-16    1.000e+00

- fit with two ones:

  > lm(d[,1]~d[,1])

  Call:
  lm(formula = d[, 1] ~ d[, 1])

  Coefficients:
  (Intercept)
            4

  Warning messages:
  1: In model.matrix.default(mt, mf, contrasts) :
    the response appeared on the right-hand side and was dropped
  2: In model.matrix.default(mt, mf, contrasts) :
    problem with term 1 in model.matrix: no columns are assigned

Obviously this can all be explained in terms of R (or lm's, or model.matrix's) evaluation schemes, but it seems far from intuitive.

Barry

On Wed, Nov 6, 2013 at 4:59 PM, William Dunlap <wdunlap at tibco.com> wrote:
> It probably happens because plot(formula) makes one call to terms(formula) to
> analyze the formula. terms() says there is one variable in the formula,
> the response, so plot(x~x) is the same as plot(seq_along(x), x).
> If you give it plot(~x) , terms() also says there is one variable, but
> no response, so you get the same plot as plot(x, rep(1,length(x))).
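The lm() transcript above never defines d. The sketch below assumes d is a one-column data frame holding the x from the original question, which matches the reported intercept of 4 (the mean of 1, 2 and 9). Warnings are suppressed because their exact wording may differ between R versions. The last two lines are consistent with the two sides being compared as unevaluated expressions rather than by value, though that reading is an inference and is not confirmed here against lm's source.

```r
# Assumption: d is not shown in the post; this definition reproduces its numbers.
d <- data.frame(x = c(1, 2, 9))

# Same symbol on both sides: only the intercept survives.
fit_same <- suppressWarnings(lm(x ~ x, data = d))
coef(fit_same)

# A copy under a different name is fitted as an ordinary predictor.
d$xx <- d$x
fit_copy <- lm(x ~ xx, data = d)
coef(fit_copy)

# d[, 1] ~ d[, 1] has identical expressions on both sides; d[, i] ~ d[, j]
# does not, even when i and j hold the same value.
identical(quote(d[, 1]), quote(d[, 1]))   # TRUE
identical(quote(d[, i]), quote(d[, j]))   # FALSE
```

So a nested loop that indexes columns with two different loop variables would not hit the dropped-response case, while a literal repeat of the same expression would.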