I am trying to plot the residuals from a linear model and I get the following error message: Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ. The outcome is a continuous variable and the explanatory variable is ordinal. My immediate suspicion was that it had something to do with missing values. Each variable has missing values coded as NA. I can't figure it out. Please help. Thanks.

> obama.mod = lm(ft_dpc_r~pid_x)
> summary(obama.mod)

Call:
lm(formula = ft_dpc_r ~ pid_x)

Residuals:
    Min      1Q  Median      3Q     Max
-89.271 -14.783   0.729  10.729  84.193

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 101.5155     0.5797  175.12   <2e-16 ***
pid_x       -12.2441     0.1411  -86.76   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.85 on 5876 degrees of freedom
  (36 observations deleted due to missingness)
Multiple R-squared:  0.5616,  Adjusted R-squared:  0.5615
F-statistic:  7528 on 1 and 5876 DF,  p-value: < 2.2e-16

> plot(pid_x, resid(obama.mod), ylab = "Model Residuals")
Error in xy.coords(x, y, xlabel, ylabel, log) :
  'x' and 'y' lengths differ

Jason Gainous, Ph.D.
Associate Professor
Department of Political Science
University of Louisville
203 Ford Hall
Louisville, KY 40292
(502) 852-1660
Homepage: https://louisville.academia.edu/JasonGainous
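A minimal reproducible sketch of the problem. The poster's survey data are not available, so the data below are simulated stand-ins: the sample size, values, and NA positions are invented for illustration. Only standard R functions (`lm()`, `resid()`, `na.action()`, `plot()`) are used. The sketch shows that the default NA handling in `lm()` returns fewer residuals than there are values of `pid_x`.

```r
# Hypothetical simulated stand-in data (NOT the poster's real variables)
set.seed(1)
n <- 100
pid_x <- sample(1:7, n, replace = TRUE)            # ordinal predictor, 1..7
ft_dpc_r <- 100 - 12 * pid_x + rnorm(n, sd = 20)   # continuous outcome
pid_x[c(3, 10)] <- NA                              # missing predictor values
ft_dpc_r[c(10, 20, 30)] <- NA                      # missing outcome values (row 10 overlaps)

# The default na.action (normally na.omit) silently drops incomplete rows
fit <- lm(ft_dpc_r ~ pid_x)

length(pid_x)        # 100: every observation, NAs included
length(resid(fit))   # 96: residuals exist only for the complete cases
na.action(fit)       # the dropped rows: 3, 10, 20, 30

# Reproduces the error from the original post without stopping the script
try(plot(pid_x, resid(fit)))
```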
Did you not notice:

"Residual standard error: 22.85 on 5876 degrees of freedom
  (36 observations deleted due to missingness)"

?? (No residuals for the missing cases...)

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
H. Gilbert Welch

On Sat, Mar 15, 2014 at 10:42 AM, Gainous,Jason <jason.gainous at louisville.edu> wrote:
> ...
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
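Bert's point can be checked directly. The sketch below again uses hypothetical simulated data (invented values and NA positions standing in for the real variables). By default `lm()` stores the model frame it actually fitted (`model = TRUE`), and that frame holds only the complete cases, so plotting its `pid_x` column against `resid()` pairs each residual with the predictor value it came from. `complete.cases()` gives the same subset without touching the fit.

```r
# Hypothetical simulated stand-in data (NOT the poster's real variables)
set.seed(1)
n <- 100
pid_x <- sample(1:7, n, replace = TRUE)
ft_dpc_r <- 100 - 12 * pid_x + rnorm(n, sd = 20)
pid_x[c(3, 10)] <- NA
ft_dpc_r[c(10, 20, 30)] <- NA

fit <- lm(ft_dpc_r ~ pid_x)

# The model frame contains only the rows used in the fit
used <- model.frame(fit)
nrow(used)                         # 96, same as length(resid(fit))

# Equivalent subset built from the raw vectors
ok <- complete.cases(ft_dpc_r, pid_x)
sum(ok)                            # 96

# Residuals against the predictor values that were actually fitted
plot(used$pid_x, resid(fit), xlab = "pid_x", ylab = "Model Residuals")
abline(h = 0, lty = 2)             # dashed reference line at zero
```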
Dear Jason,

Your suspicion is correct: the cases with NAs in either ft_dpc_r or pid_x are absent from the residuals but not from pid_x. There are several ways to deal with this problem, but one straightforward way is to use na.exclude in place of the default na.omit in the call to lm, as in

lm(ft_dpc_r ~ pid_x, na.action=na.exclude)

Then residuals() will pad out the values it returns with NAs, so the residual vector has the same length as pid_x and your plot() call will work.

I hope this helps,
 John

-----------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Gainous,Jason
> Sent: Saturday, March 15, 2014 1:42 PM
> To: r-help at r-project.org
> Subject: [R] plotting residuals/error message
>
> ...
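A sketch of John's na.exclude suggestion, again on hypothetical simulated data (invented values and NA positions, not the poster's variables). The fitted coefficients are the same as with the default na.omit; the difference is that residuals() is padded with NA at the dropped rows, so its length matches pid_x.

```r
# Hypothetical simulated stand-in data (NOT the poster's real variables)
set.seed(1)
n <- 100
pid_x <- sample(1:7, n, replace = TRUE)
ft_dpc_r <- 100 - 12 * pid_x + rnorm(n, sd = 20)
pid_x[c(3, 10)] <- NA
ft_dpc_r[c(10, 20, 30)] <- NA

# na.exclude drops the same rows for fitting but records their positions,
# so residuals() and fitted() are padded back out with NA
fit_excl <- lm(ft_dpc_r ~ pid_x, na.action = na.exclude)

r <- residuals(fit_excl)
length(r)          # 100, matching length(pid_x)
sum(is.na(r))      # 4: one NA per dropped row

# The original plot call now works; points with an NA coordinate are not drawn
plot(pid_x, r, ylab = "Model Residuals")
```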