JC Matthews
2011-Aug-22 15:37 UTC
[R] Multiple regression in R - unstandardised coefficients are a different sign to standardised coefficients, is this correct?
Hello,

I have a statistical problem that I am using R for, but I am not making sense of the results. I am trying to use multiple regression to explore which variables (weather conditions) have the greater effect on a local atmospheric variable. The data are taken from a database that has 20391 data points (Z1).

A simplified version of the data I'm looking at is given below, but I have a problem in that there is a disagreement in sign between the regression coefficients and the standardised regression coefficients. Intuitively I would expect both to be the same sign, but for many of the parameters they are not.

I am aware that using standardised correlation coefficients is strongly discouraged by some people, but I would nevertheless like to see the results, not least because it has made me doubt the non-standardised values of B that R has given me.

The code I have used, and some of the data, is as follows (once the database has been imported from SQL, and outliers removed).

Z1sub <- Z1[, c(2, 5, 7, 11, 12, 13, 15, 16)]
colnames(Z1sub) <- c("temp", "hum", "wind", "press", "rain", "s.rad", "mean1", "sd1")

attach(Z1sub)
names(Z1sub)

Model1d <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2))

summary(Model1d)

Call:
lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
    I(rain^2))

Residuals:
     Min       1Q   Median       3Q      Max
-1230.64   -63.17    18.51    97.85  1275.73

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   -9.243e+02  5.689e+01 -16.246  < 2e-16 ***
hum            2.835e+01  1.468e+00  19.312  < 2e-16 ***
wind           1.236e+02  4.832e+00  25.587  < 2e-16 ***
rain          -3.144e+03  7.635e+02  -4.118 3.84e-05 ***
I(hum^2)      -1.953e-01  9.393e-03 -20.793  < 2e-16 ***
I(wind^2)      6.914e-01  2.174e-01   3.181  0.00147 **
I(rain^2)      2.730e+02  3.265e+01   8.362  < 2e-16 ***
hum:wind      -1.782e+00  5.448e-02 -32.706  < 2e-16 ***
hum:rain       2.798e+01  8.410e+00   3.327  0.00088 ***
wind:rain      6.018e+02  2.146e+02   2.805  0.00504 **
hum:wind:rain -6.606e+00  2.401e+00  -2.751  0.00594 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 180.5 on 20337 degrees of freedom
Multiple R-squared: 0.2394,     Adjusted R-squared: 0.239
F-statistic: 640.2 on 10 and 20337 DF,  p-value: < 2.2e-16

To calculate the standardised coefficients, I used the following:

Z1sub.scaled <- data.frame(scale(Z1sub[, c('temp', 'hum', 'wind', 'press', 'rain', 's.rad', 'mean1', 'sd1')]))

attach(Z1sub.scaled)
names(Z1sub.scaled)

Model1d.sc <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2))

summary(Model1d.sc)

Call:
lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
    I(rain^2))

Residuals:
     Min       1Q   Median       3Q      Max
-5.94713 -0.30527  0.08946  0.47287  6.16503

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.0806858  0.0096614   8.351  < 2e-16 ***
hum           -0.4581509  0.0073456 -62.371  < 2e-16 ***
wind          -0.1995316  0.0073767 -27.049  < 2e-16 ***
rain          -0.1806894  0.0158037 -11.433  < 2e-16 ***
I(hum^2)      -0.1120435  0.0053885 -20.793  < 2e-16 ***
I(wind^2)      0.0172870  0.0054346   3.181  0.00147 **
I(rain^2)      0.0040575  0.0004853   8.362  < 2e-16 ***
hum:wind      -0.2188729  0.0066659 -32.835  < 2e-16 ***
hum:rain       0.0267420  0.0146201   1.829  0.06740 .
wind:rain      0.0365615  0.0122335   2.989  0.00281 **
hum:wind:rain -0.0438790  0.0159479  -2.751  0.00594 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8723 on 20337 degrees of freedom
Multiple R-squared: 0.2394,     Adjusted R-squared: 0.239
F-statistic: 640.2 on 10 and 20337 DF,  p-value: < 2.2e-16

So having, for instance for humidity (hum), B = 28.35 +/- 1.468, while Beta = -0.4581509 +/- 0.0073456 is concerning. Is this normal, or is there an error in my code that has caused this contradiction?

Many thanks,

James.

----------------------
JC Matthews
School of Chemistry
Bristol University
Ista Zahn
2011-Aug-22 16:02 UTC
[R] Multiple regression in R - unstandardised coefficients are a different sign to standardised coefficients, is this correct?
Hi JC,

You have interactions in your model, which means that your model specifies that the coefficients for hum, wind, and rain should vary depending on the values of the other two (and depending on their own value, actually, since you also have quadratic effects for each of these variables in your model). Since these coefficients vary according to the model, it is impossible to specify their value unconditionally. The values you are seeing are therefore conditional estimates that hold at particular values of the variables with which each predictor interacts: each lower-order coefficient is the slope at the point where those variables equal zero. Since you've changed the distribution of those variables by standardizing them (zero now corresponds to their means), you get different conditional estimates.

All this will be covered in most regression textbooks.

Best,
Ista

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
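The point about conditional estimates can be sketched with a small simulation. This is only an illustration under stated assumptions: the data are simulated (they are not the poster's Z1 database), the names `y`, `hum.c` and `wind.c` are made up for the example, and only one two-way interaction is included to keep it short.

```r
# Simulated data (hypothetical values, not the poster's Z1 database)
set.seed(1)
n    <- 500
hum  <- runif(n, 40, 100)   # values far from zero, like a humidity percentage
wind <- runif(n, 0, 15)
y    <- 2 * hum + 30 * wind - 0.5 * hum * wind + rnorm(n, sd = 5)

raw <- lm(y ~ hum * wind)

# Centre the predictors: the fitted surface is unchanged, only the origin moves
hum.c  <- hum - mean(hum)
wind.c <- wind - mean(wind)
cen <- lm(y ~ hum.c * wind.c)

coef(raw)["hum"]     # slope of hum where wind == 0
coef(cen)["hum.c"]   # slope of hum where wind == mean(wind)

# The centred coefficient equals the raw conditional slope at mean(wind)
cond.slope <- coef(raw)["hum"] + coef(raw)["hum:wind"] * mean(wind)
all.equal(unname(cond.slope), unname(coef(cen)["hum.c"]))
```

In this simulation the two "main effect" coefficients for humidity have opposite signs, yet both models give identical fitted values; they simply report the humidity slope at different wind speeds.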
(Ted Harding)
2011-Aug-22 16:30 UTC
[R] Multiple regression in R - unstandardised coefficients a
On 22-Aug-11 15:37:40, JC Matthews wrote:
> [...]
> So having, for instance for humidity (hum), B = 28.35 +/- 1.468, while
> Beta = -0.4581509 +/- 0.0073456 is concerning. Is this normal, or is
> there an error in my code that has caused this contradiction?
>
> Many thanks,
>
> James.

Hi,
without having your data, and so being unable to check, I would not be
surprised if the changes of sign were the outcome of your model
formula, in particular the 3-variable (2nd-order) interaction,
i.e. you are using a model which is non-linear in the variables
themselves. Let's just take that part of the model:

  lm(formula = mean1 ~ hum * wind * rain)

This, in its quantitative expression, expands to:

  mean1 = C0 + C11*hum + C12*wind + C13*rain
             + C21*hum*wind + C22*hum*rain + C23*wind*rain
             + C31*hum*wind*rain

Suppose that is for the unstandardised variables. Now express
it in terms of standardised variables (initial capital letters),
using hum = sd(hum)*(Hum + mean(hum)/sd(hum)) and similarly for
wind and rain:

  mean1 = C0 + C11*sd(hum)*(Hum + mean(hum)/sd(hum))
             + C12*sd(wind)*(Wind + mean(wind)/sd(wind))
             + C13*sd(rain)*(Rain + mean(rain)/sd(rain))

             + C21*sd(hum)*sd(wind)*
                   (Hum + mean(hum)/sd(hum))*(Wind + mean(wind)/sd(wind))

             + C22*sd(hum)*sd(rain)*
                   (Hum + mean(hum)/sd(hum))*(Rain + mean(rain)/sd(rain))

             + C23*sd(wind)*sd(rain)*
                   (Wind + mean(wind)/sd(wind))*
                   (Rain + mean(rain)/sd(rain))

             + C31*sd(hum)*sd(wind)*sd(rain)*
                   (Hum + mean(hum)/sd(hum))*
                   (Wind + mean(wind)/sd(wind))*
                   (Rain + mean(rain)/sd(rain))

Now pick out, say, the coefficient of 'Hum' in this latter expression
(i.e. all the terms which involve 'Hum' but neither 'Wind' nor 'Rain'):

    C11*sd(hum)
  + C21*sd(hum)*sd(wind)*mean(wind)/sd(wind)
  + C22*sd(hum)*sd(rain)*mean(rain)/sd(rain)
  + C31*sd(hum)*sd(wind)*sd(rain)*
        (mean(wind)/sd(wind))*(mean(rain)/sd(rain))

  = C11*sd(hum)
  + C21*sd(hum)*mean(wind)
  + C22*sd(hum)*mean(rain)
  + C31*sd(hum)*mean(wind)*mean(rain)

So there is no reason to expect this to have even the same sign
as the original C11, the coefficient of 'hum', let alone any more
specific relationship with it!

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-11                                       Time: 17:30:29
------------------------------ XFMail ------------------------------
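The expression for the coefficient of 'Hum' can be checked numerically. The sketch below is an illustration only: the data are simulated (not the poster's), the response `y` and the generating coefficients are invented, and, as in the derivation above, only the predictors are standardised. If the response were standardised as well, every coefficient would additionally be divided by sd(y), which is positive and so does not affect the sign.

```r
# Simulated data (hypothetical, chosen only to exercise the algebra)
set.seed(42)
n    <- 1000
hum  <- runif(n, 40, 100)
wind <- runif(n, 0, 15)
rain <- rexp(n, rate = 2)
y <- 5 + 3 * hum + 20 * wind - 10 * rain -
     0.4 * hum * wind + 0.2 * hum * rain + rnorm(n, sd = 10)

# Fit on the unstandardised predictors; C holds C0, C11, ..., C31
fit.raw <- lm(y ~ hum * wind * rain)
C <- coef(fit.raw)

# Standardise the predictors only (capitalised names, as in the derivation)
Hum  <- as.numeric(scale(hum))
Wind <- as.numeric(scale(wind))
Rain <- as.numeric(scale(rain))
fit.std <- lm(y ~ Hum * Wind * Rain)

# Coefficient of Hum predicted from the unstandardised fit
predicted <- sd(hum) * (C["hum"] +
                        C["hum:wind"] * mean(wind) +
                        C["hum:rain"] * mean(rain) +
                        C["hum:wind:rain"] * mean(wind) * mean(rain))

all.equal(unname(predicted), unname(coef(fit.std)["Hum"]))
```

The agreement is algebraic rather than statistical: the two fits span the same column space, so the standardised coefficient is an exact re-expression of the unstandardised ones.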
Ista Zahn
2011-Aug-23 13:54 UTC
[R] Multiple regression in R - unstandardised coefficients a
On Tue, Aug 23, 2011 at 7:54 AM, JC Matthews <J.C.Matthews at bristol.ac.uk> wrote:
> Thank you for your replies, you've answered my question and given me more to
> think on. I guess it is unwise to draw any conclusions from the
> standardised results for these reasons.

No, by all means try to draw conclusions! Isn't that the point of the analysis in the first place? All I am (we are?) saying is that you need to do your homework and learn how to draw _appropriate_ conclusions from the analysis.

Best,
Ista

> James.
>
> ----------------------
> JC Matthews
> Atmospheric Chemistry Research Group
> School of Chemistry
> Bristol University
> J.C.Matthews at bristol.ac.uk

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
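One possible way to read such a model appropriately is to report the slope of a predictor at stated values of the others, rather than a single coefficient. The sketch below does this for humidity, using the unstandardised estimates printed in the first summary() output of the thread. The helper `hum.slope` is a made-up name, and the conditions hum = 80, wind = 5, rain = 0 are purely hypothetical: the thread gives neither the units nor the typical ranges of these variables.

```r
# Unstandardised estimates copied from the first summary() output in the thread
b <- c(hum = 28.35, hum2 = -0.1953, hum.wind = -1.782,
       hum.rain = 27.98, hum.wind.rain = -6.606)

# Partial derivative of the fitted mean1 with respect to hum,
# evaluated at the given values of hum, wind and rain
hum.slope <- function(hum, wind, rain) {
  unname(b["hum"] + 2 * b["hum2"] * hum + b["hum.wind"] * wind +
         b["hum.rain"] * rain + b["hum.wind.rain"] * wind * rain)
}

hum.slope(hum = 0,  wind = 0, rain = 0)   # the reported 'hum' coefficient
hum.slope(hum = 80, wind = 5, rain = 0)   # hypothetical conditions (assumed values)
```

At the origin the slope is the reported B of 28.35, while at the assumed conditions the same fitted model gives a negative slope, so the sign of B alone says little about how humidity behaves over the observed range.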