Barry King
2015-May-09 14:38 UTC
[R] ERROR: length of 'center' must equal the number of columns of 'x'
I am attempting to predict tomorrow's rainfall, RISK_MM, with LASSO using a data set that I have partitioned into a train data set and a test data set. The structures of the two data sets are shown below and appear to be identical except for the number of observations:

str(train)
'data.frame': 262 obs. of 24 variables:
 $ Date         : Factor w/ 366 levels "1/1/2008","1/10/2008",..: 146 312 160 345 58 69 202 52 236 176 ...
 $ Location     : Factor w/ 1 level "Canberra": 1 1 1 1 1 1 1 1 1 1 ...
 $ MinTemp      : num 17.1 4.6 11.3 0.7 10.3 10.1 3.8 7.1 0.5 4.2 ...
 $ MaxTemp      : num 29.6 14.7 32.3 14.1 21.3 31.2 21.7 28.4 17.1 18.9 ...
 $ Rainfall     : num 0 0 0 0 3 0 0.2 0 0 0 ...
 $ Evaporation  : num 5.8 4.4 9.4 5.6 4.2 8.8 2.8 11.6 4 6.4 ...
 $ Sunshine     : num 9.2 8.4 11.4 9 6.7 13.1 6.5 12.7 9.4 10.8 ...
 $ WindGustDir  : Factor w/ 16 levels "E","ENE","ESE",..: 1 15 5 2 7 8 8 4 8 15 ...
 $ WindGustSpeed: int 48 52 28 20 43 41 44 48 31 50 ...
 $ WindDir9am   : Factor w/ 16 levels "E","ENE","ESE",..: 10 15 2 12 2 9 3 7 3 16 ...
 $ WindDir3pm   : Factor w/ 16 levels "E","ENE","ESE",..: 3 8 15 7 4 14 15 7 14 15 ...
 $ WindSpeed9am : int 9 28 4 6 7 6 2 2 6 6 ...
 $ WindSpeed3pm : int 17 33 6 7 19 20 20 19 13 31 ...
 $ Humidity9am  : int 67 54 44 69 79 45 99 45 74 60 ...
 $ Humidity3pm  : int 38 51 17 43 46 16 34 22 42 34 ...
 $ Pressure9am  : num 1017 1015 1024 1027 1018 ...
 $ Pressure3pm  : num 1013 1012 1021 1022 1014 ...
 $ Cloud9am     : int 6 1 5 7 8 0 7 0 1 3 ...
 $ Cloud3pm     : int 7 3 2 1 1 1 7 1 1 2 ...
 $ Temp9am      : num 21.7 9.2 18.2 7.4 11.7 18.7 7.9 17.2 7.4 11.2 ...
 $ Temp3pm      : num 29.1 12 30.5 13.7 19.8 30.4 20.2 28.2 16.2 18.1 ...
 $ RainToday    : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 1 1 1 1 ...
 $ RISK_MM      : num 1.8 0 0 0 0 0 0 0 0 0 ...
 $ RainTomorrow : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "na.action")=Class 'omit' Named int [1:38] 114 119 128 139 141 175 177 181 190 194 ...
  .. ..- attr(*, "names")= chr [1:38] "114" "119" "128" "139" ...

str(test)
'data.frame': 66 obs. of 24 variables:
 $ Date         : Factor w/ 366 levels "1/1/2008","1/10/2008",..: 85 87 88 90 92 64 65 66 70 71 ...
 $ Location     : Factor w/ 1 level "Canberra": 1 1 1 1 1 1 1 1 1 1 ...
 $ MinTemp      : num 13.7 13.3 7.6 6.1 8.8 8.4 9.1 8.5 12.4 13.8 ...
 $ MaxTemp      : num 23.4 15.5 16.1 18.2 19.5 22.8 25.2 27.3 32.1 31.2 ...
 $ Rainfall     : num 3.6 39.8 2.8 0.2 0 16.2 0 0.2 0 0 ...
 $ Evaporation  : num 5.8 7.2 5.6 4.2 4 5.4 4.2 7.2 8.4 7.2 ...
 $ Sunshine     : num 3.3 9.1 10.6 8.4 4.1 7.7 11.9 12.5 11.1 8.4 ...
 $ WindGustDir  : Factor w/ 16 levels "E","ENE","ESE",..: 8 8 11 10 9 1 4 1 1 3 ...
 $ WindGustSpeed: int 85 54 50 43 48 31 30 41 46 44 ...
 $ WindDir9am   : Factor w/ 16 levels "E","ENE","ESE",..: 4 15 11 10 1 9 10 1 10 16 ...
 $ WindDir3pm   : Factor w/ 16 levels "E","ENE","ESE",..: 6 14 3 3 2 3 8 8 16 14 ...
 $ WindSpeed9am : int 6 30 20 19 19 7 6 2 7 6 ...
 $ WindSpeed3pm : int 6 24 28 26 17 6 9 15 9 19 ...
 $ Humidity9am  : int 82 62 68 63 70 82 74 54 70 72 ...
 $ Humidity3pm  : int 69 56 49 47 48 32 34 35 22 23 ...
 $ Pressure9am  : num 1010 1006 1018 1025 1026 ...
 $ Pressure3pm  : num 1007 1007 1018 1022 1023 ...
 $ Cloud9am     : int 8 2 7 4 7 7 1 0 0 7 ...
 $ Cloud3pm     : int 7 7 7 6 7 1 2 3 3 6 ...
 $ Temp9am      : num 15.4 13.5 11.1 12.4 14.1 13.3 14.6 16.8 19.1 20.2 ...
 $ Temp3pm      : num 20.2 14.1 15.4 17.3 18.9 21.7 24 26 30.7 29.8 ...
 $ RainToday    : Factor w/ 2 levels "No","Yes": 2 2 2 1 1 2 1 1 1 1 ...
 $ RISK_MM      : num 39.8 2.8 0 0 16.2 0 0.2 0 0 1.2 ...
 $ RainTomorrow : Factor w/ 2 levels "No","Yes": 2 2 1 1 2 1 1 1 1 2 ...
 - attr(*, "na.action")=Class 'omit' Named int [1:38] 114 119 128 139 141 175 177 181 190 194 ...
  .. ..- attr(*, "names")= chr [1:38] "114" "119" "128" "139" ...

x <- model.matrix(RISK_MM~MinTemp + MaxTemp + Rainfall + Evaporation + Sunshine +
                  WindGustSpeed + WindGustDir + WindDir9am + WindDir3pm +
                  WindSpeed9am + WindSpeed3pm + Humidity9am + Humidity3pm +
                  Pressure9am + Pressure3pm + Cloud9am + Cloud3pm +
                  Temp9am + Temp3pm + RainToday, data=train)
x <- x[,-1]
library(lars)
lasso <- lars(x=x,y=train$RISK_MM,trace=TRUE,type="lasso")
fits <- predict.lars(lasso, test, type="fit")

This last statement generates the error:

Error in scale.default(newx, object$meanx, FALSE) :
  length of 'center' must equal the number of columns of 'x'

I do not know how to interpret this error message or how to resolve the error. Any guidance you can provide is appreciated.

Thank you,

Barry E. King Ph.D.
Butler University
College of Business
Indianapolis, Indiana
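
A possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the message comes from scale.default(newx, object$meanx, FALSE), which suggests predict.lars() centers its second argument (newx) using the column means stored from the training matrix. The model was fitted on the dummy-coded matrix x, but test is passed as the raw 24-column data frame, so the column counts would not match. The example below uses small made-up stand-in data (make_data, temp, wind and y are hypothetical names, not the weather data above) and builds newx with the same model.matrix() formula before predicting. It assumes the lars package is installed.

```r
library(lars)  # assumption: the lars package is installed

set.seed(1)

# Hypothetical stand-in data, NOT the weather data from the post.
# The factor is given explicit levels so train and test share the same coding.
make_data <- function(n) {
  temp <- rnorm(n)
  wind <- factor(sample(c("N", "S", "E"), n, replace = TRUE),
                 levels = c("N", "S", "E"))
  y <- 2 * temp + rnorm(n)
  data.frame(temp = temp, wind = wind, y = y)
}
train <- make_data(60)
test  <- make_data(20)

# Build both design matrices from the same formula and drop the intercept,
# mirroring the x <- x[,-1] step in the original code.
f    <- y ~ temp + wind
x    <- model.matrix(f, data = train)[, -1]
newx <- model.matrix(f, data = test)[, -1]

# The columns must line up before predicting.
stopifnot(identical(colnames(x), colnames(newx)))

lasso <- lars(x = x, y = train$y, type = "lasso")

# Pass the matrix, not the data frame, as newx.
fits <- predict(lasso, newx = newx, type = "fit")
dim(fits$fit)  # one row per test observation, one column per step of the path
```

Applied to the code above, the analogous change would be to build a second matrix from test with the same formula, drop its first column, and pass that matrix to predict.lars() in place of test. This relies on the factor columns in train and test carrying the same levels, which the str() output above appears to show.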