Hi All,

I'm a newbie and have two questions. Please pardon me if they are very basic.

1. I'm using a regression tree to predict the selling prices of 10 new records (homes). The following code is resulting in an error message:

pred <- predict(model, newdata = outOfSample[, -6])

The error message is:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, :
  factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211, 2265, 2530, 2672, 3365

Does anybody know what is causing this? I've pasted a snippet of my original dataset (Crankshaw) and my out-of-sample dataset below. Below it appears all code which I entered leading up to that point. The error message appears at the end of that code.

2. How can I get the regression tree to display in a more "friendly" way? Unfortunately I cannot paste a picture of it in this email, but it displays the values of individual records at each node instead of the decision rule logic (e.g., Age >= 28). I'm using the command

> fancyRpartPlot(model)

to display the tree.

Thank you!
Gary

------------------------------------------------------------------------

Original Data (Crankshaw):

Sq. Feet  Age  Bedrm  Bathrm  Garage  Sell Price ($)
1620      17   3      2       2       185500
1864      28   3      2       2       195250
1628      15   3      2       2       190750
1670      1    4      3       2       195750
1762      23   3      4       2       197250
1520      1    3      3       2       192900

Out-of-Sample Data:

NEW RECORDS:
Sq. Feet  Age  Bedrm  Bathrm  Garage  Sell Price ($)
3365      8    4      4       3
1547      28   3      2       2
1375      36   2      1       1
1621      53   3      1       2
2530      23   4      3       2
1868      42   3      2       2
2211      23   3      2       2
1421      39   2      1       1
2672      3    4      2       3
2265      7    3      2       2

All Code Entered:

> Crankshaw <- read_excel("C:/Data/Excel/Crankshaw.xlsx")
> View(Crankshaw)
> outOfSample <- Crankshaw[305:nrow(Crankshaw), ]
> Crankshaw <- Crankshaw[1:300, ]
> install.packages("caret")
Installing package into ‘C:/Users/Jason/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/caret_6.0-78.zip'
Content type 'application/zip' length 5155836 bytes (4.9 MB)
downloaded 4.9 MB

package ‘caret’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\Jason\AppData\Local\Temp\RtmpmAxrJR\downloaded_packages
> install.packages("rattle")
Installing package into ‘C:/Users/Jason/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/rattle_5.1.0.zip'
Content type 'application/zip' length 1287407 bytes (1.2 MB)
downloaded 1.2 MB

package ‘rattle’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\Jason\AppData\Local\Temp\RtmpmAxrJR\downloaded_packages
> library(rpart)
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
Warning messages:
1: package ‘caret’ was built under R version 3.4.3
2: package ‘ggplot2’ was built under R version 3.4.3
> library(rattle)
> n <- nrow(Crankshaw)
> train <- sample(1:n, size = 0.5 * n, replace = FALSE)
> CrankshawTrain <- Crankshaw[train, ]
> temp <- (1:n)[-train]
> val <- sample(temp, size = (0.3 / 0.5) * length(temp), replace = FALSE)
> CrankshawVal <- Crankshaw[val, ]
> test <- (1:n)[-c(train, val)]
> CrankshawTest <- Crankshaw[test, ]
> model <- rpart(`Selling Price ($)` ~ ., method = "anova", data = CrankshawTrain)
> fancyRpartPlot(model)
> pred <- predict(model, newdata = outOfSample[, -6])
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, :
  factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211, 2265, 2530, 2672, 3365

On Sat, Feb 24, 2018 at 01:16:27PM -0600, Gary Black wrote:
> Hi All,
>
> I'm a newbie and have two questions. Please pardon me if they are very basic.
>
> 1. I'm using a regression tree to predict the selling prices of 10 new
> records (homes). The following code is resulting in an error message:
> pred <- predict(model, newdata = outOfSample[, -6])
>
> The error message is:
>
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, :
> factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211, 2265, 2530, 2672, 3365

Seems to me that the variable 'Sq. Feet' is being encoded as a factor instead of having numerical values. When you train, the model sees a series of values that it understands as categorical, and when you try to predict, it encounters categories it has not seen before and doesn't know what to do with them.

As that variable is most probably numeric, it should be read as such. You can try converting it in both your train and test datasets.

Cheers,

JMM.

--
José María Mateos
https://rinzewind.org/blog-es || https://rinzewind.org/blog-en
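
To make the diagnosis above concrete, here is a minimal sketch that reproduces the same kind of error and then avoids it by converting the column in both datasets. It is not the original poster's data or model: the data frames, the column name sq_feet, and the helper to_number() are made up for illustration, and lm() from base R stands in for rpart() on the assumption that both build their prediction data through model.frame(), which is where the error is raised.

```r
# Toy data (hypothetical values): sq_feet ends up as a factor, as can
# happen when a spreadsheet column mixes numbers and text.
train <- data.frame(
  sq_feet = c("1620", "1864", "1628", "1670", "1762", "1520"),
  price   = c(185500, 195250, 190750, 195750, 197250, 192900),
  stringsAsFactors = TRUE
)
new_homes <- data.frame(sq_feet = c("3365", "1547"), stringsAsFactors = TRUE)

# lm() is a stand-in for rpart(); each distinct sq_feet value is
# treated as its own category.
fit_bad <- lm(price ~ sq_feet, data = train)

# Predicting on unseen categories fails; capture the message instead
# of stopping the script.
err <- tryCatch(
  predict(fit_bad, newdata = new_homes),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical helper: go through character so that the printed values,
# not the internal factor codes, are converted.
to_number <- function(x) as.numeric(as.character(x))

# Convert the column in BOTH the training data and the new data.
train$sq_feet     <- to_number(train$sq_feet)
new_homes$sq_feet <- to_number(new_homes$sq_feet)

fit_ok <- lm(price ~ sq_feet, data = train)
pred   <- predict(fit_ok, newdata = new_homes)
print(pred)
```

Running str() on the data frame before fitting shows whether the column is still a factor or character vector, and the model needs to be refitted after the conversion, not just the new data changed.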

But note that converting it e.g. via as.numeric() would be disastrous:

> as.numeric(factor(c(3,5,7)))
[1] 1 2 3

The OP may need to do some homework with R tutorials to learn about basic R data structures; or if he has already done this, he may need to be more explicit about how the data were created/entered.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sat, Feb 24, 2018 at 11:21 AM, José María Mateos <chema at rinzewind.org> wrote:
> Seems to me that variable 'Sq. Feet' is being encoded as a factor
> instead of having numerical values. When you train, the model sees a
> series of values that understands as categorical, and when you try to
> predict it is encountering some different categories and it doesn't know
> what to do with them.
>
> As that variable is most probably numeric, it should be read as such.
> You can try converting it on both your train and test datasets.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
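
The pitfall shown above comes from how factors are stored: each value is an integer code pointing into a table of levels, and as.numeric() on a factor returns those codes. A short sketch of the two usual ways to recover the printed values follows; the variable names are illustrative only.

```r
f <- factor(c(3, 5, 7))

# as.numeric() on a factor returns the internal level codes.
codes <- as.numeric(f)

# Converting through character recovers the values that are printed.
values <- as.numeric(as.character(f))

# Equivalent: convert the levels once, then index them by the codes.
values_via_levels <- as.numeric(levels(f))[as.integer(f)]

print(codes)
print(values)
print(values_via_levels)
```

The second form converts each distinct level only once, which can matter for long vectors with few distinct values.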

As Bert implies, you may be getting ahead of yourself. An 8 may be a number, or it may be the character 8, or it could be a factor, and you don't seem to know the difference yet (hence the suggestion of tutorials).

If you go to the trouble of making a reproducible example [1][2][3], then you may find the problem yourself, or we will be able to check things using the example that you would not think to try.

The str function can be helpful for finding problems like the one above. One surprisingly valuable step mentioned in the reprex references below is giving us the data for your example using the dput function.

Another surprisingly useful technique is sending your question in plain text email format, as the Posting Guide indicates (the details of how to do that depend on your email client, which is off topic here).

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
[2] http://adv-r.had.co.nz/Reproducibility.html
[3] https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)

--
Sent from my phone. Please excuse my brevity.

On February 24, 2018 11:16:27 AM PST, Gary Black <gwblack001 at sbcglobal.net> wrote:
>The error message is:
>
>Error in model.frame.default(Terms, newdata, na.action = na.action,
>xlev = attr(object, :
>factor Sq. Feet has new levels 1375, 1421, 1547, 1621, 1868, 2211,
>2265, 2530, 2672, 3365
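
As a rough illustration of the str and dput suggestions above, the sketch below uses a small made-up data frame (not the poster's spreadsheet) in which one column is text and one is numeric. str() reports the type of each column, and dput() prints R code that rebuilds the object exactly, which is what makes it handy for posting example data in plain text.

```r
# Hypothetical three-row data frame; 'Sq. Feet' is deliberately text.
homes <- data.frame(
  `Sq. Feet` = c("1620", "1864", "1628"),
  Age        = c(17, 28, 15),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# str() shows the type of every column at a glance.
str(homes)

# The same information, collected programmatically.
col_classes <- vapply(homes, function(col) class(col)[1], character(1))
print(col_classes)

# dput() prints R code that recreates the object; paste that output
# into an email so others can rebuild the data.
dput(homes)

# Round trip: evaluating the dput() output gives back the same object.
rebuilt <- eval(parse(text = paste(capture.output(dput(homes)), collapse = "\n")))
print(identical(rebuilt, homes))
```

For a large dataset, dput(head(homes, 20)) keeps the posted example small while preserving the column types.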