I think (2) might be a bad idea if one of the "sparse" categories has
high predictive power. You'll lose it when you pool, will you not?
Also, there is the problem of subjectively defining "sparse."
However, (1) seems quite sensible to me. But IANAE.
-- Bert
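A minimal sketch of option (2) in R, for illustration only. The helper `pool_rare()` and its `min_count` threshold are hypothetical, and that threshold is exactly the subjective choice of "sparse" raised above. Levels are counted in the training data only, so test levels never seen in training are also mapped to "other".

```r
# Sketch of option (2): pool rare factor levels into a single "other" level.
# pool_rare() and min_count are hypothetical; min_count is an arbitrary threshold.
pool_rare <- function(x, train_x, min_count = 5, other = "other") {
  counts <- table(as.character(train_x))        # level frequencies in training data
  keep <- names(counts)[counts >= min_count]    # levels frequent enough to keep
  x <- as.character(x)
  x[!(x %in% keep)] <- other                    # rare AND unseen levels -> "other"
  factor(x, levels = c(keep, other))
}

# Made-up data: "c" and "d" are rare in training, "e" never occurs there
train <- data.frame(f = c(rep("a", 10), rep("b", 8), "c", "d"))
test  <- data.frame(f = c("a", "b", "c", "e"))

train$f_pooled <- pool_rare(train$f, train$f)
test$f_pooled  <- pool_rare(test$f, train$f)
table(train$f_pooled)
```

Two caveats: any signal carried by the pooled levels is averaged away, as noted above, and if no training level falls below the threshold, "other" has no training rows, so glm() drops it and test rows mapped to it will still trigger the "new levels" error.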
On Sun, Nov 20, 2022 at 9:49 AM Mitchell Maltenfort <mmalten at gmail.com> wrote:
> Two possible fixes occur to me:
>
> 1) Redo the test/training split, but within levels of the factor, so you have
> the same split within each level and every level is represented in both
> training and testing.
>
> 2) If you have a lot of levels, and perhaps sparse representation in a few,
> consider recoding levels to pool the rare ones into an "other" category.
>
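One way option (1) could look in base R, sketched under assumptions: `stratified_split()` is a hypothetical helper, `prop` is the training fraction, and a level with a single row is kept in training only, since it cannot appear in both sets.

```r
# Sketch of option (1): split rows into train/test separately within each
# level of a factor. stratified_split(), factor_col and prop are hypothetical.
stratified_split <- function(data, factor_col, prop = 0.7) {
  # Row indices grouped by factor level
  idx_by_level <- split(seq_len(nrow(data)), data[[factor_col]], drop = TRUE)
  train_idx <- unlist(lapply(idx_by_level, function(idx) {
    # At least 1 training row; at least 1 test row whenever the level has 2+ rows
    n_train <- max(1, min(length(idx) - 1, round(prop * length(idx))))
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  list(train = data[train_idx, , drop = FALSE],
       test  = data[-train_idx, , drop = FALSE])
}

set.seed(42)
dat <- data.frame(f = rep(c("r", "g", "b"), times = c(10, 6, 3)),
                  y = runif(19))
parts <- stratified_split(dat, "f", prop = 0.7)
table(parts$train$f)
table(parts$test$f)
```

This only guarantees that two sets drawn from the same data share their levels; genuinely new data can still contain levels the model never saw.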
> On Sun, Nov 20, 2022 at 11:41 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>> Small reprex:
>>
>> set.seed(5)
>> dat <- data.frame(f = rep(c('r','g'), 4), y = runif(8))
>> newdat <- data.frame(f = rep(c('r','g','b'), 2))
>> ## convert values in newdat not seen in dat to NA
>> is.na(newdat$f) <- !(newdat$f %in% dat$f)
>> lmfit <- lm(y ~ f, data = dat)
>>
>> ## Result:
>> > predict(lmfit, newdat)
>>         1         2         3         4         5         6
>> 0.4374251 0.6196527        NA 0.4374251 0.6196527        NA
>>
>> If this does not suffice, as Rui said, we need details of what you did.
>> (predict.glm works like predict.lm)
>>
>>
>> -- Bert
>>
>>
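A glm() counterpart of the reprex above, sketched with made-up data: the test column is stored as a factor and predictions are requested on the probability scale (`type = "response"`), as in the original question. The droplevels() call is a precaution rather than something the reprex shows to be required.

```r
# Made-up data: logistic model with one factor predictor
train <- data.frame(f = factor(c("r", "g", "r", "g", "r", "g")),
                    y = c(1, 0, 0, 1, 1, 0))
test <- data.frame(f = factor(c("r", "g", "b")))   # "b" never seen in training

fit <- glm(y ~ f, data = train, family = binomial)

# Set values whose level was not seen in training to NA ...
is.na(test$f) <- !(test$f %in% levels(train$f))
# ... and drop the now-empty level "b" from the factor's level set
test$f <- droplevels(test$f)

p <- predict(fit, newdata = test, type = "response")
p   # fitted probabilities for "r" and "g", NA for the unseen level
```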
>> On Sun, Nov 20, 2022 at 7:46 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> >
>> > At 15:29 on 20/11/2022, Gábor Malomsoki wrote:
>> > > Dear Bert,
>> > >
>> > > Yes, I was trying to fill the nonexistent categories with NAs, but the
>> > > solutions suggested on stackoverflow.com unfortunately did not work.
>> > >
>> > > Best regards
>> > > Gabor
>> > >
>> > >
>> > > Bert Gunter <bgunter.4567 at gmail.com> wrote on Sun, 20 Nov 2022, 16:20:
>> > >
>> > >> You can't predict results for categories that you've not seen before
>> > >> (think about it). You will need to remove those cases from your test set
>> > >> (or convert them to NA and predict them as NA).
>> > >>
>> > >> -- Bert
>> > >>
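A sketch of the first alternative, removing the affected cases, with made-up data. It relies on the `xlevels` component that lm() and glm() store in the fitted object, which records the factor levels used during fitting.

```r
# Made-up data: level "b" appears only in the test set
train <- data.frame(f = c("r", "g", "r", "g"), y = c(0.2, 0.8, 0.3, 0.7))
test  <- data.frame(f = c("r", "g", "b", "b"))
fit <- lm(y ~ f, data = train)

seen <- test$f %in% fit$xlevels$f        # TRUE where the level is known to the model
test_seen <- test[seen, , drop = FALSE]  # keep only those rows
pred <- predict(fit, newdata = test_seen)
sum(!seen)                               # number of test rows left without a prediction
```

Accuracy measured on `test_seen` describes only rows with known levels, so it is worth reporting how many rows were dropped.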
>> > >> On Sun, Nov 20, 2022 at 7:02 AM Gábor Malomsoki <gmalomsoki1980 at gmail.com> wrote:
>> > >>
>> > >>> Dear all,
>> > >>>
>> > >>> I have created a logistic regression model on the train df:
>> > >>> mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")
>> > >>>
>> > >>> Then I try to predict with the test df:
>> > >>> Predict <- predict(mymodel1, newdata = test, type = "response")
>> > >>> and I get this error message:
>> > >>> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
>> > >>>   Factor "TG_KraftF5" has new levels
>> > >>>
>> > >>> I have tried different proposals from Stack Overflow, but unfortunately
>> > >>> they did not solve the problem.
>> > >>> Do you have any idea how to test a logistic regression model when the
>> > >>> train and test data frames have different factor levels?
>> > >>>
>> > >>> Thank you in advance.
>> > >>> Regards,
>> > >>> Gabor
>> > >>>
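Before choosing a fix, it can help to see which levels are actually new. A sketch with hypothetical toy data that reuses the variable names from the question (the real `train` and `test` frames are not shown in the thread): it compares the levels stored in `mymodel1$xlevels` with those in the test data, and uses tryCatch() to capture the error message instead of stopping.

```r
# Hypothetical toy data reusing the poster's variable names
train <- data.frame(TG_KraftF5 = c("A", "B", "A", "B", "A", "B"),
                    book_state = c(1, 0, 0, 1, 1, 0))
test <- data.frame(TG_KraftF5 = c("A", "B", "C"))
mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")

# Levels recorded in the fitted model vs. levels present in the test data
known <- mymodel1$xlevels$TG_KraftF5
new_levels <- setdiff(unique(as.character(test$TG_KraftF5)), known)
new_levels                                           # "C"
sum(as.character(test$TG_KraftF5) %in% new_levels)  # affected test rows

# predict() on the unmodified test data fails with the "new levels" error;
# tryCatch() returns the error message instead of stopping
msg <- tryCatch(predict(mymodel1, newdata = test, type = "response"),
                error = conditionMessage)
msg
```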
>> > >>> ______________________________________________
>> > >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > >>> and provide commented, minimal, self-contained, reproducible code.
>> > >>>
>> > >>
>> > >
>> >
>> > Hello,
>> >
>> > What exactly didn't work? You say you have tried the solutions found on
>> > Stack Overflow, but without a link we don't know which answers to which
>> > questions you are talking about.
>> > As Bert said, if you assign NA to the new levels, present only in the
>> > test set, it should work.
>> >
>> > Can you post links to what you have tried?
>> >
>> > Hope this helps,
>> >
>> > Rui Barradas
>>
>