You say you asked elsewhere, but so many hits come up when I simply search
for "unbalanced sample size" that your justification for not following the
posting guide does not seem honest.
I also recall that various discussions of statistical power address this in
basic statistics.
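
For what it is worth, under the usual equal-variance linear model the
standard error of the two-group coefficient is sigma * sqrt(1/n0 + 1/n1),
so it depends on the group sizes actually observed and not on why they are
unequal. Below is a minimal R sketch comparing a 50/50 split with a 95/5
split. The function names (se_analytic, se_simulated), the sample size, and
the effect size are illustrative assumptions, not anything taken from the
original question.

```r
# Analytic SE of the group coefficient in y ~ g, assuming n observations,
# a fraction p in the minority group, and residual SD sigma.
se_analytic <- function(n, p, sigma = 1) {
  n1 <- n * p
  n0 <- n - n1
  sigma * sqrt(1 / n0 + 1 / n1)
}

# Average SE reported by lm() over repeated samples with a fixed split.
# beta (the true group effect) is an arbitrary illustrative value.
se_simulated <- function(n, p, sigma = 1, beta = 0.5, reps = 500) {
  n1 <- round(n * p)
  g <- rep(c(0, 1), times = c(n - n1, n1))
  ses <- replicate(reps, {
    y <- 1 + beta * g + rnorm(n, sd = sigma)
    summary(lm(y ~ g))$coefficients["g", "Std. Error"]
  })
  mean(ses)
}

set.seed(1)
n <- 1000
res <- data.frame(
  p         = c(0.50, 0.05),
  analytic  = c(se_analytic(n, 0.50), se_analytic(n, 0.05)),
  simulated = c(se_simulated(n, 0.50), se_simulated(n, 0.05))
)
print(res)
```

With n = 1000 and sigma = 1 the formula gives about 0.063 for the balanced
split and about 0.145 for the 95/5 split, and the simulated averages should
land close to those values. A sample whose selection depends on the
response is a separate problem that this sketch does not cover.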
On August 24, 2024 11:05:12 AM PDT, Christofer Bogaso <bogaso.christofer at
gmail.com> wrote:
>Hi,
>
>I have asked this question elsewhere but failed to get any
>response, so I am hoping to get some insight from experts and
>statisticians here.
>
>Let's say we are fitting a regression equation where one explanatory
>variable is categorical with 2 categories. However, in the sample, one
>category has 95% of the values and the other category has just 5%. That
>is, the categories are highly unbalanced.
>
>Typically, the SE of the estimate may be inflated for such a highly
>unbalanced categorical explanatory variable.
>
>Such an unbalanced case may arise from 2 scenarios: 1) there is a flaw in
>the sample, or it is just by chance that the second category has just 5%
>of the values in the sample, or 2) in the population itself, the second
>category has a very small number of occurrences, which is reflected in
>the sample.
>
>My question is how the SE would be impacted in the above 2 cases. Will
>the impact be the same, i.e. would we get an incorrect estimate of the SE
>in both cases? If yes, is there any way to prove this analytically or
>perhaps based on simulation?
>
>My apologies, as this question is not directly R related. However, I
>just wanted to get some insight on the above problem related to
>statistics from some of the great statisticians in this forum.
>
>Thanks for your time.
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
--
Sent from my phone. Please excuse my brevity.