I did this:
library(e1071)  # provides naiveBayes()
nb <- naiveBayes(users, platform)
pl <- predict(nb, users)
nrow(users)  # ==> 314781
ncol(users)  # ==> 109
1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
(tens of minutes). Why?
2. The predict() results were completely off the mark (quite the opposite
of the expected overfitting). Suffice it to show the tables:
pl:
   android blackberry       ipad     iphone         lg      linux        mac
         3          5         11         14     312723          5         11
    mobile      nokia    samsung    symbian    unknown    windows
      1864         17         16        112          0          0
platform:
   android blackberry       ipad     iphone         lg      linux        mac
     18013       1221       2647       1328          4       2936      34336
    mobile      nokia    samsung    symbian    unknown    windows
        18         88         39        103       2660     251388
i.e., nb classified nearly everything as "lg" while in the actual data
"lg" is virtually nonexistent.
3. When I print "nb", I see "A-priori probabilities" (which are what I
expected) and "Conditional probabilities", which are confusing because
there are only two of them, e.g.:
  android     0.048464998  0.43946764
  blackberry  0.001638002  0.04045564
  ipad        0.322251606  1.84940588
  iphone      0.030873494  0.23250250
  lg          0.000000000  0.00000000
  linux       0.023501362  0.34698919
  mac         0.082653774  1.22535027
  mobile      0.000000000  0.00000000
  nokia       0.000000000  0.00000000
  samsung     0.000000000  0.00000000
  symbian     0.000000000  0.00000000
  unknown     0.003759398  0.08219078
  windows     0.021158528  0.32916970
The predictors are integers.
Is the first column for the 0 predictors and the second for all non-0?
Is there a way to ask naiveBayes to differentiate between non-0 values?
Thanks!
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://ffii.org http://www.PetitionOnline.com/tap12009/
http://mideasttruth.com http://iris.org.il http://openvotingconsortium.org
The program isn't debugged until the last user is dead.
When I tried to run svm on the same data frame, memory usage as reported
by top(1) doubled to 4GB almost right away, and the function never
returned (it has been running for ~15 hours now). ^C does not stop it.
This is most unusual; libsvm has always seemed very fast.

This is R version 2.13.1 (2011-07-08) (as distributed with Ubuntu).

> * Sam Steingold <fqf at tah.bet> [2012-02-09 21:43:30 -0500]:
>
> I did this:
> nb <- naiveBayes(users, platform)
> pl <- predict(nb,users)
> [...]

--
Sam Steingold
We don't have the data, but my guess is that you want to have some
factors in your data that were integers when you tried the code below.

Uwe Ligges

On 10.02.2012 03:43, Sam Steingold wrote:
> I did this:
> nb <- naiveBayes(users, platform)
> pl <- predict(nb,users)
> [...]
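
The suggestion to use factors could be sketched roughly as follows. This is
only an illustration under assumptions: `users_demo` and `platform_demo` are
made-up stand-ins for the real data (which was not posted), and converting
every integer column is a hypothetical choice, not necessarily the right one
for all 109 predictors. As far as the e1071 documentation describes it,
naiveBayes() treats numeric predictors as Gaussian and reports a mean and a
standard deviation per class (which would explain seeing exactly two
columns), while factor predictors get one conditional probability per level.

```r
# Hypothetical stand-in for the real `users` / `platform` data (not posted).
set.seed(1)
users_demo <- data.frame(
  clicks = sample(0:3, 200, replace = TRUE),  # integer-valued predictor
  visits = sample(0:2, 200, replace = TRUE)   # integer-valued predictor
)
platform_demo <- factor(
  sample(c("windows", "mac", "android"), 200, replace = TRUE)
)

# Convert every integer column to a factor, so each distinct value becomes
# its own level instead of being modelled as a number.
int_cols <- vapply(users_demo, is.integer, logical(1))
users_fac <- users_demo
users_fac[int_cols] <- lapply(users_fac[int_cols], factor)

str(users_fac)

# Fit and predict only if e1071 is installed.
if (requireNamespace("e1071", quietly = TRUE)) {
  nb_demo <- e1071::naiveBayes(users_fac, platform_demo)
  print(nb_demo$tables$clicks)  # one column per level of `clicks`
  pl_demo <- predict(nb_demo, users_fac)
  print(table(predicted = pl_demo, actual = platform_demo))
}
```

Whether a given column is better treated as categorical or numeric depends
on the data; an integer column with many distinct values may give very
sparse tables when converted to a factor.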