thr3ads.net - R help - [R] naiveBayes: slow predict, weird results [Feb 2012]


Sam Steingold

2012-Feb-10 02:43 UTC

[R] naiveBayes: slow predict, weird results

I did this:
library(e1071)
nb <- naiveBayes(users, platform)
pl <- predict(nb, users)
nrow(users)  # 314781
ncol(users)  # 109

1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
(tens of minutes). Why?

2. The predict() results were completely off the mark (quite the opposite
of the expected overfitting). The tables alone show it:

pl:

   android blackberry       ipad     iphone         lg      linux        mac 
         3          5         11         14     312723          5         11 
    mobile      nokia    samsung    symbian    unknown    windows 
      1864         17         16        112          0          0 

platform:
   android blackberry       ipad     iphone         lg      linux        mac 
     18013       1221       2647       1328          4       2936      34336 
    mobile      nokia    samsung    symbian    unknown    windows 
        18         88         39        103       2660     251388 

That is, nb classified nearly everything as "lg", while in the actual data
"lg" is virtually nonexistent (4 of 314781 rows).

3. When I print "nb", I see "A-priori probabilities" (which are what I
expected) and "Conditional probabilities", which are confusing because
each table has only two columns, e.g.:

             android    0.048464998 0.43946764
             blackberry 0.001638002 0.04045564
             ipad       0.322251606 1.84940588
             iphone     0.030873494 0.23250250
             lg         0.000000000 0.00000000
             linux      0.023501362 0.34698919
             mac        0.082653774 1.22535027
             mobile     0.000000000 0.00000000
             nokia      0.000000000 0.00000000
             samsung    0.000000000 0.00000000
             symbian    0.000000000 0.00000000
             unknown    0.003759398 0.08219078
             windows    0.021158528 0.32916970

The predictors are integers.
Is the first column for the 0 values and the second for all non-0 values?
Is there a way to ask naiveBayes to differentiate between non-0 values?

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://ffii.org http://www.PetitionOnline.com/tap12009/
http://mideasttruth.com http://iris.org.il http://openvotingconsortium.org
The program isn't debugged until the last user is dead.
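
A likely explanation for point 2, offered as an assumption rather than something checked against the e1071 source of that version: for the predictor shown, lg, mobile, nokia, samsung and symbian all have mean 0 and standard deviation 0, and predict.naiveBayes() replaces a zero standard deviation with its small `threshold` argument (0.001 by default in current versions, as far as I know). A Gaussian density with sd = 0.001 evaluated at its mean is about 399, so every such predictor that is 0 in a row adds roughly log(399) ≈ 6 to that class's log-likelihood, and with 109 mostly-zero integer predictors this swamps the priors. The base-R sketch below uses a hypothetical helper, nb_log_posterior(), that scores all rows at once by looping over classes instead of rows. It also shows that Gaussian naive Bayes prediction vectorizes over rows (point 1); whether the original slowness came from per-row scoring is likewise an assumption.

```r
# Hypothetical helper (not part of e1071): log posterior, up to an additive
# constant, of a Gaussian naive Bayes model for all rows and classes at once.
#   X      numeric matrix, one row per observation, one column per predictor
#   means  classes x predictors matrix of per-class means
#   sds    classes x predictors matrix of per-class standard deviations
#   prior  named vector of class priors, in the same order as the rows above
nb_log_posterior <- function(X, means, sds, prior, threshold = 0.001) {
  # Assumed zero-sd handling: replace 0 by a small threshold.
  sds[sds == 0] <- threshold
  out <- matrix(log(prior), nrow(X), length(prior), byrow = TRUE,
                dimnames = list(NULL, names(prior)))
  for (k in seq_along(prior)) {  # loop over classes, vectorized over rows
    z <- sweep(sweep(X, 2, means[k, ]), 2, sds[k, ], "/")  # standardized values
    # Sum over predictors of the Gaussian log density -z^2/2 - log(2*pi)/2 - log(sd)
    out[, k] <- out[, k] + rowSums(-0.5 * z^2) -
      ncol(X) * 0.5 * log(2 * pi) - sum(log(sds[k, ]))
  }
  out
}

# Two classes loosely modeled on the thread: "lg" is rare but has mean 0 and
# sd 0 on every predictor; "windows" is common with a small non-zero spread.
p <- 3
means <- rbind(lg = rep(0, p), windows = rep(0.02, p))
sds   <- rbind(lg = rep(0, p), windows = rep(0.33, p))
prior <- c(lg = 4, windows = 251388) / (4 + 251388)

X <- rbind(c(0, 0, 0),   # all-zero row
           c(1, 0, 0))   # one non-zero predictor
lp <- nb_log_posterior(X, means, sds, prior)
pred <- colnames(lp)[max.col(lp, ties.method = "first")]
print(pred)
```

Here the all-zero row is assigned to lg even though its prior is about 1.6e-5, while a single non-zero value is enough to send the second row to windows. With 109 predictors instead of 3, the pull toward zero-variance classes is far stronger.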

Sam Steingold

2012-Feb-10 15:01 UTC


[R] e1071/svm: never finishes

When I tried to run svm() on the same data frame, memory usage as reported
by top(1) doubled to 4GB almost immediately, and the function never
returned (it has been running for ~15 hours now). ^C does not stop it.
This is most unusual; libsvm has always seemed very fast.

This is R version 2.13.1 (2011-07-08) (as distributed with Ubuntu).
> * Sam Steingold <fqf at tah.bet> [2012-02-09 21:43:30 -0500]:

Uwe Ligges

2012-Feb-11 15:51 UTC


[R] naiveBayes: slow predict, weird results

We don't have the data, but my guess is that you want to have some 
factors in your data that were integers when you tried the code below.
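
Uwe's suggestion in practice: if the integer predictors are counts or codes with few distinct values, converting them to factors makes naiveBayes() estimate per-class frequency tables instead of a Gaussian mean and standard deviation, so each non-zero value gets its own conditional probability (Sam's question 3). Current versions of e1071's naiveBayes() also take a `laplace` argument for additive smoothing, which keeps value/class combinations never seen in training from getting probability zero. The base-R sketch below only performs the conversion and shows the kind of conditional frequency table that results; the toy data and column names are hypothetical, and the e1071 calls are left as untested comments.

```r
# Hypothetical toy data standing in for Sam's `users` and `platform`.
toy_users <- data.frame(clicks = c(0L, 2L, 0L, 5L, 0L, 0L, 1L, 0L),
                        visits = c(1L, 1L, 0L, 3L, 1L, 0L, 1L, 0L))
toy_platform <- factor(c("mac", "mac", "mac", "windows",
                         "windows", "lg", "windows", "lg"))

# Turn every integer column into a factor, so each distinct value is its own
# category; `[] <-` keeps the data.frame structure and column names.
toy_users_f <- toy_users
toy_users_f[] <- lapply(toy_users, factor)

# Per-class relative frequency of each value (each row sums to 1): the kind
# of conditional table naive Bayes uses for a categorical predictor.
cond <- prop.table(table(toy_platform, toy_users_f$clicks), 1)
print(round(cond, 3))

# With e1071 installed (not run here):
# library(e1071)
# nb <- naiveBayes(toy_users_f, toy_platform, laplace = 1)
# pl <- predict(nb, toy_users_f)
```

When predicting on new data, its columns should be converted with the training levels, e.g. factor(new_clicks, levels = levels(toy_users_f$clicks)) for a hypothetical new column `new_clicks`, so that the categories line up.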

Uwe Ligges


On 10.02.2012 03:43, Sam Steingold wrote:
> I did this:
> nb<- naiveBayes(users, platform)
> pl<- predict(nb,users)
> nrow(users) ==>  314781
> ncol(users) ==>  109
