for Spam.

In the process of setting up a more effective spam filtering system, I
just noticed that bogofilter, which implements extensions of the (a?)
"Naive Bayes" text classification approach, will dump out R data
frames; the man page suggests how to "integrate" it with R for
verification. (sort of, that is).

Anyway, for those of you looking for silly and perhaps interesting
problems/datasets for your engineering or comp-sci statistics classes,
this one looks quite amusing...

Looks like Eric Raymond knows (about) R -- a script is apparently
included in the source according to the man page, though I couldn't
find it in the Debian package.

best,
-tony

--
A.J. Rossini Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics rossini at u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)

rossini at blindglobe.net (A.J. Rossini) writes:

> for Spam.
>
> In the process of setting up a more effective spam filtering system, I
> just noticed that bogofilter, which implements extensions of the (a?)
> "Naive Bayes" text classification approach, will dump out R data
> frames; the man page suggests how to "integrate" it with R for
> verification. (sort of, that is).
>
> Anyway, for those of you looking for silly and perhaps interesting
> problems/datasets for your engineering or comp-sci statistics classes,
> this one looks quite amusing...
>
> Looks like Eric Raymond knows (about) R -- a script is apparently
> included in the source according to the man page, though I couldn't
> find it in the Debian package.

The text in http://www.bgl.nu/bogofilter/BcrFisher.html certainly has
one. It could be interesting to try and figure out what is actually
going on there - some of it certainly looks weird, and last time I
looked at "Naive Bayes" I got the impression that these people would
label anything returning a probability as "Bayesian"...

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
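
For readers wondering what a "Naive Bayes" text classifier computes, here is a minimal textbook-style sketch in Python. It is not bogofilter's algorithm (the thread itself notes that bogofilter implements extensions of the approach), and the function names `train` and `spam_probability`, the whitespace tokenizer, and the add-one smoothing are all assumptions made for illustration. It scores only the tokens present in a message and treats them as independent given the class.

```python
import math
from collections import Counter


def train(messages):
    """Count, per class, how many messages contain each token.

    `messages` is an iterable of (text, is_spam) pairs.
    Hypothetical helper for illustration; not part of bogofilter.
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        totals[is_spam] += 1
        # Use a set so each token is counted once per message.
        counts[is_spam].update(set(text.lower().split()))
    return counts, totals


def spam_probability(text, counts, totals):
    """Return P(spam | tokens) under the naive independence assumption.

    Assumes both classes have at least one training message.
    """
    # Start from the prior log-odds of spam versus non-spam.
    log_odds = math.log(totals[True] / totals[False])
    for tok in set(text.lower().split()):
        # Add-one smoothing keeps unseen tokens from giving zero probabilities.
        p_tok_spam = (counts[True][tok] + 1) / (totals[True] + 2)
        p_tok_ham = (counts[False][tok] + 1) / (totals[False] + 2)
        # Independence assumption: per-token likelihood ratios multiply,
        # so their logarithms add.
        log_odds += math.log(p_tok_spam / p_tok_ham)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Calling `train` on a handful of labelled messages and then `spam_probability` on a new one returns a number between 0 and 1. Whether that number deserves the label "Bayesian" is exactly the question raised above.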