gecko-dev/mailnews/mime/emitters
scott%scott-macgregor.org ea0646f211 Bug #230093, Bug #181534, Bug #237095 --> Port thunderbird junk mail improvements to the trunk.
Replace the core Bayesian junk mail algorithm with a chi-squared probability distribution
modeled after SpamBayes and Gary Robinson's work.
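
For readers unfamiliar with the technique, the chi-squared combining step used by SpamBayes and described by Gary Robinson can be sketched roughly as follows. This is an illustrative Python sketch, not the C++ code from this commit. The names `chi2q` and `combine` and the clamping constant `eps` are assumptions made for the example.

```python
import math


def chi2q(x2, v):
    """Return P(X >= x2) for a chi-squared variable X with v degrees of freedom.

    Uses the closed-form series that is valid only when v is even.
    """
    if v <= 0 or v % 2:
        raise ValueError("v must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum slightly above 1.0.
    return min(total, 1.0)


def combine(probs, eps=1e-9):
    """Combine per-token spam probabilities into one score in [0, 1].

    Scores near 1 suggest junk, scores near 0 suggest good mail, and
    scores near 0.5 mean the evidence is weak or conflicting.
    """
    n = len(probs)
    if n == 0:
        return 0.5
    # Assumption: clamp so that log() never sees exactly 0 or 1.
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    ln_spam = sum(math.log(1.0 - p) for p in clamped)
    ln_ham = sum(math.log(p) for p in clamped)
    s = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    h = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)
    return (s - h + 1.0) / 2.0
```

In this sketch, ten tokens that each look 99% junk combine to a score close to 1, while a message whose tokens are all neutral (0.5) stays at 0.5.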

Change the model for how we count tokens across messages.

Token counts get out of alignment when re-training against already classified messages.

Revamp the junk mail tokenizer. Make it a header sink listener and add custom tokens for attachment
information. Tokenize purely on whitespace. Ignore tokens longer than 13 characters and tokens
shorter than 3 bytes. There is still a lot more work to be done on the tokenizer.
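
The length and whitespace rules above can be illustrated with a small Python sketch. It is not the tokenizer from this commit: the function name `tokenize` and the constants are hypothetical, both limits are measured in UTF-8 bytes as an assumption (the message says "characters" for the upper limit and "bytes" for the lower one), and the header sink and attachment tokens are left out.

```python
MIN_TOKEN_BYTES = 3   # tokens shorter than this are ignored
MAX_TOKEN_BYTES = 13  # tokens longer than this are ignored


def tokenize(text):
    """Split text on whitespace and keep tokens within the length limits."""
    tokens = []
    for word in text.split():
        # Assumption: measure length as the UTF-8 encoded size.
        size = len(word.encode("utf-8"))
        if MIN_TOKEN_BYTES <= size <= MAX_TOKEN_BYTES:
            tokens.append(word)
    return tokens
```

For example, `tokenize("hi there supercalifragilistic friend")` keeps only `there` and `friend`, because `hi` is too short and `supercalifragilistic` is too long.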


Many thanks to Miguel Varga for working out the initial core algorithm improvement, and to all
of the folks at SpamBayes and, of course, Gary Robinson for helping to make this happen.
2004-05-12 18:16:32 +00:00
..
build Bug 236613: change to MPL/LGPL/GPL tri-license. 2004-04-17 18:33:16 +00:00
resources Bug 236613: change to MPL/LGPL/GPL tri-license. 2004-04-17 18:33:16 +00:00
src Bug #230093, Bug #181534, Bug #237095 --> Port thunderbird junk mail improvements to the trunk. 2004-05-12 18:16:32 +00:00
.cvsignore
Makefile.in Bug 236613: change to MPL/LGPL/GPL tri-license. 2004-04-17 18:33:16 +00:00