I got anti-spam working on the new phaedo after some time spent fiddling with it. I had this working on the mailsorters at work, and sort of double-confused myself in the process of redoing it for home. I wanted to make sure I incorporated some of the expanded rulesets from the SpamAssassin Rules Emporium, and in the process I overrode a number of the scoring parameters from the default SpamAssassin installation.

After that, it was a matter of figuring out how to train the Bayesian database. (A curious aside: when I first started talking to Whirl about the use of Bayesian analysis for spam identification, she immediately knew what I was talking about. This is one of the rare instances when our two fields actually overlap. They use the same sort of methodology for DNA analysis and species differentiation. I thought it was cool.)

Anyway, the goal is a system-wide anti-spam setup that learns about spam from lots of different sources: the users on the system, mail addressed to users not on the system (I get a lot of mail bouncing off usernames that don’t exist here, and I want to analyze those messages and use that data to further identify spam), and user-submitted misses (and hits, should they happen).
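To make the "lots of different sources" idea a bit more concrete, here is a rough sketch of how each source could be mapped onto a training run. It assumes training is done with SpamAssassin's `sa-learn` tool and its `--spam` and `--ham` options. The folder paths and source names are made up for illustration, and treating mail to nonexistent users as spam is my assumption, not a rule. The script only builds the commands; it does not run anything.

```python
# Hypothetical folder layout: none of these paths come from the real setup.
# Each source maps to (kind, folder), where kind is "spam" or "ham".
SOURCES = {
    "user_submitted_misses": ("spam", "/var/spool/spamtrain/missed-spam"),
    "user_submitted_hits": ("ham", "/var/spool/spamtrain/false-positives"),
    # Assumption: mail to usernames that don't exist here is treated as spam.
    "nonexistent_users": ("spam", "/var/spool/spamtrain/no-such-user"),
}


def sa_learn_command(kind, folder):
    """Build the argument list for one sa-learn training run."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham', got %r" % (kind,))
    return ["sa-learn", "--" + kind, folder]


def build_training_plan(sources=SOURCES):
    """Return one sa-learn command per configured source."""
    return [sa_learn_command(kind, folder) for kind, folder in sources.values()]


if __name__ == "__main__":
    # Print the commands instead of executing them, so this is safe to run.
    for command in build_training_plan():
        print(" ".join(command))
```

Something like this could be run from cron, passing each command to `subprocess.run`, once the folders actually exist and hold sorted mail.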

But before any of that can really work, we need to populate the Bayes database with some known good data. I have it set to auto-learn, but populating it by hand (telling it that these emails are in fact not spam) helps a lot. I got that working today, and am now seeing Bayesian probabilities assigned to incoming mail.
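For anyone wondering why seeding the database by hand matters, here is a toy sketch of the general idea. This is not SpamAssassin's actual implementation, just a minimal naive Bayes classifier I am using as a stand-in. The class name, the smoothing choice, and the sample messages are all invented for illustration. The point is that an untrained database can only shrug (probability 0.5), while a few known-good and known-bad messages start pulling scores apart.

```python
import math
from collections import Counter


class ToyBayes:
    """A tiny naive Bayes spam scorer, for illustration only."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        """Record one hand-classified message."""
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def token_spamminess(self, token):
        """Estimate P(spam | token) with add-one smoothing and equal priors."""
        p_given_spam = (self.spam_tokens[token] + 1) / (self.n_spam + 2)
        p_given_ham = (self.ham_tokens[token] + 1) / (self.n_ham + 2)
        return p_given_spam / (p_given_spam + p_given_ham)

    def spam_probability(self, text):
        """Combine per-token estimates by summing their log-odds."""
        log_odds = 0.0
        for token in set(text.lower().split()):
            p = self.token_spamminess(token)
            log_odds += math.log(p) - math.log(1.0 - p)
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    db = ToyBayes()
    print(db.spam_probability("lunch tomorrow"))  # untrained: no opinion yet
    db.learn("lunch meeting tomorrow", is_spam=False)
    db.learn("cheap pills now", is_spam=True)
    print(db.spam_probability("lunch tomorrow"))  # now leans toward ham
    print(db.spam_probability("cheap pills"))     # now leans toward spam
```

The smoothing keeps every per-token estimate strictly between 0 and 1, so a word seen only in one class never forces a hard verdict on its own.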
