SpamAssassin 2010 Bug (grepular.com)
74 points by mike-cardwell on Jan 1, 2010 | 17 comments


Ubuntu 9.04 affected too, thanks for posting.

Summary: FH_DATE_PAST_20XX matches on years 2010-2099.

As a workaround until the SpamAssassin rules are updated, the score can be set to zero in local.cf: score FH_DATE_PAST_20XX 0.0


An alternate workaround (which doesn't disable the check altogether, but watch out in 2020!):

    locate 72_active.cf && sudo vi $(locate 72_active.cf)
    search 'FH_DATE_PAST_20XX'
    change '/20[1-9][0-9]/' to '/20[2-9][0-9]/'
Source: http://wiki.apache.org/spamassassin/Rules/FH_DATE_PAST_20XX
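
To make the effect of that one-character change concrete, here is a small sketch in Python. The two patterns are copied from the workaround above; everything else (the helper name, the year range tested) is my own illustration. SpamAssassin evaluates its rules in Perl, so this only shows which years each character class covers.

```python
import re

# Patterns copied from the workaround above. SpamAssassin runs them in Perl;
# Python is used here only to show which years each pattern covers.
ORIGINAL = re.compile(r"20[1-9][0-9]")
WORKAROUND = re.compile(r"20[2-9][0-9]")

def years_matched(pattern, start=2000, end=2100):
    """Return the years in [start, end) whose four digits the pattern matches."""
    return [y for y in range(start, end) if pattern.fullmatch(str(y))]

original_years = years_matched(ORIGINAL)
workaround_years = years_matched(WORKAROUND)

print(original_years[0], original_years[-1])      # 2010 2099
print(workaround_years[0], workaround_years[-1])  # 2020 2099
```

So the edit only moves the problem ten years out, which is the "watch out in 2020!" caveat.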


This patch is now available via sa-update. Check what you have with this command (assuming your path is similar):

  grep FH_DATE_PAST_20XX /var/lib/spamassassin/3.*/updates_spamassassin_org/72_active.cf
If needed, update with this command:

  sa-update
Then check again to confirm.

If you don't already, consider running sa-update daily in a cron job to mitigate the damage from bugs like these and to benefit from other changes.


Seeing if a date is close to the current date with regular expressions = not a good idea.


It is a matter of going to war with the software you have, not the software you wish you had. SpamAssassin allows you to do plug-n-play logic for tests, but for implementation reasons only accepts regular expressions. Those are Good Enough for most email filtering tasks, very flexible, and have fairly predictable security and resource consequences. This is the architecture that lets SpamAssassin subject an email to literally hundreds of tests and evolve quickly with the quickly changing, particularized nature of the spam threat at any given installation. (In fairness, I think those tests are probably less efficient than just naive Bayesian, but don't trust me: my anti-spam researcher days are almost three years in the rear view mirror at the moment.)

Not to say that it's optimal, because it is not, but there is a reason it is done that way as opposed to having a fully executable plugin architecture which would have access to your date parsing library of choice.


If you look at the bug it's not a problem caused by using regular expressions as such, but rather by the choice of date to be "grossly in the future" (i.e. so far into the future that it couldn't be a legitimate date at the time the software is running).

The regex was chosen to match 2010 to 2099, which it did just fine.

So the problem wasn't in choosing to use regexes but in choosing 2010 as the date. I'm sure that date was "grossly in the future" at some point (probably when the regex was first written), but obviously we are living in the future now and need the date to be moved forward.
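
For comparison, a check that tracks the current date instead of a hardcoded decade might look like the sketch below. It is Python, not a SpamAssassin rule. The one-year threshold and the function name are made up for illustration, and it assumes the Date header is in the usual RFC 2822 style.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical threshold: how far ahead of "now" counts as grossly in the future.
MAX_FUTURE = timedelta(days=365)

def date_grossly_in_future(date_header, now=None, max_future=MAX_FUTURE):
    """Return True if the Date header is more than max_future ahead of now."""
    now = now or datetime.now(timezone.utc)
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # unparseable dates would be a separate test, not this one
    if sent.tzinfo is None:
        # Assumption: treat a date without a usable zone as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return sent - now > max_future

# Evaluated as if it were New Year's Day 2010:
new_year_2010 = datetime(2010, 1, 1, tzinfo=timezone.utc)
print(date_grossly_in_future("Fri, 01 Jan 2010 12:00:00 +0000", now=new_year_2010))  # False
print(date_grossly_in_future("01 Jan 2099 00:00:00 +0000", now=new_year_2010))       # True
```

Because the cutoff moves with the clock, nothing needs editing when the decade rolls over. The cost is that the test is no longer a plain regex.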


This is why heuristic tests suck and statistical classifiers are superior.


I don't see why. The heuristic here was fine (is the date too far in the future?), and not part of the problem. The bug was in the implementation of the heuristic ("any date from 2010 onward is too far in the future").

Maybe you're making a software maintainability argument instead? But clearly this doesn't make a good argument for classifiers vs. heuristics.


This also affects big email hosting providers, e.g. gmx.net.

http://www.heise.de/newsticker/meldung/Jahr-2010-Problem-im-...



Is rule-based spam filtering still helpful? Wouldn't a big default spam database + machine learning work much better than rules + an empty default spam database + machine learning?


The rules help a lot, especially in the early training of the Bayes database. It's amazing how much stuff is still caught by them... Over time, the SA Bayesian recognizer gets good enough that the rules play a relatively small role. I think I had no false positives from this little bug thanks to the Bayesian counterweight.
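
A rough sketch of how that counterweight works: SpamAssassin adds up the scores of the rules that hit and compares the total to a threshold. The numbers below are made up for illustration, including the threshold. Real scores depend on the SpamAssassin version and on local.cf, and SOME_OTHER_RULE is a placeholder name.

```python
# Illustrative numbers only; these are not the scores SpamAssassin ships with.
REQUIRED_SCORE = 5.0  # assumed threshold

def is_spam(hits, required=REQUIRED_SCORE):
    """Sum the scores of the rules that hit and compare against the threshold."""
    return sum(hits.values()) >= required

# A legitimate message from 2010 that trips the buggy date rule,
# but whose content a trained Bayes database considers clearly ham.
ham_with_bayes = {"FH_DATE_PAST_20XX": 3.4, "HTML_MESSAGE": 0.1, "BAYES_00": -2.6}

# The same message on a setup without Bayes, where one more minor rule hits.
ham_without_bayes = {"FH_DATE_PAST_20XX": 3.4, "HTML_MESSAGE": 0.1, "SOME_OTHER_RULE": 1.8}

print(is_spam(ham_with_bayes))     # False
print(is_spam(ham_without_bayes))  # True
```

With the negative Bayes score in the mix, the bogus date hit alone does not push a normal message over the line.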

A default database is a poor idea, though. One thing I've learned in helping folks with SA is that people get very different mixtures of spam and mail. I don't think you'd like my database at all. There's really no substitute for making your own...


What does Gmail use? It stops almost all spam. Aren't they using a single big database?


Probably a lot of the effectiveness comes from getting user spam classification feedback. The same letter is generally mass-mailed, so the chance someone got and flagged the spam you got, before you check your mail, is pretty high.


Thank you! My Gentoo Email server was affected, manually fixed for now.


Just noticed this on our installation. Y2K+10...


I knew buying a Y2K10 survival kit would be a good idea.



