SpamAssassin bayes_toks size

bayes_toks growing without bounds?

 

bayes_toks taking up more than a few hundred megabytes?

 

# df -h .
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da1s1d     96G     85G    3.6G    96%    /home
server:/home# du -kx | sort -rn | head
74514109        .
74378703        ./jails
67309703        ./jails/jail-ip
49390558        ./jails/jail-ip/var
48614106        ./jails/jail-ip/var/amavis
48613908        ./jails/jail-ip/var/amavis/.spamassassin
12913142        ./jails/jail-ip/home
12895420        ./jails/jail-ip/home/mysql
12593508        ./jails/jail-ip/home/mysql/dbmail
7068966 ./jails/jail-ip
server:/home# cd ./jails/jail-ip/var/amavis/.spamassassin
server: ..ail-ip/var/amavis/.spamassassin# ls -l
total 97227812
-rw-------  1 amavisd  amavisd    249274368 Jan 15 19:36 auto-whitelist
-rw-rw-rw-  1 amavisd  amavisd          936 Jan 15 19:42 bayes_journal
-rw-rw-rw-  1 amavisd  amavisd     41373696 Jan 15 19:42 bayes_seen
-rw-rw-rw-  1 amavisd  amavisd  49475297280 Jan 15 19:42 bayes_toks

That's 46 GB for the Bayesian filter database versus 12 GB of actual email. Not the way it should be.
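To check the arithmetic, the byte count from ls -l can be converted to binary gigabytes. This is just a small sketch; the helper name bytes_to_gib is made up for illustration.

```shell
# Hypothetical helper: convert a byte count to GiB with one decimal place.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Size of bayes_toks as reported by ls -l above.
bytes_to_gib 49475297280
```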

 

Why does it happen

God bless this guy who described all the background:

http://www.pingle.org/2011/02/10/rapidly-growing-bayes_toks

(and thanks twice since he didn't just suggest "delete it, idk what it does")

 

How to fix it

 

With that info, we find the actual _clean_ fix to be as simple as this:

 

sa-learn --backup > bayes_backup.txt && rm /var/amavis/.spamassassin/bayes_toks && sa-learn --restore bayes_backup.txt

 

So, text-dump the actual useful data, drop the database, restore it.
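A slightly more defensive variant of the same three steps might look like the sketch below. It assumes you want to stop if the dump is empty and to move the old file aside instead of deleting it immediately. The function name rebuild_bayes, the SA_LEARN variable and the .old suffix are illustrative assumptions, not anything SpamAssassin provides. Keep in mind that the renamed file still occupies its disk space until you delete it.

```shell
# Sketch only: SA_LEARN, rebuild_bayes and the ".old" suffix are assumptions.
SA_LEARN="${SA_LEARN:-sa-learn}"

rebuild_bayes() {
    db_dir="$1"    # e.g. /var/amavis/.spamassassin
    backup="$2"    # where the text dump goes

    # 1. Dump the useful data to text; stop if sa-learn reports failure.
    "$SA_LEARN" --backup > "$backup" || return 1

    # 2. Refuse to continue with an empty dump (for example after an aborted run).
    [ -s "$backup" ] || { echo "empty backup, aborting" >&2; return 1; }

    # 3. Move the bloated database aside rather than removing it right away.
    mv "$db_dir/bayes_toks" "$db_dir/bayes_toks.old" || return 1

    # 4. Load the dump into a fresh database.
    "$SA_LEARN" --restore "$backup"
}
```

Usage would be something like rebuild_bayes /var/amavis/.spamassassin bayes_backup.txt, followed by removing bayes_toks.old once the restored database looks fine.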

 

Possible traps

Note that in the 46 GB case above this involves a little more I/O than you'll enjoy: sa-learn scans the whole 46 GB file, constantly checking whether a new entry was encountered, using non-optimized Perl.

 

In this case, the processing apparently aborted after the Perl-driven file lock expired. I noticed by tracking the file age of bayes_backup.txt.

locker: error accessing /var/amavis/.spamassassin/bayes.lock: No such file or directory at ...SpamAssassin/Locker/UnixNFSSafe.pm line 190.
locker: safe_unlock: lock on /var/amavis/.spamassassin/bayes.lock was lost due to expiry at ...SpamAssassin/Locker/UnixNFSSafe.pm line 219.
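Tracking the file age of the dump can be scripted roughly like this. It is a sketch that assumes a backup which has not been written to for a few minutes has probably stalled. The function name backup_stalled and the 5-minute threshold are arbitrary choices for illustration.

```shell
# Hypothetical helper: succeeds if FILE was last modified more than MINUTES ago.
# A missing file counts as "not stalled" because find prints nothing for it.
backup_stalled() {
    file="$1"
    minutes="$2"
    # find prints the path only when its mtime is older than the threshold.
    [ -n "$(find "$file" -mmin +"$minutes" 2>/dev/null)" ]
}

# Example check, run periodically while the backup is in progress.
if backup_stalled bayes_backup.txt 5; then
    echo "bayes_backup.txt has not grown for over 5 minutes, check sa-learn" >&2
fi
```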

Absolutely love the fact they simply invoke NFS file locking, even on a nullfs local mount. That's smart iops throttling, eh?
