Migrating spamassassin from db files to MySQL

In this post, we’ll take a look at how to migrate spamassassin from the default db files to a MySQL database. The assumption is that you have a working mail server that already does spam scanning and where spamassassin stores its information in the default db files. In order to improve speed, especially for auto-expiry but certainly also for spam filtering itself, it is advisable to store the data in a faster database.

This post assumes you have the following up and running on your server:

  • A working mail server (postfix or otherwise)
  • amavisd-new
  • spamassassin 3.x.x
  • MySQL

Preparation

Before we enable another database in spamassassin or amavisd-new, we will take care of all the necessary tasks to ensure the migration goes as smoothly as possible. Ideally, all we will need to do is restart spamassassin to switch it to the new database. We will need to make an export of the old db files, create a MySQL database for a spamassassin and a MySQL user and password for spamassassin. Then, we will need to tell spamassassin to use the new database and to log in using the user name and password we created for it. We’ll start with creating a new MySQL database.

Preparing the MySQL database

This post does not cover the installation of MySQL and the configuration of it either. I assume you have a working knowledge of how to do that. I will just cover the steps necessary to create the database, the user and matching password. First, login to your MySQL server, either through console or using the MySQL workbench. I will be using the console in this example. Things you type are in bold.

# mysql -u root -p
Password:
mysql> create schema database_name

You can choose any name for the database you like. I have simply called it spamassassin. Next, we will create a user for spamassassin and grant the user privileges on the database. The user will need the SELECT, INSERT, UPDATE and DELETE privileges. The user only needs to connect from the localhost, unless you run spamassassin on a different host, in which case you will need to grant the user access privileges from that host. I will name my user spamassassin as well, but you are free to use any name you like.

mysql> create user 'spamassassin'@'localhost'
identified by 'password';
mysql> grant select,insert,update,delete
on database_name.*
to 'spamassassin'@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

With the database and user set up, it is time to create the tables that spamassassin will use. Fortunately, a script to accomplish that is included with spamassassin, saving us the trouble of having to do everything manually. The script is called bayes_mysql.sql and can be found in the documentation directory of spamassassin. You may have to search your system to see exactly where it is. In my case, on openSUSE 11.3, it is in the directory /usr/share/doc/packages/perl-Mail-SpamAssassin/sql/. You can create the proper tables using the following command.

mysql -u root -p database_name < /usr/share/doc/packages/perl-Mail-SpamAssassin/sql/bayes_mysql.sql

The script should not take more than a few seconds to complete, if at all. Verify that the tables where created by logging in to mysql and listing the tables.

mysql> use database_name;
Database changed
mysql> show tables;
+-------------------------+
| Tables_in_database_name |
+-------------------------+
| bayes_expire            |
| bayes_global_vars       |
| bayes_seen              |
| bayes_token             |
| bayes_vars              |
+-------------------------+
5 rows in set (0.00 sec)

As you can see, 5 tables now exist where there were none before. The script has succeeded.

Backing up the old database

Because our running spamassassin has already been active for quite some time, it will have collected a lot of information about the email we receive and from that, it will have filled its Bayesian database. It would be a shame to let that information go to waste by switching to a new database because that would mean we have to retrain spamassassin from scratch. Fortunately, we don’t have to do that. We can dump the information from the old db files and import it into the new MySQL database. First however, we will force a synchronization so that all the latest information is in the database before we create the backup. On my system, amavisd-new and thus spamassassin run under the account vscan, but you will have to check if this is true for your system.

# sa-learn --sync
# sa-learn --backup > /var/spool/amavis/backup.txt

This will create a file named backup.txt in the home directory of the vscan user, which on my system is in /var/spool/amasvis.  In it will be all the information spamassassin has stored since it began running and which has not auto-expired yet if you use auto-expiration. We are now ready to switch spamassassin over to the new database.

Configuring spamassassin to use MySQL

Before doing anything else, make a backup of the appropriate configuration files. This makes it easy to return to the way things were, should something go wrong. On most Linux distributions, the configuration file for spamassassin lives in /etc/mail/spamassassin or in /etc/mail and is called local.cf. Change, disable or add the following configuration settings.

# settings to enable mysql database access
use_bayes              1
bayes_auto_expire      0
bayes_store_module     Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn          DBI:mysql:spamassassin:localhost
bayes_sql_username     spamassassin
bayes_sql_password     <password>

# disable the following since we don't need them anymore
#auto_whitelist_path   /var/lib/amavis/.spamassassin/auto-whitelist
#bayes_path            /var/lib/amavis/.spamassassin/bayes
#bayes_file_mode       0777

# make sure that all data is added for the user running amavisd-new
# even if someone else runs sa-learn
bayes_sql_override_username   vscan

When you are sure that the above settings are correct, it is advisable to test the spamassassin configuration before switching over. For that purpose, you can download a test spam message from the spamassassin website.

# wget http://spamassassin.apache.org/gtube/gtube.txt

Next, feed this message to spamassassin using sa-learn and turn on debugging, so you can see what happens. Be sure to correct any errors before proceeding with the next step!

# sa-learn -D --spam gtube.txt

Note:  should you see the message “bayes: unable to initialize database for<username> user, aborting!” in the output of sa-learn, it is safe to ignore it. It appears to be caused by the fact that the database is clean and has no data. It should be gone on subsequent sa-learn runs. If it is not, something is wrong.
To be sure that something has been learned by spamassassin, you can also check the database for information. There should be some information in it now.

# mysql -u spamassassin -p
Enter password:
mysql> use database_name;
Database changed
mysql> select * from bayes_seen;
+----+-------------------------------------------------------+------+
| id | msgid                                                 | flag |
+----+-------------------------------------------------------+------+
|  1 | 15b2b262a6ed121f3c1dcb5561dd69dc254dfad8@sa_generated | s    |
+----+-------------------------------------------------------+------+
1 row in set (0.00 sec)

mysql> select * from bayes_vars;
+----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
| id | username | spam_count | ham_count | token_count | last_expire | last_atime_delta | last_expire_reduce | oldest_token_age | newest_token_age |
+----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
|  1 | vscan    |          1 |         0 |          64 |           0 |                0 |                  0 |       1058995800 |       1058995800 |
+----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
1 row in set (0.00 sec)

With the database initialised and no apparent errors in the output of sa-learn, we should be ready to switch spamassassin over to the new database. However, we still need to restore the backup of our old database so as not to lose all the information spamassassin has already learned about the type of messages we receive. You can also do that with sa-learn.

# su vscan -c 'sa-learn --restore backup.txt'

Depending on the size of the backup file, this may run for quite a while. You can monitor how the process is getting on by logging into mysql and running the query “SELECT COUNT(*) spam_count FROM bayes_token;”, which should increment until the process has finished running.

Tuning the new setup

Before switching over spamassassin to use the MySQL database, there are a two more tuning operations we need to complete. One is to switch the MySQL tables over from MyISAM to InnoDB which will both improve stability and performance.  The other is to make sure auto-expiry runs for our Bayesian database or it will grow very large very quickly. We switched auto_expire off in the spamassassin configuration file because the process can require a lot of processor time, so we would like it to run when the mail system is quiet, say at 3 AM, instead of during peak hours. To do this we will add a cron job.

First however, we will have a look at the database. In order to modify the tables, we will need to log in as root, because the spamassassin user doesn’t have the necessary rights on the table objects.

# mysql -u root -p
Enter password:
mysql> use database_name;
Database changed
mysql> show tables;
+------------------------+
| Tables_in_spamassassin |
+------------------------+
| bayes_expire           |
| bayes_global_vars      |
| bayes_seen             |
| bayes_token            |
| bayes_vars             |
+------------------------+
5 rows in set (0.00 sec)
mysql> ALTER TABLE bayes_expire TYPE=InnoDB;

Repeat the above command for the other tables too. When finished, run the analyze command on each of the tables. All should report their status as OK.

mysql> ANALYZE TABLE bayes_expire;
+----------------------------+---------+----------+----------+
| Table                      | Op      | Msg_type | Msg_text |
+----------------------------+---------+----------+----------+
| database_name.bayes_expire | analyze | status   | OK       |
+----------------------------+---------+----------+----------+
1 row in set (0.00 sec)

The table(s) seem to be OK, so we may assume that the MySQL database is up and running and that amavisd-new will be able to read and write it’s information. We will test this before switching of course.
Now, we need to make sure that records get auto-expired so that the database will not continue to grow. Since auto-expiry is disabled, we’ll need to take care of it some other way. Obviously we don’t want to take care of things by hand (we’re sysadmins after all), so we’ll use cron to automate the entire process. In cron define a task as below, choosing a time that suits you of course.

00 3 * * * /usr/bin/sa-learn --sync --force-expire

This will make sure that daily at 3:00 AM, old tokens will be expired from spamassassin’s database, making sure it stays lean and fit. With all this in place, we are ready to switch over from the db files to our new MySQL database.

Making the switch

In order to test our new configuration, we’ll stop the amavisd-new daemon as follows.

# service amavis stop

This will disable all our spam and virus filtering momentarily, so you may want to do the switch outside of business hours. Next, we’ll start amavisd-new in spamassassin debug mode, so that all activity is logged to the console.

# /usr/sbin/amavisd debug-sa

As soon as you press enter, you should see a whole lot of debug messages on screen. Look for the following two messages, which tell you if the database connection has been established successfully.

/usr/sbin/amavisd[30941]: (30941-02) SA dbg: bayes: database connection established
/usr/sbin/amavisd[30941]: (30941-02) SA dbg: bayes: found bayes db version 3

These lines tell you that spamassassin has successfully established a connection to the database. We’re almost done now, the only thing left is to stop amavisd-new again and restart the daemon. After that, keep an eye on your mail log files for a while and also check the bayes_vars table of your database to see if messages get learned as spam and ham.

Congratularions, you’ve made the switch from db to MySQL!

If you’ve found this post helpful or if you have seen omissions, problems or otherwise, please let me know. I assume no responsibility for any problems or lost mail you may encounter by following these instructions.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.