Exim Spam Filtering with Bogofilter

I have been running a personal email server for the past four or so years with very little trouble. The server itself received a truckload of spam, but none of it was delivered because every message was addressed to an account that didn’t exist on my server (love that check_local_user filter). I received maybe one spam email every 3 to 6 months until recently, when my email address was leaked in the AOL email breach. While I’m a bit upset at AOL for that, it was bound to happen sooner or later to one of the email providers, so I can’t be too upset. In the end, it’s been a good experience because it finally forced me to learn to set up a spam filter with Exim.

I searched the internet for several days weighing the pros and cons of each available spam filter (SpamAssassin, Razor, DSPAM, Bogofilter) before finally settling on Bogofilter because of its small size and because it’s written in C (might as well have something that can handle a lot of spam, even if it never has to).

Once I settled, I ran into the problem that spam filtering isn’t very well documented as a whole. Each of its parts is fairly well documented, but no one place really seems to put it all together with a good explanation of how the parts interact. Hopefully I can do that for you here.

Assumptions

  1. Each user’s mail is stored in maildir format

  2. Each user’s mail is stored in the ~/Mail directory

  3. Spam will be stored in a maildir folder called spam (~/Mail/.spam)

  4. Less certain emails will be delivered to an unsure folder (~/Mail/.unsure)

Bogofilter Configuration

First, we need to set up the actual mail analysis software, Bogofilter. My bogofilter configuration is fairly simple. To keep things nicely relegated to one area of my server, I have my bogofilter logs and word databases stored in /home/mail/bogofilter.

Regarding the configuration file (/etc/bogofilter/bogofilter.cf), I am using the following simple configuration.

/etc/bogofilter/bogofilter.cf
bogofilter_dir = /home/mail/bogofilter/
ham_cutoff  = 0.60
spam_cutoff = 0.80

To give you an idea of what that does: emails with a "spamicity" score below 0.60 are treated as ham (remember, ham is good email) and delivered normally. Emails scoring 0.60 or higher, but below 0.80, are listed as Unsure and will be sent to the unsure mail directory. Emails scoring 0.80 or higher are listed as Spam and will be sent to the spam directory (see the Assumptions section).
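
As a rough illustration of how the two cutoffs split mail into three groups, here is a small Python sketch. It is only a model of the configuration above, not bogofilter's actual code: the classify function is hypothetical, and the treatment of scores that land exactly on a cutoff follows the wording of this article, so bogofilter itself may differ at the exact boundary.

```python
# Hypothetical model of the three-way split produced by the two cutoffs.
# The values mirror /etc/bogofilter/bogofilter.cf above.
HAM_CUTOFF = 0.60
SPAM_CUTOFF = 0.80


def classify(spamicity: float) -> str:
    """Map a spamicity score (0.0 to 1.0) to Ham, Unsure, or Spam."""
    if spamicity >= SPAM_CUTOFF:
        return "Spam"
    if spamicity >= HAM_CUTOFF:
        # Assumption: a score exactly at ham_cutoff counts as Unsure.
        return "Unsure"
    return "Ham"


for score in (0.10, 0.70, 0.95):
    print(f"{score:.2f} -> {classify(score)}")
```

Raising spam_cutoff makes the filter more cautious about calling something spam, at the cost of a busier unsure folder.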

Exim Configuration

Routers

Routers in Exim do just what their name indicates: route email. Specifically, they route email to transports, but more on those in the next section. One thing to note before we get to the actual configuration: Exim tries the routers in sequence, in the order they appear in the config file, until one of them accepts the email or rejects it. Where you place a router therefore matters.

Note: To give the reader a better idea of where the spam-related routers go, I have included the names of the default routers for context. The spam-related routers are the ones with "bogo" in their names.

/etc/mail/exim.conf
begin routers
...
dnslookup:
...
#
# BOGOFILTER router
#
# Routes all mail to spam@domain.tld to the bogo_setspam_transport
bogo_setspam_router:
  driver = accept
  condition = ${if eq {$local_part}{spam} {yes}{no}}
  transport = bogo_setspam_transport

# Sends local mail that has not been scanned yet (received protocol is not
# "bogodone") through bogofilter with a neutral status.
bogo_check_router:
  no_verify
  check_local_user
  domains = +local_domains
  condition = ${if !eq {$received_protocol}{bogodone} {1}{0}}
  driver = accept
  transport = bogo_check_transport

...
system_aliases:
...
user_forward:
...

# Delivers bogo spam mail to the spam directory
localuser_bogo_spam:
  driver = accept
  check_local_user
  condition = ${if match{$h_X-Bogosity:}{Spam.*}{1}}
  transport = local_delivery_spam
  cannot_route_message = Unknown user

# Delivers bogo unsure mail to the unsure directory
localuser_bogo_unsure:
  driver = accept
  check_local_user
  condition = ${if match{$h_X-Bogosity:}{Unsure.*}{1}}
  transport = local_delivery_unsure
  cannot_route_message = Unknown user

...
localuser:
...

What we just did here is create four new routers. Here’s what each does.

bogo_setspam_router

Sends emails sent to "spam@domain.tld" to the bogo_setspam_transport.

bogo_check_router

Sends all local email that has not been scanned yet (its received protocol is not "bogodone") to the bogo_check_transport.

localuser_bogo_spam

Sends all email whose X-Bogosity header matches "Spam" to the local_delivery_spam transport.

localuser_bogo_unsure

Sends all email whose X-Bogosity header matches "Unsure" to the local_delivery_unsure transport.

Those explanations make routers seem like they don’t do much at all, and without corresponding transports, that would be true. Routers only serve to route mail that matches certain criteria to the appropriate transports.

Transports

Transports in Exim perform actions. Each one is backed by a driver, such as pipe or appendfile, that does the actual work. They are not processed unless an email is sent to them by a router. Consequently, they can be placed anywhere and in any order within the transports section of the Exim config file.

/etc/mail/exim.conf
begin transports
...
# Bogofilter will add X-Bogosity header to all incoming mail. This can go
# anywhere in the transport section, usually at the very end after
# address_reply
bogo_check_transport:
  driver = pipe
  command = /usr/bin/exim -oMr bogodone -bS
  use_bsmtp = true
  headers_add = X-Bogofilterd: true
  transport_filter = /usr/bin/bogofilter -d /home/mail/bogofilter -l -p -e -u
  return_fail_output = true
  group = mail
  user = exim
  home_directory = "/home/mail/bogofilter"
  current_directory = "/home/mail/bogofilter"
  log_output = true
  return_path_add = false

# This updates the bogofilter database with this email explicitly set as
# spam (intended for spam@domain.tld)
bogo_setspam_transport:
  driver = pipe
  command = /usr/bin/bogofilter -d /home/mail/bogofilter -s -l
  use_bsmtp = true
  return_fail_output = true
  group = mail
  user = exim
  home_directory = "/home/mail/bogofilter"
  current_directory = "/home/mail/bogofilter"
  log_output = true


# Called when delivering mail to the spam directory
local_delivery_spam:
  driver = appendfile
  directory = $home/Mail/.spam
  maildir_format
  maildir_use_size_file
  delivery_date_add
  envelope_to_add
  return_path_add

# Called when delivering mail to the unsure directory
local_delivery_unsure:
  driver = appendfile
  directory = $home/Mail/.unsure
  maildir_format
  maildir_use_size_file
  delivery_date_add
  envelope_to_add
  return_path_add

We just added four transports.

bogo_check_transport

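Uses the pipe driver. Essentially, this one is a passthrough transport. Before the email is piped anywhere, the transport_filter runs it through the bogofilter binary with a neutral status (not explicitly marked as spam or ham). The bogofilter binary inserts a few headers into the email as it processes, and then returns. The command then hands the filtered email back to Exim with the received protocol set to bogodone, so the bogo_check_router will not catch it a second time. The most important of the inserted headers for our purposes is the X-Bogosity header. This one will be used later on for delivering mail to the correct directory.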
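
To make the role of the X-Bogosity header more concrete, here is a small Python sketch that reads the header from a message and picks a folder the same way the localuser_bogo_spam and localuser_bogo_unsure routers do (an unanchored pattern match on the header value). The sample message, the folder_for function, and the "inbox" label are all made up for illustration, and the exact header text your bogofilter version writes may differ.

```python
import re
from email import message_from_string

# Illustrative message only; the header value is an example of the kind of
# line bogofilter adds, not output captured from a real run.
RAW = """\
From: someone@example.org
To: you@domain.tld
Subject: Cheap watches
X-Bogosity: Spam, tests=bogofilter, spamicity=0.999000

Buy now!
"""


def folder_for(raw_message: str) -> str:
    """Pick a delivery folder from the X-Bogosity header, if present."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity", "")
    if re.search(r"Spam.*", header):      # same pattern as localuser_bogo_spam
        return "spam"
    if re.search(r"Unsure.*", header):    # same pattern as localuser_bogo_unsure
        return "unsure"
    return "inbox"                        # falls through to normal delivery


print(folder_for(RAW))
```

A message without the header, or one tagged as ham, matches neither pattern and ends up in the normal inbox.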

bogo_setspam_transport

This transport also uses the pipe driver. It is called by the bogo_setspam_router, which only catches email sent to "spam@domain.tld". The intent of this transport is to mark all emails sent through it explicitly as spam. This is so users can forward a spam email the filter missed to "spam@domain.tld" and the filter will update itself to treat the text in the received email as "spammy".

local_delivery_spam

This transport is a final delivery transport (the appendfile driver). All email sent through this transport will be delivered to the destination user’s "spam" directory.

local_delivery_unsure

This transport is a final delivery transport (the appendfile driver). All email sent through this transport will be delivered to the destination user’s "unsure" directory.

A Few Examples

There are a few possible paths a given email could take through this system.

A Spammy Email

Say you get an email that bogofilter would flag as spam. Here’s the path it would take through the previous configurations.

  1. Exim receives the email. The bogo_setspam_router is skipped because the email was sent to you, not spam@domain.tld

  2. The next router in line, bogo_check_router, is used because it catches all local email that has not been scanned yet. It routes the email to the bogo_check_transport.

  3. The bogo_check_transport has been called and thus pipes the email through the bogofilter binary

  4. The bogofilter binary inserts the X-Bogosity header. In the case of this email which is most likely spam, it will insert "X-Bogosity: Spam".

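  5. The bogo_check_transport then hands the tagged email back to Exim (exim -oMr bogodone -bS), so it runs through the routers again from the top. This time bogo_check_router is skipped because the received protocol is now bogodone.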

  6. The next router to match is localuser_bogo_spam. It checks that the email header "X-Bogosity" matches "Spam". In this case, bogofilter inserted this header and value in step 4, and so this router sends the email through the local_delivery_spam transport.

  7. The local_delivery_spam transport (called by the localuser_bogo_spam router) delivers the email to the user’s spam directory.
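
The steps above can be sketched as a toy Python model. This is not how Exim is implemented; it only mimics the order of the routers in this article and the fact that the bogo_check_transport re-submits the message with the received protocol set to bogodone (my reading of the exim -oMr bogodone -bS command). The Message class, the route, scan, and trace functions, and the local_delivery transport name for the default localuser router are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    local_part: str                       # recipient name before the @
    spamicity: float                      # stand-in for bogofilter's score
    received_protocol: str = "esmtp"
    headers: dict = field(default_factory=dict)


def route(msg: Message) -> str:
    """Return the transport chosen by the first router that matches."""
    bogosity = msg.headers.get("X-Bogosity", "")
    if msg.local_part == "spam":                  # bogo_setspam_router
        return "bogo_setspam_transport"
    if msg.received_protocol != "bogodone":       # bogo_check_router
        return "bogo_check_transport"
    if "Spam" in bogosity:                        # localuser_bogo_spam
        return "local_delivery_spam"
    if "Unsure" in bogosity:                      # localuser_bogo_unsure
        return "local_delivery_unsure"
    return "local_delivery"                       # localuser (assumed name)


def scan(msg: Message) -> None:
    """Mimic bogo_check_transport: tag the message, then mark it scanned."""
    if msg.spamicity >= 0.80:
        verdict = "Spam"
    elif msg.spamicity >= 0.60:
        verdict = "Unsure"
    else:
        verdict = "Ham"
    msg.headers["X-Bogosity"] = verdict
    msg.received_protocol = "bogodone"


def trace(msg: Message) -> list:
    """List the transports a message passes through, in order."""
    path = [route(msg)]
    if path[0] == "bogo_check_transport":
        scan(msg)
        path.append(route(msg))           # second pass through the routers
    return path


print(trace(Message("you", 0.95)))
# ['bogo_check_transport', 'local_delivery_spam']
```

Calling trace(Message("spam", 0.95)) returns only bogo_setspam_transport, since mail to spam@domain.tld is caught by the first router and never scanned.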

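A Hammy (Good) Email

A good email follows the same path through the bogo_check_router and the bogo_check_transport, but bogofilter tags it as ham rather than spam. Neither localuser_bogo_spam nor localuser_bogo_unsure matches, so the email falls through to the default localuser router and is delivered to the inbox as usual.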

If anyone has questions about this post, please ask your question on the discussion page and I’ll try to get this updated with explanations. Setting up a mail server is hard enough for new folks, without adding the extra complication of spam filtering (I’m fairly new to this myself), so please ask any and all questions.
