About Bayesian filter

Internet Exchange Messaging Server now provides an interface that lets the system administrator hook a "Bayesian filter" into the Local Mail Delivery Module (LMDA). A Bayesian filter uses a statistical process to identify spam mail messages. For details about the operating principle of the Bayesian filter, please consult the article "A Plan for Spam" by Paul Graham.

This version of Internet Exchange Messaging Server bundles bogofilter version 0.11.2. Details about bogofilter can be found in the bogofilter documentation.
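As a rough illustration of the operating principle, the sketch below combines per-token spam probabilities the way "A Plan for Spam" describes. The function name and the sample probabilities are made up for this example, and the bundled bogofilter may compute its score differently.

```python
from math import prod

def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities into a single message score."""
    # Clamp each value so that a single 0.0 or 1.0 cannot cause a division by zero.
    clamped = [min(0.99, max(0.01, p)) for p in token_probs]
    spam = prod(clamped)                      # evidence that the message is spam
    ham = prod(1.0 - p for p in clamped)      # evidence that the message is good
    return spam / (spam + ham)

# Hypothetical probabilities for three tokens found in one message.
score = combined_spam_probability([0.99, 0.95, 0.20])
print(round(score, 4))
```

A score close to 1 suggests spam and a score close to 0 suggests a good message. The cut-off between the two is a policy choice of the filter.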

Configure Bayesian filter for LMDA

Before the LMDA module can use the Bayesian filtering feature, you need to enable it. Check the Enable check box and select either a DLL type or an EXE type Bayesian filter. A DLL type Bayesian filter is a software library that is loaded into the memory space of the LMDA module. An EXE type Bayesian filter, on the other hand, is a standalone binary executable that the LMDA module invokes for each mail message it processes. In general, a DLL type Bayesian filter runs faster than a standalone EXE type filter.

Configure DLL type Bayesian filter

When you install Internet Exchange Messaging Server, the library version of bogofilter is installed automatically. This library is installed in the following location:

The library should expose a function that can be called by the LMDA module. The function provided by the libbogo library is named "iems_bogofilter".

Configure EXE type Bayesian filter

If you are going to use an EXE type Bayesian filter, you need to provide the filter command line and the return codes the filter uses to indicate the filtering result. The following directives are supported in the filter command line:

Example
  1. | /usr/local/bin/myfilter -d %B
    Expanded to:
    | /usr/local/bin/myfilter -d /var/spool/iems/msgstore/john@company.com/.bayesian
  2. /usr/local/bin/myfilter -d %B -I %M
    Expanded to:
    /usr/local/bin/myfilter -d /var/spool/iems/msgstore/john@company.com/.bayesian -I /var/spool/iems/mqueue/01/1.msg
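The expansions above can be sketched as follows. This is only an illustration inferred from the two examples: it assumes %B stands for the user's Bayesian data path, %M stands for the path of the message file, and a leading "|" means the message is piped to the filter's standard input. The function name is hypothetical and is not part of the product.

```python
import re

def expand_filter_command(template, bayes_db_path, message_path):
    """Return (piped, command) for a filter command line template."""
    values = {"%B": bayes_db_path, "%M": message_path}
    command = template.strip()
    # Assumption: a leading "|" asks for the message on standard input.
    piped = command.startswith("|")
    if piped:
        command = command[1:].lstrip()
    # Substitute both directives in a single pass over the template.
    expanded = re.sub(r"%[BM]", lambda match: values[match.group(0)], command)
    return piped, expanded

piped, command = expand_filter_command(
    "| /usr/local/bin/myfilter -d %B",
    "/var/spool/iems/msgstore/john@company.com/.bayesian",
    "/var/spool/iems/mqueue/01/1.msg",
)
print(piped, command)
```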

Besides the filter command line, you need to define the return codes that your Bayesian filter uses to indicate different conditions. Return codes are integers ranging from 0 to 255. If your filter may return more than one value for the same condition, separate the values with commas. The 4 conditions are:

  1. Detected non Spam message
  2. Detected Spam message
  3. Undetermined message
  4. Error condition
Your filter must provide return codes for the first two conditions.
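The mapping from return codes to conditions could be sketched as below. The code values in the example are purely illustrative and should be replaced with the values your own filter returns. Reporting an unmapped code as "unknown" is an assumption of this sketch, not documented LMDA behaviour.

```python
def parse_codes(text):
    """Parse a comma-separated list such as "3, 4" into a set of integers."""
    codes = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries such as a trailing comma
        value = int(part)
        if not 0 <= value <= 255:
            raise ValueError(f"return code out of range 0-255: {value}")
        codes.add(value)
    return codes

def classify(exit_code, non_spam, spam, undetermined="", error=""):
    """Map a filter exit code to one of the four configured conditions."""
    table = [
        ("non-spam", parse_codes(non_spam)),
        ("spam", parse_codes(spam)),
        ("undetermined", parse_codes(undetermined)),
        ("error", parse_codes(error)),
    ]
    for label, codes in table:
        if exit_code in codes:
            return label
    return "unknown"  # assumption: a code that matches nothing is reported as unknown

# Illustrative configuration only.
result = classify(4, non_spam="1", spam="0", undetermined="2", error="3, 4")
print(result)
```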

About Bayesian filter learning engine

Bayesian filtering is a statistical process that requires some training in order to obtain accurate results in spam message detection. In Internet Exchange Messaging Server, a program named "bayesianlearn" is provided for this purpose. The bayesianlearn program is a command-line utility that looks at the Spam and Good messages in the mailbox folders predefined by each MessageStore user. Each message is submitted to the underlying Bayesian filter training engine. Your Bayesian filter training engine must support the following features:

  1. Ability to add a message to the Good mail list
  2. Ability to remove a message from the Good mail list
  3. Ability to add a message to the Spam mail list
  4. Ability to remove a message from the Spam mail list
When the bayesianlearn program starts, it performs the following tasks for each MessageStore user:
  1. Read the name of the Good message folder defined by the MessageStore user
  2. For each message in the Good message folder
  3. Read the name of the Spam message folder defined by the MessageStore user
  4. For each message in the Spam message folder
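A minimal sketch of the per-user pass listed above, under stated assumptions: each folder is treated as a directory holding one file per message, and every message is simply handed to a training callback. The folder names, the callbacks, and the function name are hypothetical. The real bayesianlearn program may store messages differently and may do more work per message.

```python
import tempfile
from pathlib import Path

def learn_for_user(good_folder, spam_folder, add_good, add_spam):
    """Submit every message file in the two folders to the training callbacks."""
    counts = {"good": 0, "spam": 0}
    jobs = (("good", good_folder, add_good), ("spam", spam_folder, add_spam))
    for label, folder, submit in jobs:
        for message in sorted(Path(folder).iterdir()):
            if message.is_file():
                submit(message)  # hand the message to the training engine
                counts[label] += 1
    return counts

# Demonstration with throwaway folders standing in for one user's mailbox.
trained = []
with tempfile.TemporaryDirectory() as root:
    good, spam = Path(root, "Good"), Path(root, "Spam")
    good.mkdir()
    spam.mkdir()
    (good / "1.msg").write_text("meeting at noon")
    (spam / "2.msg").write_text("buy now")
    (spam / "3.msg").write_text("cheap pills")
    counts = learn_for_user(
        good, spam,
        add_good=lambda m: trained.append(("good", m.name)),
        add_spam=lambda m: trained.append(("spam", m.name)),
    )
print(counts)
```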

Configure Bayesian filter learning engine

You need to configure the command-line arguments of your Bayesian filter training engine. The following directives are provided:

There is a locking mechanism between the LMDA module and the bayesianlearn program. When LMDA fails to acquire the lock, it keeps retrying until it reaches the TIMEOUT (the default is 15 minutes). When LMDA reaches the timeout, it sends a notification to the system postmaster account and terminates. If you receive such a notification, check whether a problem is preventing the bayesianlearn program from releasing the lock. You may need to terminate the bayesianlearn program manually and remove the "bayesain.lock" file under each MessageStore user's HOME directory. Restart LMDA afterward.
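The retry-until-timeout behaviour described above can be pictured with the sketch below. It assumes the lock is taken by creating the lock file exclusively, which is a common technique but is not confirmed to be how LMDA and bayesianlearn implement it. The function name and the retry interval are hypothetical.

```python
import os
import tempfile
import time

def acquire_lock(lock_path, timeout_seconds=15 * 60, retry_interval=5.0):
    """Try to create the lock file; retry until the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False  # the caller would notify the postmaster here
            time.sleep(retry_interval)

# Demonstration in a throwaway directory with a zero timeout.
with tempfile.TemporaryDirectory() as home:
    lock = os.path.join(home, "bayesain.lock")
    first = acquire_lock(lock, timeout_seconds=0)   # lock is free
    second = acquire_lock(lock, timeout_seconds=0)  # lock is already held
    os.remove(lock)                                 # manual cleanup, as described above
print(first, second)
```

With a stale lock file left behind, every attempt fails until the file is removed, which is why the manual cleanup step matters.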