Installing SpamAssassin

SpamAssassin uses heuristic tests to decide whether a message is spam. It can add a tag such as "*****SPAM*****" to the subject line, which procmail can then use to file or discard the message.

Installation

Build and install the scripts
   cd Mail-SpamAssassin-*
   perl Makefile.PL
   make
   make install  [as root]

If the perl command gives any warnings about missing requirements, don't proceed until you have installed the needed modules from CPAN. If the modules cannot be installed from within perl, download them as tar.gz files and compile and install them manually. Then run "make clean" before running "make" in the spamassassin directory again.
Add to /etc/rc.d/local
/usr/bin/spamd -d -u nobody
(or "/usr/sbin/spamd -d -u nobody", depending on which spamd you want).
Add to /etc/procmailrc
   DROPPRIVS=yes
   :0fw
   * < 256000
   | spamc
Edit /etc/mail/spamassassin/local.cf and set the sensitivity (required_hits) to 10. Set report_safe to 0 so that the original message is left intact, in case it turns out to be a legitimate email. Set rewrite_subject to 1 so it puts "*****SPAM*****" in the subject line.
required_hits 10.0
rewrite_subject 1
report_safe 0

NOTE: Version 3.x of spamassassin uses different syntax. The new local.cf should be:
required_hits 4.0
rewrite_header Subject *****SPAM*****
report_safe 1
use_bayes 1
skip_rbl_checks 1
use_pyzor 1
rewrite_subject no longer works.
See man Mail::SpamAssassin::Conf for the man page that describes configuring spamassassin.
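After an upgrade to 3.x it is easy to leave the old option behind. A small sketch for spotting it; the function name find_legacy_options is ours, and it looks only for rewrite_subject, the one option this page identifies as no longer working.

```shell
# List uncommented rewrite_subject lines in a SpamAssassin config file,
# prefixed with their line numbers.
# Returns 0 if any were found, 1 if none were found.
find_legacy_options() {
    grep -n -E '^[[:space:]]*rewrite_subject([[:space:]]|$)' "$1"
}
```

For example, "find_legacy_options /etc/mail/spamassassin/local.cf" prints nothing when the file has already been converted to rewrite_header.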
Edit /usr/share/spamassassin/20_head_tests.cf and /usr/share/spamassassin/20_dnsbl_tests.cf and comment out all lines that refer to relays.osirusoft.com, bl.spamcop.net, orbs.dorkslayers.com, and ipwhois.rfc-ignorant.org. These are known to be corrupt spam databases (see below).
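Commenting the lines out by hand is tedious if there are many of them. The following is a sketch of how it could be scripted; the function name comment_out_hosts is ours, and it simply prefixes "# " to any uncommented line that mentions one of the given hosts. It leaves a copy of the original in FILE.bak. Review the result before restarting spamd, since a host name can also appear on lines you want to keep.

```shell
# Comment out every uncommented line in FILE that mentions one of the HOSTS.
# Usage: comment_out_hosts FILE HOST...
comment_out_hosts() {
    file=$1
    shift
    cp "$file" "$file.bak" || return 1      # keep a copy of the original
    for host in "$@"; do
        # Escape the dots so they match literally in the sed pattern.
        pattern=$(printf '%s' "$host" | sed 's/\./\\./g')
        # Write through a temp file, then copy back to keep the file's mode.
        sed "/$pattern/ s/^\([^#]\)/# \1/" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp" || return 1
    done
}
```

For example: comment_out_hosts /usr/share/spamassassin/20_dnsbl_tests.cf relays.osirusoft.com bl.spamcop.net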
(Optional) Edit /usr/share/spamassassin/50_scores.cf and change the weightings for spam features if desired. For example, we increased the weight of SUBJ_FREE_CAP and set all the entries starting with RCVD_IN_OSIRU to 0.

Start the spamd daemon

/usr/bin/spamd -d -u nobody

If it says

Can't locate HTML/Parser.pm in @INC (@INC contains: ../lib /usr/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/5.8.3/i686-linux-64int-ld /usr/local/lib/perl5/5.8.3 /usr/local/lib/perl5/site_perl/5.8.3/i686-linux-64int-ld /usr/local/lib/perl5/site_perl/5.8.3 /usr/local/lib/perl5/site_perl) at /usr/lib/perl5/site_perl/5.6.0/Mail/SpamAssassin/HTML.pm line 7.
... more error messages ...

it means HTML::Parser is missing or too old for the perl you are using. Install HTML::Parser from CPAN, delete the SpamAssassin source tree, and re-extract it from the .tar.gz file. (Running "make clean" doesn't work.)

Test spamassassin
spamassassin -t < sample-nonspam.txt > nonspam.out
spamassassin -t < sample-spam.txt > spam.out
Check X-Spam-Status: should say "No" for the nonspam and "Yes" for the spam.
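Rather than opening each output file, the verdict can be pulled out of the header directly. A minimal sketch, assuming the header has the usual "X-Spam-Status: Yes, ..." or "X-Spam-Status: No, ..." form; the function name spam_verdict is ours.

```shell
# Print "Yes" or "No" from the first X-Spam-Status header in FILE.
# Prints nothing if the header is absent.
spam_verdict() {
    awk '/^X-Spam-Status:/ { sub(/,.*/, "", $2); print $2; exit }' "$1"
}
```

For example, "spam_verdict spam.out" should print Yes and "spam_verdict nonspam.out" should print No.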
Check for errors in the .cf files by typing
spamassassin -D --lint

If a user doesn't want spam checking, they should edit ~/.spamassassin/user_prefs and change
# required_hits 5
to
required_hits 100
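If several users ask for this, the edit can be scripted. The sketch below is ours, not part of SpamAssassin: set_required_hits rewrites an existing required_hits line (commented or not) in the given prefs file, or appends one if none exists. It assumes the option is spelled required_hits, as on this page.

```shell
# Set required_hits to VALUE in a user_prefs file.
# Usage: set_required_hits FILE VALUE
set_required_hits() {
    prefs=$1
    value=$2
    if grep -q '^[#[:space:]]*required_hits' "$prefs" 2>/dev/null; then
        # Replace the existing line, whether or not it is commented out.
        sed "s/^[#[:space:]]*required_hits.*/required_hits $value/" "$prefs" > "$prefs.tmp" &&
            cat "$prefs.tmp" > "$prefs" &&
            rm -f "$prefs.tmp"
    else
        # No such line yet: append one.
        printf 'required_hits %s\n' "$value" >> "$prefs"
    fi
}
```

For example: set_required_hits ~/.spamassassin/user_prefs 100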
For the man pages, type
perldoc Mail::SpamAssassin::Conf
perldoc Mail::SpamAssassin
Leave it like this for a week, so you can tell if it's working. Then add a line to ~/.procmailrc for each user to drop the spam
:0:
* ^X-Spam-Status: Yes
spam
Now just sit back and wait for the complaints to roll in. You may have to tweak the sensitivity value in /etc/mail/spamassassin/local.cf. The best balance is achieved when 50% of your users complain that too much is being blocked and 50% complain that too little is being blocked.
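During the trial week, a rough count of how many delivered messages were flagged helps when choosing the sensitivity. A sketch, assuming the mailbox is in mbox format (each message starts with a "From " line); the function name mbox_spam_summary is ours.

```shell
# Print how many messages in an mbox file carry "X-Spam-Status: Yes".
# Usage: mbox_spam_summary MBOX
mbox_spam_summary() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    total=$(grep -c '^From ' "$1" || true)
    flagged=$(grep -c '^X-Spam-Status: Yes' "$1" || true)
    printf '%s of %s messages flagged\n' "$flagged" "$total"
}
```

For example, "mbox_spam_summary /var/spool/mail/someuser", where the spool path is only an illustration and depends on your system.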

PROBLEMS

Spamassassin not starting, error with "persistent_udp".
Solution: install Net::DNS from CPAN.

SpamAssassin won't compile, complaining about db.h. Or, SpamAssassin compiles after ignoring messages about DB_File and Mail::SPF, but then won't start.

These two packages are listed as optional but appear to be essential. However, on one of our servers, they would not install from the perl interface and had to be compiled and installed from tar.gz source packages.

It was also necessary to manually install Net-Ident and Mail-SPF-Query to avoid messages like

spamd: Can't locate Mail/SPF/Query.pm in @INC

If the build stops with
version.c:30:16: db.h: No such file or directory
compile and install libdb from www.sleepycat.com. Then do the following:
cd /usr/include
cp db1/* .
Do not use the db2 files. Bayes version 2 can't be used.

Download, compile, and install DB_File-1.811.tar.gz from search.cpan.org or www.cpan.org/modules/by-module/DB_File
Download, compile, and install Mail-SPF-2.00.tar.gz from search.cpan.org or www.cpan.org/modules/by-module/Mail
Return to the spamassassin directory and type:
make clean
perl Makefile.PL
make
make install
Check to make sure new versions of spamd, spamc, spamassassin, and sa-learn were installed, and there are no old versions lying around.
Re-start spamassassin:
killall spamd
/usr/bin/spamd -d -u nobody
In the system logs, it should say:

spamd: server started on port 783/tcp (running version 3.1.0) 
spamd: server pid: 10859 
spamd: server successfully spawned child process, pid 10861 
spamd: server successfully spawned child process, pid 10862 
prefork: child states: II

Wait several minutes, and check to make sure it hasn't crashed.
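The log check can be scripted as well. This sketch prints the version from the most recent "server started" line, assuming the log lines look like the ones shown above; the function name spamd_started_version is ours, and the location of the log file depends on your syslog configuration.

```shell
# Print the version from the last "spamd: server started" line in LOGFILE.
# Returns 1 if no such line exists.
spamd_started_version() {
    version=$(sed -n 's/.*spamd: server started on port .*(running version \([^)]*\)).*/\1/p' "$1" | tail -n 1)
    [ -n "$version" ] && printf '%s\n' "$version"
}
```

Running it again several minutes later and comparing against the list of running processes shows whether spamd has stayed up.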

spamd: Can't locate Mail/SPF/Query.pm in @INC
Solution: install Mail::SPF::Query from CPAN.
Spamassassin won't start, saying
spamc: connect(AF_INET) to spamd at 127.0.0.1 failed, retrying (#1 of 3): Connection refused
This may be caused by an old version of SpamAssassin somewhere in your path. The latest version does not always install its files if older versions already exist. Remove all copies of spamd, spamc, sa-learn, and spamassassin in /usr/bin, /usr/sbin, etc. and type "make install" again.
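To find stray copies, it helps to list every directory in the path that contains a given program, not just the first one. A minimal sketch; the function name find_copies is ours.

```shell
# Print every executable file called NAME found in the directories of $PATH.
# Usage: find_copies NAME
find_copies() {
    name=$1
    old_ifs=$IFS
    IFS=:                       # split $PATH on colons
    for dir in $PATH; do
        if [ -n "$dir" ] && [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
    return 0
}
```

For example: for p in spamd spamc sa-learn spamassassin; do find_copies "$p"; done. More than one line of output for the same program means an older copy may be shadowing the new one. Directories that are not in your path, such as /usr/sbin for ordinary users, still have to be checked by hand.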
When run as a non-daemon, Spamassassin segfaults
/usr/bin/spamd -u nobody
Segmentation fault
We had this problem on one server. We found that it is essential to install DB_File and Mail::SPF (see above). These perl modules had to be installed manually from the tar.gz file. After rebuilding spamassassin, it finally started up without crashing.
Messages not being marked as spam
Check the following:
- Make sure the following lines are in /etc/mail/spamassassin/local.cf:
  rewrite_subject 1
  report_header 1
  report_safe 2
  required_hits 4.2
  Or, for version 3.x, substitute the following:
  rewrite_header Subject *****SPAM*****
- If you use the command line
  /usr/bin/spamd -d -L -c
  it should create a .spamassassin directory for each user. The file bayes.lock must be present, or spam will not be checked.
- If it works for some users and not others, one reason can be that they are using different shells. For example, we found that users with no valid shell were being checked while users with bash were not being checked.
- Remove .spamassassin directories for all users so they are regenerated.
- Temporarily set your .spamassassin/user_prefs to say
  required_hits 0
  and start spamd with the -D (debug) option. Comments such as
  logmsg "aaa";
  in spamd are also helpful in determining where spamd is crashing or exiting prematurely.
- Make sure spamassassin is running as "nobody"; otherwise it will not work. This is different from previous versions. The tradeoff to running it as "nobody" is that it can't read or create the users' .spamassassin directories. To get around this, type the command as root:
  chmod a+x /home/*/.spamassassin
  spamassassin should be started with the command line
  /usr/sbin/spamd -d -u nobody
  (or "/usr/bin/spamd -d -u nobody", depending on which spamd you want). Some people prefer to start it with "/usr/bin/perl /usr/sbin/spamd -d -u nobody". It is not clear whether this makes any difference.
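To see which users still need the chmod, the directories that other users cannot enter can be listed first. A sketch, run as root; the function name closed_prefs_dirs is ours, and it relies on the -maxdepth option, which GNU and BSD find provide but which is not in every find.

```shell
# List .spamassassin directories directly under the home directories in BASE
# that lack the "others execute" permission bit.
# Usage: closed_prefs_dirs BASE      (for example: closed_prefs_dirs /home)
closed_prefs_dirs() {
    find "$1" -maxdepth 2 -type d -name .spamassassin ! -perm -001
}
```

An empty result means every existing .spamassassin directory is already open to "nobody". It says nothing about users whose directory has not been created yet, or whose home directory is itself closed.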
Good messages from some senders being marked as spam
This is frequently observed when spamassassin checks a "blacklist" database. It is advisable to turn off the blacklist feature completely. This feature checks to see whether the sender's domain is listed as a mail relay. Unfortunately, many (if not most) of the entries on public blacklists are added either for political reasons or out of malice or stupidity and are not, in fact, relays. For example, one such list blacklisted the entire comcast.net domain because of a single complaint!

The easiest way to allow a domain or server is to add the following to /etc/mail/spamassassin/local.cf or /usr/share/spamassassin/60_whitelist.cf
whitelist_from add@ress.com
whitelist_from *ress.com
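Slightly better is whitelist_from_rcvd, which also checks the relay that handed the message to your server. It takes an address pattern followed by the relay's domain:
whitelist_from_rcvd *@domain.net domain.net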
Unfortunately, this method doesn't fix the problem of SpamAssassin checking corrupted blacklists. The easiest way to prevent this is to edit SpamAssassin's rules (/usr/share/spamassassin/20_head_tests.cf and /usr/share/spamassassin/20_dnsbl_tests.cf) and delete lines containing the faulty blacklist server(s).

So far, we have found it necessary to block relays.osirusoft.com, bl.spamcop.net, and ipwhois.rfc-ignorant.org.
Spamassassin not marking emails with the "*****SPAM*****" marking or identifying all messages as "good".
This can occur after upgrading spamassassin. The default setting of the new version is to not alter the subject line of spam messages. Edit /etc/mail/spamassassin/local.cf and change
# rewrite_subject 0
to
rewrite_subject 1
Or, for version 3.x, substitute the following:
rewrite_header Subject *****SPAM*****
When installing a different version of SpamAssassin, and especially when installing an older version over a newer one, first delete all Perl modules and config files left by the previous version. Save copies of the old files beforehand, because SpamAssassin sometimes appears to install when in fact it is not installing anything.
The spam alert does not work if the email is forwarded to another server.
This is normal. The .forward file is processed before /etc/procmailrc.
Error messages in logs
If the log says
Use of uninitialized value in numeric gt (>) at /usr/lib/perl5/5.8.1/i586-linux-thread-multi/DB_File.pm line 270, <GEN25> line 2.
this error seems to prevent SpamAssassin from detecting a message as spam. Add the line
$db_version = 2 ;
to DB_File.pm before line 270 (set it to 1 instead if your libdb is earlier than version 2).

Anomy

Spam Assassin does not interfere with Anomy. If the /etc/procmailrc file is
   VERBOSE=yes
   LOGFILE=/var/adm/procmail-sanitizer.log
   ANOMY=/usr/local/bin/anomy/
   :0 fw
   |/usr/local/bin/anomy/bin/sanitizer.pl

   DROPPRIVS=yes
   :0 fw
   * < 256000
   | /usr/bin/spamc
and ~/.procmailrc is
   :0:
   * ^X-Spam-Status: Yes
   spam
then the message is moved to "spam" and also defanged. To be safe, Anomy should run first, since viruses are more dangerous than spam. Some problems have been reported when combining the two with Postfix. Set VERBOSE to "no" once it's working to prevent Anomy from adding several lines of information to the body of every incoming email.

Sample /etc/mail/spamassassin/local.cf file

We white-listed MCI, our service provider, because they frequently send mail in HTML format starting with "Dear Beloved Customer", which gets marked as spam. Messages from Comcast users were also erroneously being marked as spam.

rewrite_subject 1
required_hits 4.2
report_safe 0
whitelist_from *mci.com
whitelist_from *@mci.com
Or, for version 3.x, substitute the following:

rewrite_header Subject *****SPAM*****
