Digital Choke Daynotes

"Daynotes" were popularized by an Internet Web site called the "Daynotes Gang" (http://www.daynotes.com or http://www.daynotes.org), a collection of daily technical and personal observations from the famous and others. That group started on September 29, 1999, and has grown into an interesting collection of individuals. Readers are invited and encouraged to visit those sites for other interesting daily journals. You can send your comments to us by clicking on any mailbox icon.

My Experience with Spam-Filtering Software

Rick Hellewell
Comments to digitalchoke@digitalchoke.com

At my work, I am in charge of the security of the network. One project, which got the attention of our top level of management, is the blocking of spam messages, especially the more offensive ones.

We process about 50,000 messages a day. Of those, about 70% are incoming messages. With about 2500 employees with computer/email access, you can see that incoming mail is much more of a problem than outgoing mail. (Although there were some abusers of outgoing mail, as you will see later.)

We looked at various alternatives, and chose the SurfControl EMAIL Filter product. This is a rules-based blocking product. We have about 25 rules that we use here. Each rule looks at the message's content or source. Source blocking is done through known spammer lists. The server gets a fresh list of known spammers every night. If a message comes from a known spammer, then the message is blocked.
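
The source check can be pictured with a few lines of Python. This is only a sketch, not how the SurfControl product works internally: the list entries, the function name, and the idea that the list holds both full addresses and bare domains are all assumptions made for illustration.

```python
from email.utils import parseaddr

# Hypothetical entries. A real list would be refreshed nightly from the vendor.
KNOWN_SPAMMERS = {"bulk@spam.example", "spam.example"}

def is_known_spammer(from_header, blocklist):
    """Return True if the sender's address or its domain is on the list."""
    _, address = parseaddr(from_header)       # strip any display name
    address = address.lower()
    domain = address.rpartition("@")[2]       # text after the last "@"
    return address in blocklist or domain in blocklist

print(is_known_spammer("Bulk Sender <offers@spam.example>", KNOWN_SPAMMERS))
print(is_known_spammer("alice@company.example", KNOWN_SPAMMERS))
```

A lookup like this is only as good as the list behind it, which is why the nightly refresh matters.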

For the various rules, each message's content (all or part of the content, such as headers and attachments, who it is from, who it is to, etc.) is examined. The words in the message are compared to various dictionaries, for example a dictionary of adult words or one of shopping words. Each word in a dictionary has a value. The scores of all the words found are totaled. The rule sets a limit on the total score. If a message is over the limit, it is blocked.

For example, suppose a "nature" dictionary contains the word "tree", and that word has a score of 40 points. If a message has 6 occurrences of the "tree" word, then the total score is 240. If we establish a rule that has a limit of 200 points from the 'nature' dictionary, then that message is over the limit.
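
The arithmetic in that example can be sketched in Python. Only the word "tree", its 40 points, and the 200-point limit come from the example above. The function names, the exact whole-word matching, and the lowercase comparison are assumptions; the real product's matching is surely more elaborate.

```python
import re

# Hypothetical dictionary: word -> score. Only "tree" = 40 is from the example.
NATURE_DICTIONARY = {"tree": 40}
NATURE_LIMIT = 200  # the rule's limit from the example

def score_message(text, dictionary):
    """Total the scores of every dictionary word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(dictionary.get(word, 0) for word in words)

def is_over_limit(text, dictionary, limit):
    """A message is caught only when its total is over the limit."""
    return score_message(text, dictionary) > limit

message = "tree " * 6
print(score_message(message, NATURE_DICTIONARY))                 # 240
print(is_over_limit(message, NATURE_DICTIONARY, NATURE_LIMIT))   # True
```

Note that with exact matching, a word like "trees" scores nothing. That kind of gap is one reason the rules need so much tuning.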

If a message is caught by a rule, the software allows you to do several different things to the message. You can delete it, place it in a holding area, send a blind carbon to another person, send a canned reply to the sender, or allow the message.

While we were testing the software, I set up the various rules to send blind carbons to me with the message subject changed, and let the message be delivered. I set up rules in my mailbox to place various message subjects in separate folders. So, to continue with the example, there would be a 'nature' folder for all the messages that ran afoul of the nature dictionary rule.
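
The tag-and-sort idea looks roughly like this in Python. The bracketed tag format, the function names, and the "Inbox" default are made up for illustration; the actual subject change and mailbox rules were configured in the filter and the mail client, not in code.

```python
import re

def tag_subject(rule_name, subject):
    """Prefix the subject with the name of the rule that caught the message."""
    return f"[{rule_name}] {subject}"

def folder_for(subject, default="Inbox"):
    """Pick a folder from a leading [rule] tag, or fall back to the default."""
    match = re.match(r"\[([^\]]+)\]\s", subject)
    return match.group(1) if match else default

tagged = tag_subject("nature", "Hiking photos")
print(tagged)                       # [nature] Hiking photos
print(folder_for(tagged))           # nature
print(folder_for("Hiking photos"))  # Inbox
```

Only the blind-carbon copy carries the tag; the original message is still delivered untouched, so users see nothing different during testing.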

The purpose of that monitoring was to make sure that the planned rules didn't catch any valid messages. By looking at the blocked messages, I was able to fine-tune the rules to block the bad messages, while still allowing valid messages. Of course, this meant that my mailbox was subject to tons of all kinds of mail, from the R-rated jokes to worse, along with every pitch for every kind of product from printer toner to pills for every ailment (including ailments you wouldn't want to discuss in public). And, since I was looking at all mail, incoming and outgoing, I learned a few things about a few employees that I didn't really want to know.

(Most people don't know how transparent email systems are. Any message you send or get can be seen by any number of people. Email is not private, even with the protection of various federal wiretap and 'pen-register' laws. [Monitoring of your network access, whether at work or at home, is covered by many different federal and state laws. That's a subject for another time.])

All of this monitoring and tweaking of the anti-spam rules took about 6 weeks, and about 2-5 hours a day. The result was a fairly good and efficient system to filter spam and offensive mail. We have been very successful in blocking outgoing offensive (adult) messages from our email users. That's mostly because we have been sending them an automatic reply when the adult message is blocked. (We tell them why it was blocked - because of apparent adult content - and that they can call the help desk to get the message released. Not surprisingly, we don't get very many of those calls.) We are also blocking any message with an executable attachment - mostly viruses - even though we have a second anti-virus mail server that looks for viral content. And we are blocking a goodly number of adult/offensive type messages. We are also blocking any message that comes through a mail relay.

Here's what has happened the last seven days. (Remember that there was a 3-day weekend, so I think that the percentages are a bit low.)

We have processed over 200,000 messages (incoming and outgoing). Of those, we've managed to block about 30,000 (about 15%). Some days it is closer to 25%. Of those that we have blocked, about 65% are shopping/marketing junk. About 2% is adult/offensive content. About 1% is virus-related. And we blocked about 1200 of the "Nigerian Scam" messages.
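
For anyone who wants to check the arithmetic, here is a small Python sketch that turns those figures into counts. Every input is one of the rounded "about" numbers above, so the results are approximate too. The variable names are just for illustration, and the leftover share is simply whatever the listed categories don't cover; the figures above don't break it down further.

```python
total_processed = 200_000
total_blocked = 30_000

# Shares of blocked mail, in whole percent, as quoted above.
category_percent = {
    "shopping/marketing": 65,
    "adult/offensive": 2,
    "virus-related": 1,
}
nigerian_scam_count = 1200

blocked_percent = total_blocked * 100 / total_processed
category_counts = {
    name: total_blocked * pct // 100 for name, pct in category_percent.items()
}
nigerian_percent = nigerian_scam_count * 100 / total_blocked
other_percent = 100 - sum(category_percent.values()) - nigerian_percent

print(blocked_percent)    # 15.0
print(category_counts)    # {'shopping/marketing': 19500, 'adult/offensive': 600, 'virus-related': 300}
print(nigerian_percent)   # 4.0
print(other_percent)      # 28.0
```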

Remember that these numbers are only what we have been able to block. But our blocking is not foolproof, no matter how much I tweak the rules. I don't have any hard numbers, but I think that another 15% is not being blocked.

There are various tricks that 'successful' spammers can use to get a message through the mail filter. They know how mail-filtering software works. They know how the dictionaries work. And they keep changing their mail address to get around 'known spammer' lists.

You've seen lots of numbers bandied about by legislators and others. Some of them say that spam accounts for more than 80% of all email. I don't think it is that high. But I do think that 30% is probably a good number. And that indicates a problem. (Of course, it sometimes seems that 30% of my 'snail-mail' is junk.)

Has it been worth the effort here? I think so; we are able to block a lot of spam. I wish the blocking processes/procedures were better. And so do the users around here, since I get daily complaints about stuff that we haven't been able to block. The software we use is one of the top three brands. But software, due to the nature of spam, is never going to be able to block all spam.

I am not sure what should be done. I think it will require a fundamental redesign of how the whole Internet mail thing works. A better and more secure mail client (on users' computers) along with better and more secure mail servers will help. Web authors need to be aware of how to prevent email address harvesting (for instance, just putting my email address up there will get me on several spammers' lists, I suspect). New laws will be helpful, but won't be nearly enough of a solution. Physical violence, no matter how initially satisfying that might seem, is certainly not the answer. Personal responsibility of computer users would be helpful, but there is a knowledge base that is needed to keep your computer secure. ("Aunt Minnie", no matter how much help we give her, is not computer literate enough to know how to tweak and update her system to get rid of mail vulnerabilities - not to mention virus problems.)
