03.28.07

Review: Postini Managed Spam Filtering

I’ve posted before about my ongoing battle with spam, and some of the weapons I’ve found useful. There was a time when I administered my own email server, with carefully chosen realtime blacklists and a regularly updated set of SpamAssassin rulesets. That was pretty effective, but far too time-consuming—not to mention the expense of having a dedicated server.

Eventually I moved to a shared hosting plan with DreamHost, and tried their built-in spam protection for a while (which also uses SpamAssassin). It was OK, but a bit too conservative—far too much spam was getting through, even when I customized the level at which a message was considered spam. I eventually decided to combine DreamHost’s filters with Gmail, and wrote a blog post and a wiki article about my technique. For a while, that worked beautifully, but something must have changed in Gmail’s filtering setup, because soon I was getting a lot of false positives. One especially annoying category was Amazon Marketplace purchase confirmation emails: without fail, Gmail marked every single one as spam, and since each one came from a different seller’s email address, there was no good way to prevent this from happening.

Next I tried CRM-114, which rarely gave false positives (except from one particular sender whom I had to whitelist), but also never really achieved an acceptable level of confidence. Every day it appropriately marked most of my spam (which I forwarded via procmail to another account) and let the majority of my legitimate email through, but also marked around 5-10 messages a day “unsure.” I then had to train it by forwarding those messages to myself, along with a special command to tell CRM-114 how it should have categorized them. I kept expecting it to get more accurate than that, but after several months I gave up.

And that leads me to the present day. For the past week, I’ve been using a hosted spam filter managed by Postini. Since Postini is designed for large companies, I’m actually going through a reseller called Spam-X, which allows me to filter just one address, as long as I pay for a year in advance. One address comes to $27/year, which is more than worth it for the time it saves me.

In the first week, with just the default settings and a basic whitelist, Postini has caught 419 spam messages, with 4 missed and 2 false positives. That works out to 419 of 423 spam messages caught, or roughly 99%. Those false positives were both sort of special cases: one was a message telling me I hadn’t won free tickets to a movie, followed by an advertisement; the other was from my bank, telling me there had been some suspicious activity on my account. Both addresses are now whitelisted. I’m still getting the occasional message that hasn’t passed through Postini’s filters at all; apparently some spammers don’t maintain their DNS servers very well, so they keep delivering straight to my web host’s mail server instead of following my updated MX records. If it continues, I may write a quick procmail filter to reject mail that doesn’t have Postini’s headers.
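I haven’t written that filter yet, but the check itself is simple. Here’s a rough sketch in Python rather than procmail, just to show the logic. The header prefix is an assumption on my part (I’m guessing the filtered mail carries headers starting with something like X-pstn-), and the function name is made up for illustration, so check the headers on your own filtered mail before relying on it.

```python
from email import message_from_string

# Assumption: the filtering service stamps every message it handles with
# one or more headers sharing this prefix. Verify against real filtered mail.
FILTER_HEADER_PREFIX = "X-pstn-"


def passed_through_filter(raw_message: str, prefix: str = FILTER_HEADER_PREFIX) -> bool:
    """Return True if any header name starts with the given prefix."""
    msg = message_from_string(raw_message)
    prefix = prefix.lower()
    # Header names are case-insensitive, so compare in lower case.
    return any(name.lower().startswith(prefix) for name in msg.keys())
```

Anything for which this returns False arrived without going through the filter, and could be rejected or shunted to a separate folder for review.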

There are a few features I’d like to see added, like keyword whitelisting, but overall I’m impressed by the feature set. The filters can be tuned to five levels of aggression, and in addition to the general filter you can customize filters for sexually explicit content, racially insensitive content, get-rich-quick schemes, and “too good to be true” special offers. If a legitimate message is mistakenly marked as spam, you can have it delivered to your mailbox as though it had never been blocked—and when you do, Postini asks if you’d like to add the sender’s address to your whitelist. For email discussion groups, you can whitelist “To:” addresses as well as “From:” addresses. I also like that Postini blocks mail before it even gets to my web host’s servers, let alone my local system. I had to change my DNS MX records, but once that was done, I could almost forget about spam altogether.

Verdict: highly recommended. I’m emailing like it’s 1999.

06.24.06

Double-Pass Spam Filtering with Gmail

In my continuing battle against spam, I’ve now implemented double-pass spam filtering, using Gmail as my first line of defense and DreamHost’s SpamAssassin installation as the second. I got the idea (and most of the implementation) from MBoffin.com, and have added a page to the DreamHost Support Wiki explaining how to do it using the default setup (i.e., no custom-installed applications) on that host.

The way it works is that all incoming email on my DreamHost account is scanned for the presence of a special forwarding-only Gmail address. If the address is not found, the message is forwarded to Gmail. Gmail then decides whether the message is spam: if so, it’s held in Gmail’s Spam folder; if not, it’s forwarded back to the DreamHost server. DreamHost again checks for the required address, and this time finds it, because Gmail has inserted it into the headers. Next the message goes through SpamAssassin, which assigns a score depending on how “spammy” the message looks. I’ve set SpamAssassin to quarantine anything with a score of 5.0 or greater, and to tag the subject line but still deliver anything with a score from 3.0 to 4.9.
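To make the flow concrete, here’s a minimal sketch of the routing decision in Python. It is not the actual procmail setup from the wiki, just the same logic spelled out. The forwarding address is a placeholder, the function and action names are invented for illustration, and the spam score is passed in as a number because the scoring itself is SpamAssassin’s job.

```python
from email import message_from_string

# Hypothetical placeholder for the forwarding-only Gmail address.
FORWARD_ADDRESS = "forwarding-only@example.com"

# Thresholds taken from the description above.
QUARANTINE_SCORE = 5.0
TAG_SCORE = 3.0


def has_forward_address(raw_message: str, address: str = FORWARD_ADDRESS) -> bool:
    """True if the forwarding address appears in any header value."""
    msg = message_from_string(raw_message)
    address = address.lower()
    return any(address in str(value).lower() for value in msg.values())


def route(raw_message: str, spam_score: float, address: str = FORWARD_ADDRESS) -> str:
    """Decide what happens to a message arriving at the DreamHost side."""
    # First pass: the message has not been through Gmail yet.
    if not has_forward_address(raw_message, address):
        return "forward_to_gmail"
    # Second pass: Gmail sent it back, so apply the SpamAssassin score.
    if spam_score >= QUARANTINE_SCORE:
        return "quarantine"
    if spam_score >= TAG_SCORE:
        return "tag_subject_and_deliver"
    return "deliver"
```

A message with no trace of the forwarding address always goes out to Gmail first; only a message that has already made the round trip gets scored and delivered.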

Technically, I’m using a triple-pass filtering system, because Apple’s Mail.app has its own filtering rules. So far, however, it looks like Gmail and SpamAssassin won’t need any extra help.

If you’d like to try it yourself, note that you’ll need to dedicate a Gmail address to this purpose—everything going to that address will be forwarded to your DreamHost account, so don’t use an address you’re already using for other things. If you need a Gmail invitation, feel free to ask me for one—email nicholas at acetylene dot net.

UPDATE: It appears that Gmail is somewhat inconsistent in its use or formatting of the X-Forwarded-For: header, leading to infrequent (but annoying) infinite loops. (Actually, procmail is intelligent enough to reject the message rather than forwarding it a second time, but that means it stays in Gmail and never arrives in my IMAP inbox.) I’ve simplified the rule to avoid this problem in the future; now all procmail is looking for is the address of my forwarding-only Gmail account. As long as it finds that somewhere, it’ll send the message to my IMAP inbox.

UPDATE 2: I was wrong: Gmail wasn’t being inconsistent; DreamHost’s procmail was. Seems it sometimes thought the mail was looping when it saw its own Delivered-To: header, so I’ve edited the DreamHost wiki to include a formail pipe that removes that header before forwarding to Gmail.
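For anyone curious what that header-stripping step amounts to, here’s a small Python sketch of the same idea. The real fix is the formail pipe described on the wiki; this is only an illustration, and the function name is my own invention.

```python
from email import message_from_string


def strip_delivered_to(raw_message: str) -> str:
    """Return the message with every Delivered-To: header removed."""
    msg = message_from_string(raw_message)
    # Deleting by name removes all occurrences of the header and does
    # nothing if the header is absent.
    del msg["Delivered-To"]
    return msg.as_string()
```

With that header gone before the message is forwarded to Gmail, there is nothing left on the return trip for the loop check to trip over.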

11.02.05

Conventional Wisdom

I got a Gmail account pretty soon after the service went live (and I’ve got 100 invites available if anyone needs one), but I mostly only use it for online sweepstakes, news site registrations, and other things that I suspect might result in “affiliate offers”—I do filter spam on my server, but I don’t want to make it work any harder than necessary. And it’s obviously been working—recently I noticed that there were over a thousand messages in my Gmail spam folder, and it’s set to delete anything more than a month old in that folder. Even for a dedicated spam-catching address, that’s a bit much for my taste. So I decided to try something a little unconventional: unsubscribe from the spammers’ distribution lists.

The conventional wisdom, of course, is that you should never do this, because it only confirms to the spammers that your address is valid, causing them to send you even more spam than before. But what did I have to lose? Worst-case scenario, I’d invite myself, sign up for a new account, and drop the old one. So I cleared everything out of that folder and for the next week I diligently clicked the “Unsubscribe” link on every spam message I received. It was a pretty significant time commitment, but incredibly, it worked. Spam is now trickling in at a rate of slightly less than one message per day. I’m still unsubscribing from those messages; with any luck, I’ll be able to bring it down to effectively zero.

Whether this is a result of recent anti-spam legislation, or whether I just happened to be getting spammed by the most ethical online marketers in the world, I don’t know. But it just might be that the conventional wisdom no longer applies in the war against spam.