January 3, 2007

In defense of Challenge-Response spam detection systems

Like a lot of people, I get a lot of spam. And in the past few months, it has gotten a lot worse. On average, I get well over 1,000 spam messages a day, and that is after SpamAssassin has already processed my mail and deleted what it detects before I ever see it. Of the messages that get through, Outlook puts most in the Junk E-mail folder, leaving me about 100 a day to delete - intermixed with my 50 or so legitimate emails a day.

In terms of actual time lost, the cost is not large. Maybe 2 seconds per email to delete, or 2*100 = 200 seconds (a little over 3 minutes), plus a couple of minutes to look through the junked email to salvage good ones (of which I typically find about 1 per day), for a total of maybe 5 minutes a day.

But in terms of mind share and distraction, this is huge. It means that I am continuously distracted all day long by the dregs of society - pornography, rampant commercialism, and fraud. This is the worst kind of distraction, not only taking my mind away from my flow of concentration, but doing so with material that I do my best to avoid in every other aspect of my life, and that I would not even consider letting my 7-year-old daughter have access to.

So, after giving up on all the standard solutions to spam, I signed up for SpamArrest, a commercial "challenge-response" spam detection system. This works by requiring everyone that wants to send you email to first follow a link to a website and prove they are human by reading a word in a warped image and typing it (i.e., a CAPTCHA). The reason this approach works is that:
  • Each sender only has to do this once for me. The system remembers that person for the future.
  • I can preload the system with all of my contacts and anyone I've sent email to in the past so that everyone I already communicate with won't have to validate themselves, and won't know I am using this system.
  • New people who send me email have to use this system once, and legitimate senders are usually willing to go through this step.
  • I can authenticate senders unlikely to do this (like various large e-commerce sites).
  • I can let email lists through by setting them up individually.
  • Spammers who send me email are almost never willing to go through this step, and so I never see their email. They skip it either because they are computer software and can't solve the CAPTCHA, or because they are human and don't want to spend the time. In fact, most spam is sent by "spambots", which are other people's computers hijacked for the sole purpose of sending spam. That spam is sent with forged return addresses, so the spammers never even receive the request for validation.
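
To make the flow above concrete, here is a minimal sketch in Python. It is a toy model, not SpamArrest's actual implementation: the class name, method names, and example addresses are all made up for illustration, and a real system would also send the challenge email and host the CAPTCHA page.

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseFilter:
    """Toy model of a challenge-response mail filter (hypothetical, not SpamArrest's code)."""

    approved: set = field(default_factory=set)           # senders allowed straight through
    held: dict = field(default_factory=dict)             # sender -> messages awaiting validation
    challenges_sent: list = field(default_factory=list)  # addresses that were sent a challenge

    def preload(self, addresses):
        """Approve existing contacts, mailing lists, and e-commerce senders up front."""
        self.approved.update(a.lower() for a in addresses)

    def receive(self, sender, message):
        """Return 'inbox' for approved senders; otherwise hold the mail and challenge once."""
        sender = sender.lower()
        if sender in self.approved:
            return "inbox"
        first_time = sender not in self.held
        self.held.setdefault(sender, []).append(message)
        if first_time:
            # The challenge goes to the claimed return address. If that address
            # is forged, the spammer never sees it and the mail simply stays held.
            self.challenges_sent.append(sender)
        return "held"

    def validate(self, sender):
        """The sender solved the CAPTCHA: approve them and release their held mail."""
        sender = sender.lower()
        self.approved.add(sender)
        return self.held.pop(sender, [])


mail_filter = ChallengeResponseFilter()
mail_filter.preload(["friend@example.com", "orders@shop.example"])

results = [
    mail_filter.receive("friend@example.com", "lunch?"),      # preloaded contact
    mail_filter.receive("newperson@example.org", "hello"),    # unknown, gets a challenge
    mail_filter.receive("forged@example.net", "buy now"),     # unknown, never answers
]
released = mail_filter.validate("newperson@example.org")
after_validation = mail_filter.receive("newperson@example.org", "follow-up")
```

In this sketch the new person is challenged exactly once, and their follow-up mail goes straight to the inbox, while the message with the forged address stays held indefinitely because nobody ever answers the challenge.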

So, if this is such a panacea, why isn't everyone using it? Well, for one thing, you have to pay for it (about $3/month). But lots of people think this approach is a bad idea in principle, and have been arguing against it. However, while I agree that it does have problems, it is better than any current alternative, and I'm not going to sit around suffering while I wait for better solutions. So, let me respond to the most common complaints about challenge-response systems. I'll summarize each one and respond here.

Concern #1. Spammers will forge mail to me with someone else's return address thus sending my challenge to the poor forgee's email box.

Looking at the actual spam I receive, the vast, vast majority has false return addresses. And of the spam with legitimate return addresses, most very likely comes from spambots running on machines that have been infected. The owners of those machines have much more serious problems than deleting my challenge. In fact, it may tip them off to the fact that they are infected. And to the few legitimate third parties who get my unwanted challenges, I apologize. But that is still a tiny, tiny fraction of the total spam in the world. I'll gladly stop when there are better solutions. And I won't get mad if I occasionally get unwanted challenges from others (which I do, and which are a tiny, tiny minority of the total spam I receive).

Concern #2. If a challenge-response person emails me, then both our systems will challenge each other, generating even more email traffic.

So what? We each accept each other's challenges and we're done. We only have to do this once per person. And again, these two extra one-time emails are so tiny in the wide world of spam that the argument is totally irrelevant.

Concern #3. Challenge-response systems are easy to defeat since all someone has to do is forge the From address as someone that I already trust.

This is the easiest to respond to. Yes, in theory this is true, but in practice, spammers don't know who I trust. And in the past 4 days, I have received 4,491 emails, of which 165 have been classified as good. Of those, about 30 were spam, but all of that spam was sent through mailing lists that I trust, not from forged From addresses. This does bring up a legitimate problem, which is that the addresses of popular mailing lists may become targets for spoofing. But again, in practice, this has not happened yet. So I'm not going to avoid using a system because it theoretically might not work at some point in the future.
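
For anyone who wants the proportions behind those numbers, here is a quick back-of-the-envelope calculation in Python. The three counts come straight from the paragraph above; the variable names are mine, and "about 30" is treated as exactly 30, so the results are approximate.

```python
total_received = 4491    # emails received over 4 days
classified_good = 165    # emails the system let through
spam_among_good = 30     # "about 30" of those were spam, via trusted mailing lists
days = 4

# Mail that was held back and never reached the inbox.
blocked = total_received - classified_good
blocked_share = blocked / total_received

# Share of delivered mail that was still spam.
spam_share_of_inbox = spam_among_good / classified_good

per_day = total_received / days

print(f"held back: {blocked} ({blocked_share:.1%})")
print(f"spam among delivered mail: {spam_share_of_inbox:.1%}")
print(f"average volume: {per_day:.0f} emails/day")
```

So roughly 96% of incoming mail never reached the inbox, and a bit under a fifth of what did get through was spam, all of it arriving via trusted mailing lists. Note that "held back" is not necessarily the same as "spam": a legitimate sender who never answers the challenge would be counted there too.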

The bottom line is that challenge-response systems are not perfect, and probably won't work as well if everyone uses them. But for now, they work much, much better than anything else short of a human spam deleter (now there's a good business opportunity!). And if they stop working better than alternatives, then I'll switch to whatever works better.