January 3, 2007

In defense of Challenge-Response spam detection systems

Like a lot of people, I get a lot of spam. And in the past months, it has gotten a lot worse. On average, I get well over 1,000 spam messages a day, and that is after SpamAssassin has already processed my mail and deleted what it detects without my ever seeing it. Of those that get through, Outlook puts most in the Junk E-email folder, leaving me about 100 a day to delete - intermixed with my 50 or so legitimate emails a day.

In terms of actual time lost, the cost is not large. Maybe 2 seconds per email to delete, or 2 × 100 = 200 seconds (about 3 minutes), plus a couple of minutes to look through the junked email to salvage good ones (of which I typically find about 1 per day), for a total of maybe 5 minutes a day.

But in terms of mind share and distraction, this is huge. It means that I am continuously distracted all day long by the dregs of society - pornography, rampant commercialism, and fraud. This is the worst kind of distraction, not only breaking my flow of concentration, but doing so with material that I do my best to avoid in every other aspect of my life, and that I would not even consider letting my 7-year-old daughter have access to.

So, after giving up on all the standard solutions to spam, I signed up for SpamArrest, a commercial "challenge-response" spam detection system. This works by requiring everyone that wants to send you email to first follow a link to a website and prove they are human by reading a word in a warped image and typing it (i.e., a CAPTCHA). The reason this approach works is that:
  • Each sender only has to do this once for me. The system remembers that person for the future.
  • I can preload the system with all of my contacts and anyone I've sent email to in the past so that everyone I already communicate with won't have to validate themselves, and won't know I am using this system.
  • New people who send me email have to use this system once, and legitimate senders are usually willing to go through this step.
  • I can authenticate senders unlikely to do this (like various large e-commerce sites).
  • I can let email lists through by setting them up individually.
  • Spammers who send me email are almost never willing to go through this step, and so I never see their email. Spammers don't do it either because they are computer software and can't, or because they are human and don't want to spend the time. In fact, most spam is sent by "spambots", which are other people's computers hijacked for the sole purpose of sending spam. This email is sent with forged return addresses, so the spammers never even receive the request for validation.
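
To make the flow in the list above concrete, here is a minimal sketch of how such a filter might behave. This is only a toy model under my own assumptions, not SpamArrest's actual implementation: the class name, method names, and example addresses are all hypothetical, and the CAPTCHA itself is reduced to a single "solved" call.

```python
class ChallengeResponseFilter:
    """Toy model of a challenge-response inbox (hypothetical, for illustration only)."""

    def __init__(self):
        self.approved = set()       # senders whose mail goes straight through
        self.held = {}              # sender -> messages waiting on a solved challenge
        self.inbox = []             # messages the user actually sees
        self.challenges_sent = []   # addresses that were mailed a challenge

    def preload(self, addresses):
        """Approve existing contacts, mailing lists, and e-commerce senders up front."""
        self.approved.update(a.lower() for a in addresses)

    def receive(self, sender, message):
        """Deliver mail from approved senders; hold everything else."""
        sender = sender.lower()
        if sender in self.approved:
            self.inbox.append(message)
            return "delivered"
        if sender not in self.held:
            # Challenge an unknown sender only once, however much mail they send.
            self.challenges_sent.append(sender)
        self.held.setdefault(sender, []).append(message)
        return "held"

    def solve_challenge(self, sender):
        """Called when the sender completes the CAPTCHA: approve and release held mail."""
        sender = sender.lower()
        self.approved.add(sender)
        released = self.held.pop(sender, [])
        self.inbox.extend(released)
        return len(released)


mailbox = ChallengeResponseFilter()
mailbox.preload(["friend@example.com"])
print(mailbox.receive("friend@example.com", "lunch?"))      # known contact
print(mailbox.receive("newperson@example.org", "hello"))    # unknown, gets a challenge
print(mailbox.receive("forged@example.net", "buy now"))     # challenge goes to a forged address
mailbox.solve_challenge("newperson@example.org")
print(mailbox.inbox)
```

In this sketch the message from the forged address simply stays in the held pile, because nobody at that address ever solves the challenge.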

So, if this is such a panacea, why isn't everyone using it? Well, for one thing, you have to pay for it (about $3/month). But lots of people think this approach is a bad idea in principle, and have been arguing against it. However, while I agree that it does have problems, it is better than any current alternative, and I'm not going to sit around suffering while I wait for better solutions. So, let me respond to the common complaints about challenge-response systems. I'll summarize each one and respond here.

Concern #1. Spammers will forge mail to me with someone else's return address thus sending my challenge to the poor forgee's email box.

Looking at the actual spam I receive, the vast, vast majority has false return addresses. And of the messages with legitimate return addresses, most very likely come from spambots running on machines that have been infected. The owners of those machines have a lot more serious problems than deleting my challenge to them. In fact, it may tip them off to the fact that they are infected. And to the few legitimate third-party emailers who get my unwanted challenges, I apologize. But that is still a tiny, tiny fraction of the total spam in the world. I'll gladly stop when there are better solutions. And I won't get mad if I occasionally get unwanted challenges from others (which I do, and which are a tiny, tiny minority of the total spam I receive).

Concern #2. If a challenge-response person emails me, then both our systems will challenge each other, generating even more email traffic.

So what? We each accept each other's challenges and we're done. We only have to do this once per person. And again, this one-time extra 2 emails is so tiny in the wide world of spam that it is a totally irrelevant argument.

Concern #3. Challenge-response systems are easy to defeat since all someone has to do is forge the From address as someone that I already trust.

This is the easiest to respond to. Yes, in theory this is true, but in practice, spammers don't know whom I trust. In the past 4 days, I have received 4,491 emails, of which 165 have been classified as good. Of those, about 30 were spam, but all of that spam was sent through mailing lists that I trust, not from forged From addresses. This does bring up a legitimate problem, which is that popular mailing lists may become targeted as spoofed return addresses. But again, in practice, this has not happened yet. So I'm not going to avoid using a system because it theoretically might not work at some point in the future.
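
For anyone who wants the proportions behind those 4 days, here is a quick back-of-the-envelope calculation. It uses only the three figures quoted above ("about 30" is treated as exactly 30, which is an approximation), and the variable names are just mine.

```python
total_received = 4491    # all email over the 4 days
classified_good = 165    # messages the system let through
spam_among_good = 30     # "about 30", all via trusted mailing lists

held_back = total_received - classified_good
passed_share = classified_good / total_received
spam_share_of_passed = spam_among_good / classified_good

print(f"held back: {held_back}")                          # held back: 4326
print(f"passed the filter: {passed_share:.1%}")           # passed the filter: 3.7%
print(f"spam among passed: {spam_share_of_passed:.1%}")   # spam among passed: 18.2%
```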

The bottom line is that challenge-response systems are not perfect, and probably won't work as well if everyone uses them. But for now, they work much, much better than anything else short of a human spam deleter (now there's a good business opportunity!). And if they stop working better than alternatives, then I'll switch to whatever works better.

5 comments:

Agnes said...

Your reply to #1 shows that you don't get the issue. The forged email addresses used in the From: field of spam messages do not "most [...] likely come from spambots running on machines that have been infected". They are very often valid email addresses harvested the same way your address that received the spam messages was (taken from public website/newsgroup, username generator on a popular domain...).

I administer several smallish domains. I started dev'nulling traffic directed to unassigned addresses because over a period of several months I started getting thousands of failed delivery notices sent back to randomly generated addresses on my domain (look up spammers and joe jobbing) by poorly configured SMTP servers, along with a steadily increasing number of challenge requests.

If every broken SMTP server out there gets fixed not to send failed delivery notices, it looks like the job of adding to the collateral damage from spam will be taken over by CR systems, unless they are recognized as the broken premise they are.

Ben Bederson said...

Having used this system for 6 months now, I defend my argument that the vast majority of email comes from fake email addresses.

Over the last 30 days, I received 127,482 pieces of email, 118,143 of which were marked as spam from unknown senders. In practice, I typically now get maybe 5 pieces of spam a day - compared to the 500-700 that are sent to me.

Boy am I happy I am using this system.

dave said...

I too have found Spam Arrest to be a time-saver, if not a life-saver. I've been using it for around 2 years now.
Recently, though, I have been having a big problem: the service seems to break down nearly monthly, and has for the last few months. For instance, I realised about an hour ago that I haven't received a new email in 6 or 7 hours. Spam Arrest has removed the telephone contact number from their site, and now sends out a palliative email 24 hours after the fact explaining that their job is hard.

While today is not a good example, when it happened in December it seriously affected my business.

And now there is their service agreement, which they can change at will. A new version becomes the contract between you and the company simply by their posting it (without changes to the contract highlighted in any way) on the agreement page of their website, and you accept it simply by using the service after their unannounced posting. It lets us know that, as an acceptable part of the service, they can delay and/or lose any number of your emails.

This seems like a low bar. I haven't found a service as good out there yet. Is there any way to apply some pressure on them to step up and be a responsible service provider?

dave said...

Additional note from Dave:
In their defense, Michael from Spam Arrest just emailed me back (about 15 minutes after my email to them), letting me know that this problem was with Network Solutions, and that they are working with Network Solutions to solve it.

Good on them for that, but still I feel helpless under the circumstances.

Spike said...

There is a more serious variant of #2 that depends on the precise implementation of both systems. It is quite likely that two Challenge-Response systems trying to talk to each other will get into deadlock. This problem, and the other problems with C-R, would get worse if C-R were more widely adopted.

More information is here:

http://linuxmafia.com/faq/Mail/challenge-response.html

en.wikipedia.org/wiki/Challenge-response_spam_filtering

Spam Arrest in particular is being dumb and anti-social because it is challenging my work email even though it originates from an address that Spam Arrest can (and does!) verify using SPF. Now, SPF is not perfect, but combined with a reputation score for my employer (one of the world's largest corporations), you would think they would not need to assume it was spam.

In conclusion I'm afraid C-R is part of the problem, not part of the solution.