Comment spam: wake up and smell the Hashcash coffee

22 May 2007 - 18:29

This blog (and others on exaflop.org) uses weblog software called Pivot. A number of versions back, Pivot implemented a system called Hashcash to defeat the comment spambots that are the scourge of the blogosphere.

Hashcash is nothing more than a neat trick involving a bit of JavaScript that runs in the commenter's browser when they try to submit a comment. If they are running a web browser with JavaScript enabled (as nearly everyone is), they will just see a normal comment form, and the comment will be accepted and displayed on the page immediately, just as it was in the good old days before comment spam.
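For the curious, here is a rough sketch of the general proof-of-work idea that Hashcash takes its name from. It is not Pivot's actual code, and Pivot's implementation may well work differently. The function names (`issue_challenge`, `solve`, `verify`), the use of SHA-256 and the `difficulty` parameter are all assumptions made for illustration. In a real blog the `solve` step would be the JavaScript running in the commenter's browser. It is written in Python here only to keep the example self-contained.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is 8 for 0x80..0xFF, so 8 - bit_length() is the
        # number of leading zero bits inside this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def issue_challenge() -> str:
    """Server side (hypothetical): a random token embedded in the form."""
    return secrets.token_hex(16)


def solve(challenge: str, difficulty: int) -> int:
    """Client side (hypothetical): brute-force a nonce.

    This is the work the browser script would do before submitting.
    """
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): a single hash checks the submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = issue_challenge()
    nonce = solve(challenge, difficulty=12)
    print("accepted" if verify(challenge, nonce, 12) else "rejected")
```

The point of the design is the asymmetry: the commenter's browser has to try thousands of hashes, while the server checks the answer with one. A bot that simply posts the form fields without running the script has no valid nonce to send.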

Let's compare Hashcash to the alternatives:

  • Keyword blocking: this is as ineffective in blogging as it was in the email world before it. It'll catch many spams, but it will also let many through. The blog owner has to keep looking through the spams and adding more and more keywords to the block list. Eventually you have to stop adding keywords or nobody will be able to add anything to the blog!
  • IP blocking: this is ineffective as well because spamming is typically performed by zombie botnets (arrays of PCs that are infected with malware and follow the instructions of remote users while appearing to their owners to work normally). The spams arrive from all manner of different IP addresses and besides, you still need to delete all the spams by hand with this method.
  • CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart): these show the commenter an image that contains a string of letters and numbers. The user has to type this string into a field on the comment form along with their other details. There are two problems with CAPTCHAs. The first is that they are a usability nightmare - nobody likes having to pass a test like this every time they submit a form, and for people with sight problems it can be impossible to pass. The second problem is that OCR (optical character recognition) techniques can be used to defeat the test. As a result of the second problem, CAPTCHAs have been made progressively more difficult to pass, exacerbating the first.
  • Bayesian filtering: this is the same as the most popular method of email filtering. A mathematical analysis is made to try to recognise whether the comment looks like spam, in the same sort of way that we would recognise many spams as spam without even reading them - we recognise many different signs such as strings of garbage characters, implausible names and other features. This works about as well as it does with email spam, i.e. it produces some false positives (real comments tagged as spam) and false negatives (spams that get through).
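To make the Bayesian option a little more concrete, here is a minimal naive Bayes sketch. It is a toy illustration of the technique, not how Akismet or any particular email filter is implemented. The class name `NaiveBayesFilter`, the whitespace tokenizer and the tiny training set in the usage example are all made up for the purpose.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Crude tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesFilter:
    """Toy word-count spam classifier with labels "spam" and "ham"."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text). Needs at least one example of each label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: share of training comments carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Laplace smoothing so an unseen word does not zero the product.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_score[label] = score
        # Turn the two log scores into a probability, guarding against overflow.
        diff = log_score["ham"] - log_score["spam"]
        if diff > 700:
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    spam_filter = NaiveBayesFilter()
    spam_filter.train("cheap pills buy now", "spam")
    spam_filter.train("buy cheap watches online now", "spam")
    spam_filter.train("great post thanks for sharing", "ham")
    spam_filter.train("i disagree with your point about captchas", "ham")
    print(spam_filter.spam_probability("buy cheap pills"))
    print(spam_filter.spam_probability("thanks for the great post"))
```

The weakness described in the list shows up even in this toy: the score is only a probability, so wherever you set the cut-off, a genuine comment that happens to use spammy words can land on the wrong side of it, and a carefully worded spam can land on the right side.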

As you can see from the above list, two of the four main alternative approaches are completely useless and the other two have serious problems of usability and effectiveness. Hashcash, by comparison, has been completely effective. That needs more emphasis really. In a year I have had fewer than 10 spam comments appear on my blog. That is few enough for me to believe that the spams that did get through were entered manually. That is something I can live with.

Akismet is a Bayesian filter system used by many bloggers, including those on WordPress. It uses a 'hive mind' approach that combines spam data from many users to improve effectiveness. Even so, lately I am seeing several bloggers (most notably Robert Scoble) complaining about Akismet either not filtering out all the spam, or catching too many genuine comments in its filter. Apparently there is (or at least has been in the past) a Hashcash plugin for WordPress. I would strongly suggest people check out this option. Akismet is a nice idea, but it is clearly not as effective as it should be. Hashcash is effective. I don't doubt that Scoble gets more attempts on his blog than I do on mine, but the results should scale because 100% effective scales.

The funny thing is, though, when you even mention Hashcash to bloggers, particularly those who are developers, they completely dismiss it. The two main complaints I hear are: (1) spammers will eventually develop spambots that can execute JavaScript and thus defeat Hashcash; (2) it locks out people who aren't running JavaScript in their browser, including those with disabilities who use text-only browsers.

I believe (2) isn't a major issue, in that you can put a message on your comment form to the effect that anyone unable to use the form should send an email and you'll add the comment for them. Maybe that will put some people off, but it shouldn't, and it's certainly not a deal-breaker for Hashcash. Besides, many people are now using conventional JavaScript browsers with screen reader software instead of the old text-mode browsers like Lynx, so this problem should diminish for disabled users. As for the tin-foil-hat-wearers who disable JavaScript in their browsers out of paranoia, they can stay silent for all I care :).

The final issue, then, is that spammers will someday build spambots that can defeat Hashcash. This is a completely bogus reason not to use Hashcash on your blog now. It is possible that one day Hashcash will not be enough to stop spambots. But at the moment the picture is far better for those using Hashcash than it is for those relying on CAPTCHAs and Bayesian filtering. Make hay while the sun shines, I say!

Powered by Pivot - 1.40.1: 'Dreadwind' 