
Comment spam: wake up and smell the Hashcash coffee

This blog (and others on exaflop.org) uses weblog software called Pivot. A number of versions back, Pivot implemented a system called hashcash to defeat the comment spambots that are the scourge of the blogosphere.

Hashcash is nothing more than a neat trick involving a bit of JavaScript that runs in the commenter's browser when they try to submit a comment. If they are running a web browser with JavaScript enabled (as nearly everyone is), they will just see a normal comment form, and the comment will be accepted and displayed on the page immediately, the same as it was in the good old days before comment spam.
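For anyone curious what that "bit of JavaScript" might be doing, here is a minimal sketch of a hashcash-style proof of work. It is an illustration of the general idea only, not Pivot's actual code, which this post doesn't go into: it is written in Python rather than browser JavaScript, and the names `mint` and `verify`, the `challenge:nonce` format, the choice of SHA-256 and the difficulty of 12 bits are all assumptions made for the example.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def stamp_digest(challenge: str, nonce: int) -> bytes:
    # Hypothetical stamp format for this sketch: "challenge:nonce".
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def mint(challenge: str, difficulty: int) -> int:
    """Commenter's side: brute-force a nonce that gives enough zero bits."""
    nonce = 0
    while leading_zero_bits(stamp_digest(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Blog's side: a single hash is enough to check the work was done."""
    return leading_zero_bits(stamp_digest(challenge, nonce)) >= difficulty


# The blog hands out a fresh random challenge with each comment form.
challenge = secrets.token_hex(8)
nonce = mint(challenge, difficulty=12)  # a few thousand hashes on average
print(verify(challenge, nonce, difficulty=12))
```

The point of the design is the asymmetry: the commenter's browser does a small amount of work that a human never notices, the blog checks it with one hash, and a bot that just posts raw form data without running the script has nothing valid to submit.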

Let's compare hashcash to the alternatives:

  • Keyword blocking: this is as ineffective in blogging as it was in the email world before it. It'll catch many spams, but it will also let many through. The blog owner has to keep looking through the spams and adding more and more keywords to the block list. Eventually you have to stop adding keywords or nobody will be able to add anything to the blog!
  • IP blocking: this is ineffective as well because spamming is typically performed by zombie botnets (arrays of PCs that are infected with malware that follow the instructions of remote users while appearing to their owners to work normally). The spams appear from all manner of different IP addresses and besides, you still need to delete all the spams by hand with this method.
  • CAPTCHAs - Completely Automated Public Turing test to tell Computers and Humans Apart. These show the commenter an image that contains a string of letters and numbers. The user has to type this string into a field on the comment form along with their other details. There are two problems with CAPTCHAs. The first is that they are a usability nightmare - nobody likes having to pass a test like this every time they submit a form, and for people with sight problems it can be impossible to pass. The second problem is that OCR (optical character recognition) techniques can be used to defeat the test. As a result of 2, CAPTCHAs have been made progressively more and more difficult to pass, exacerbating problem 1.
  • Bayesian filtering: This is the same as the most popular method of email filtering. A mathematical analysis is made to try and recognise if the comment looks like a spam in the same sort of way that we would recognise many spams as spam without even reading them - we recognise many different signs such as strings of garbage characters, implausible names and other features. This works about as well as it does with email spam, i.e. it produces some false positives (real comments tagged as spam) and false negatives (spams that get through).
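To make the Bayesian filtering point concrete, here is a toy naive Bayes scorer. It is only a sketch under stated assumptions: the four training comments are made up, the names `train` and `spam_log_odds` are hypothetical, and it is not how Akismet or any other real filter is implemented.

```python
import math
from collections import Counter

# Made-up training data for illustration: (comment text, is_spam).
SAMPLES = [
    ("cheap pills buy now", True),
    ("buy cheap watches online", True),
    ("great post thanks for sharing", False),
    ("i disagree with your point about captchas", False),
]


def train(samples):
    """Count word occurrences and comment totals for spam and non-spam."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, is_spam in samples:
        counts[is_spam].update(text.lower().split())
        docs[is_spam] += 1
    return counts, docs


def spam_log_odds(text, counts, docs):
    """Return log(P(spam)/P(not spam)); a positive score means 'looks like spam'."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    totals = {label: sum(counts[label].values()) for label in (True, False)}
    score = math.log(docs[True] / docs[False])  # prior odds
    for word in text.lower().split():
        # Add-one smoothing so unseen words don't zero out the estimate.
        p_spam = (counts[True][word] + 1) / (totals[True] + vocab_size)
        p_ham = (counts[False][word] + 1) / (totals[False] + vocab_size)
        score += math.log(p_spam / p_ham)
    return score


counts, docs = train(SAMPLES)
print(spam_log_odds("buy cheap pills", counts, docs) > 0)    # flagged as spam
print(spam_log_odds("great post thanks", counts, docs) > 0)  # not flagged
```

The second call shows the weakness: a spambot that posts "great post thanks" with its link in the URL field scores as a genuine comment, which is exactly the kind of false negative described above.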

As you can see from the above list, two of the four main alternative approaches are completely useless and the other two have serious problems of usability and effectiveness. Hashcash by comparison has been completely effective. That needs more emphasis really. In a year I have had fewer than 10 spam comments appear on my blog. That is few enough for me to believe that the spams that did get through were entered manually. That is something I can live with.

Akismet is a Bayesian filter system used by many bloggers, including those on WordPress. It uses a 'hive mind' approach that combines spam data from many users to improve effectiveness. Even so, lately I am seeing several bloggers (most notably Robert Scoble) complaining about Akismet either not filtering out all the spam, or catching too many genuine comments in its filter. Apparently there is (or at least has been in the past) a Hashcash plugin for WordPress. I would strongly suggest people check out this option. Akismet is a nice idea, but it is clearly not as effective as it should be. Hashcash is effective. I don't doubt that Scoble gets more attempts on his blog than I do on mine, but the results should scale because 100% effective scales.

The funny thing is though, when you even mention Hashcash to bloggers, particularly those that are developers, they completely dismiss it. The main two complaints I hear are: (1) Spammers will eventually develop spambots that can execute JavaScript and thus defeat hashcash; (2) It locks out people that aren't running JavaScript in their browser, including those with disabilities that use text-only browsers.

I believe 2 isn't a major issue, in that you can put a message on your comment form asking anyone who is unable to use the form to send an email instead, and you'll add the comment for them. Maybe that will put some people off, but it shouldn't, and it's certainly not a deal-breaker for Hashcash. Besides, many people are now using conventional JavaScript browsers with screen reader software instead of the old text mode browsers like Lynx, so this problem should diminish for disabled users. As for the tin-foil-hat-wearers that disable JavaScript in their browsers out of paranoia, they can stay silent for all I care :).

The final issue then is that spammers will someday build spambots that can defeat Hashcash. This is a completely bogus reason not to use Hashcash on your blog now. It is possible that one day hashcash will not be enough to stop spambots. But at the moment the picture is far better for those using hashcash than it is for those relying on CAPTCHAs and Bayesian filtering. Make hay while the sun shines, I say!


four comments:

You are correct. You may as well use what works, at least for now. But, you must agree that implementation takes time and someone has to do it. What we need is a hurdle that can’t be jumped. And there is such a hurdle in development.

There are at least 50 posts like yours each day regarding the sickening amount of time wasted daily on irrelevant junk! And I, as you might have guessed, completely agree! I am building a collaborative team to complete the execution and deployment of a system that will ultimately leave the spammer powerless. Laws are not going to work and filters of any sort can never keep up. So, are you interested?

You can reach me here: http://www.respect101.com

Best
Frank
Frank (URL) - 25 05 07 - 08:12

Could you please hide my email address on the last post!

The "Hide email address" option on your form below should be checked by default. AND the selections should be above the submit button.
Frank (URL) - 25 05 07 - 08:16

Hi Frank, interesting blog. Spam is a complex subject and it’s not something I intend to spend too much time on. My main point in writing this post was to point out to a skeptical blog community that hashcash does still remain an effective solution to prevent comment spam. And to point out my amazement that people had discounted it on the basis that it might not work at some point in the future, especially when their more complex and expensive solutions are becoming less and less effective all the time anyway.

Sorry about the email address – I’ve removed your email address from the comment where it wasn’t hidden. The comment forms on my blog came from the default Pivot templates and I hadn’t seen a need to redesign them, but I will now you’ve pointed out the pretty significant flaw! Good luck with fighting the spammers.
rich () (URL) - 27 05 07 - 09:41

Hi,

Noticed your site was down for a couple of days. Had a bit of a heart attack. You have no idea how important this site is to the careers of its visitors (especially the CGA FAQ pages). Would love to donate my two cents for your efforts. Let me know how.
Vulcan Eager - 21 10 11 - 06:30

