Comment spam: wake up and smell the Hashcash coffee
This blog (and others on exaflop.org) uses weblog software called Pivot. A number of versions back, Pivot implemented a system called hashcash to defeat the comment spambots that are the scourge of the blogosphere.
Let's compare hashcash to the alternatives:
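This post doesn't describe how Pivot's hashcash works internally, and blog "hashcash" plugins may differ from the original scheme of the same name. The sketch below shows only the general proof-of-work idea behind Adam Back's Hashcash, not Pivot's code. The client must find a counter that makes the SHA-1 hash of a stamp start with a given number of zero bits. That search costs roughly 2^bits hash attempts on average, while the server checks the result with a single hash. The stamp format `resource:counter` and the resource name `post-42` are hypothetical simplifications. Real Hashcash stamps also carry a date and a random field so they can't be reused.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() of a non-zero byte tells us where its first 1 bit is
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int) -> str:
    """Client side: brute-force a counter until the stamp's SHA-1 has `bits` leading zero bits.

    Expected cost is about 2**bits hashes. This is the work a spambot must repeat
    for every comment it submits.
    """
    for counter in itertools.count():
        stamp = f"{resource}:{counter}"  # hypothetical simplified stamp format
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp
    raise AssertionError("unreachable: itertools.count() never ends")


def verify(stamp: str, resource: str, bits: int) -> bool:
    """Server side: one hash is enough to check the client did the work."""
    if not stamp.startswith(resource + ":"):
        return False  # stamp was minted for a different resource
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits


if __name__ == "__main__":
    stamp = mint("post-42", 16)
    print(stamp, verify(stamp, "post-42", 16))
```

The asymmetry is the point of the design. One legitimate commenter barely notices a fraction of a second of work, but a bot posting thousands of comments pays that cost thousands of times, and the server never pays more than one hash per check.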
- Keyword blocking: this is as ineffective in blogging as it was in the email world before it. It will catch many spams, but it will also let many through. The blog owner has to keep looking through the spams and adding more and more keywords to the block list. Eventually you have to stop adding keywords or nobody will be able to post anything to the blog!
- IP blocking: this is ineffective as well, because spamming is typically performed by zombie botnets (networks of PCs infected with malware that follows the instructions of remote users while appearing to work normally to their owners). The spams arrive from all manner of different IP addresses and, besides, you still need to delete all the spams by hand with this method.
- CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart): these show the commenter an image that contains a string of letters and numbers. The user has to type this string into a field on the comment form along with their other details. There are two problems with CAPTCHAs. The first is that they are a usability nightmare: nobody likes having to pass a test like this every time they submit a form, and for people with sight problems it can be impossible to pass. The second is that OCR (optical character recognition) techniques can be used to defeat the test. As a result of the second problem, CAPTCHAs have been made progressively more difficult to pass, exacerbating the first.
- Bayesian filtering: this is the same as the most popular method of email filtering. A statistical analysis tries to recognise whether the comment looks like spam, in much the same way that we recognise many spams without even reading them, from signs such as strings of garbage characters, implausible names and other features. This works about as well as it does with email spam, i.e. it produces some false positives (real comments tagged as spam) and false negatives (spams that get through).
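To make the Bayesian approach concrete, here is a toy naive Bayes classifier with add-one (Laplace) smoothing. It is an illustration of the general technique only, not how Akismet or any particular email filter is implemented. The class name, the tokenizer and the training comments are all hypothetical, and the filter must be trained on at least one spam and one genuine ("ham") comment before it can score anything.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy naive Bayes comment classifier (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Record the words of one comment under the label 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Return P(spam | words), assuming words are independent given the class."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            # Log prior: how common this class was in the training data.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for word in tokenize(text):
                # Add-one smoothing so an unseen word doesn't zero out the product.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a normalized probability (clamped to avoid overflow).
        diff = log_scores["ham"] - log_scores["spam"]
        return 1.0 / (1.0 + math.exp(min(diff, 700.0)))


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("cheap pills buy now cheap", "spam")
    f.train("buy cheap watches now", "spam")
    f.train("great post about hashcash and pivot", "ham")
    f.train("thanks for the interesting post", "ham")
    print(f.spam_probability("buy cheap pills"))
```

Because the verdict is a probability compared against some threshold, a genuine comment that happens to share words with spam can be pushed over the line, and a spam written in ordinary language can slip under it. Those are the false positives and false negatives described above.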
As you can see from the above list, two of the four main alternative approaches are completely useless and the other two have serious problems of usability and effectiveness. Hashcash, by comparison, has been completely effective, and that deserves emphasis. In a year I have had fewer than 10 spam comments appear on my blog. That is few enough for me to believe that the spams that did get through were entered manually, and that is something I can live with.
Akismet is a Bayesian filter system used by many bloggers, including those on WordPress. It uses a 'hive mind' approach that combines spam data from many users to improve effectiveness. Even so, lately I have seen several bloggers (most notably Robert Scoble) complaining about Akismet either not filtering out all the spam or catching too many genuine comments in its filter. Apparently there is (or at least has been in the past) a Hashcash plugin for WordPress, and I would strongly suggest people check out this option. Akismet is a nice idea, but it is clearly not as effective as it should be. Hashcash is effective. I don't doubt that Scoble gets more spam attempts on his blog than I do on mine, but the results should scale, because a 100% success rate holds regardless of volume.
The final objection, then, is that spammers will someday build spambots that can defeat Hashcash. This is a completely bogus reason not to use Hashcash on your blog now. It is possible that one day hashcash will not be enough to stop spambots, but at the moment the picture is far better for those using hashcash than it is for those relying on CAPTCHAs and Bayesian filtering. Make hay while the sun shines, I say!
You are correct. You may as well use what works, at least for now. But you must agree that implementation takes time and someone has to do it. What we need is a hurdle that can't be jumped, and there is such a hurdle in development.
There are at least 50 posts like yours each day about the sickening amount of time wasted daily on irrelevant junk! And I, as you might have guessed, completely agree! I am building a collaborative team to complete the execution and deployment of a system that will ultimately leave the spammer powerless. Laws are not going to work, and filters of any sort can never keep up. So, are you interested?
You can reach me here: http://www.respect101.com
Frank (URL) - 25 05 07 - 08:12
Could you please hide my email address on the last post?
The "Hide email address" option on your form below should be checked by default, and the options should be above the submit button.
Frank (URL) - 25 05 07 - 08:16
Hi Frank, interesting blog. Spam is a complex subject and it's not something I intend to spend too much time on. My main point in writing this post was to show a skeptical blog community that hashcash remains an effective way to prevent comment spam, and to express my amazement that people had discounted it on the basis that it might not work at some point in the future, especially when their more complex and expensive solutions are becoming less and less effective all the time anyway.
Sorry about the email address. I've removed it from the comment where it wasn't hidden. The comment forms on my blog came from the default Pivot templates and I hadn't seen a need to redesign them, but I will now that you've pointed out this pretty significant flaw! Good luck with fighting the spammers.
rich () (URL) - 27 05 07 - 09:41
Noticed your site was down for a couple of days and had a bit of a heart attack. You have no idea how important this site is to the careers of its visitors (especially the CGA FAQ pages). Would love to donate my two cents for your efforts. Let me know how.
Vulcan Eager - 21 10 11 - 06:30