<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Janeks random writings &#187; spam</title>
	<atom:link href="http://www.hellqvist.com/janek/weblog/tag/spam/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.hellqvist.com/janek/weblog</link>
	<description>No, it's not another diary.</description>
	<lastBuildDate>Fri, 22 Jan 2010 23:45:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Filtering guestbook spam</title>
		<link>http://www.hellqvist.com/janek/weblog/2006/12/16/filtering-guestbook-spam/</link>
		<comments>http://www.hellqvist.com/janek/weblog/2006/12/16/filtering-guestbook-spam/#comments</comments>
		<pubDate>Sat, 16 Dec 2006 11:02:48 +0000</pubDate>
		<dc:creator>Janek</dc:creator>
				<category><![CDATA[Web development]]></category>
		<category><![CDATA[spam]]></category>

		<guid isPermaLink="false">http://www.hellqvist.com/janek/weblog/2006/12/16/filtering-guestbook-spam/</guid>
		<description><![CDATA[Guestbooks, as well as all public forms on the WWW, are constantly abused nowadays by spam bots trying to fill them with spam links. In the guestbook for my band, only about 1 post in 10 was valid &#8211; roughly 9 spam posts for every legitimate entry. Here are the spam filtering techniques I implemented that worked for [...]]]></description>
			<content:encoded><![CDATA[<p>Guestbooks, as well as all public forms on the WWW, are constantly abused nowadays by spam bots trying to fill them with spam links. In the <a href="http://www.mindslip.net/interact.php">guestbook</a> for my <a href="http://www.mindslip.net/">band</a>, only about 1 post in 10 was valid &#8211; roughly 9 spam posts for every legitimate entry. Here are the spam filtering techniques I implemented that worked for me:</p>
<ul>
<li>First, a <strong>black list</strong>. I have an array of bad words (viagra, cialis, roulette, casino) that are common in spam posts and quite uncommon in valid guestbook entries. Every post is checked against the black list. The list also contains a few URLs that spammers often use (blogspot.com and hometown.aol.com), as well as the string [url=, which is probably BBCode link markup.</li>
</ul>
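<p>A minimal sketch of the black list check, written in Python purely for illustration (the guestbook's own code isn't shown here). It assumes a case-insensitive substring match against the full list given at the end of this post; the function name <code>hits_blacklist</code> is hypothetical.</p>

```python
# Sketch of the black list check. The terms are the author's published
# list; the case-insensitive substring matching and the function name
# are assumptions for illustration.
BLACKLIST = [
    "levitra", "viagra", "cialis", "porn", "roulette", "casino",
    "hometown.aol.com", "blogspot.com", "[url=",
]


def hits_blacklist(post: str) -> bool:
    """Return True if any black-listed term occurs anywhere in the post."""
    text = post.lower()
    return any(term in text for term in BLACKLIST)


print(hits_blacklist("Cheap VIAGRA here"))      # True
print(hits_blacklist("Great gig last night!"))  # False
```

<p>Plain substring matching is crude: it also flags any legitimate word that happens to contain a listed term, which is one reason to keep the list short and specific.</p>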
<ul>
<li><strong>Link spamming</strong>. A quick check counts the number of http:// links; if there are more than five &#8211; well, it’s most probably spam!</li>
</ul>
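<p>The link count can be sketched the same way. This hypothetical helper takes the limit of five from the text and counts only literal <code>http://</code> occurrences, as described, so a post with six or more links is treated as spam.</p>

```python
# Hypothetical helper for the link-spamming check. The limit of 5 comes
# from the text; counting only literal "http://" follows the description.
MAX_LINKS = 5


def too_many_links(post: str, max_links: int = MAX_LINKS) -> bool:
    """Return True if the post has more than max_links http:// links."""
    return post.lower().count("http://") > max_links


print(too_many_links("see http://example.com"))  # False
print(too_many_links("http://x.example " * 6))   # True
```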
<ul>
<li><strong>Cookie test</strong>. This is something I picked up on a mailing list a few weeks ago, and it works brilliantly. The idea is that spam bots aren’t real browsers: they just look for forms and submit them, without keeping the cookies a browser would. The trick is to set a cookie when the form is displayed, and then check for the existence of that cookie when the form is received, before the post is saved. If there’s no cookie &#8211; the post isn’t saved! The downside is that visitors with cookies disabled can’t post either, but it stops a LOT of spam.</li>
</ul>
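<p>Here is a rough sketch of the cookie test using Python's standard <code>http.cookies</code> module, independent of any web framework. The cookie name <code>gb_form_seen</code> and both function names are hypothetical; a real guestbook would send the <code>Set-Cookie</code> header along with the form page and read the <code>Cookie</code> header from the submitted request.</p>

```python
from http.cookies import SimpleCookie
from typing import List, Optional, Tuple

# Hypothetical cookie name: any name works as long as both steps agree.
COOKIE_NAME = "gb_form_seen"


def form_response_headers() -> List[Tuple[str, str]]:
    """Headers to send with the guestbook form page: set the test cookie."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = "1"
    cookie[COOKIE_NAME]["path"] = "/"
    # OutputString() gives "name=value; Path=/" without the header name
    return [("Set-Cookie", cookie[COOKIE_NAME].OutputString())]


def accept_submission(cookie_header: Optional[str]) -> bool:
    """Save the post only if the client sent the test cookie back."""
    if not cookie_header:
        return False  # no cookies at all: most likely a bot
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return COOKIE_NAME in cookie


# Usage: a client that never stored the cookie vs. a real browser
print(accept_submission(None))              # False
print(accept_submission("gb_form_seen=1"))  # True
```

<p>Only the cookie's presence is checked, not its value, so any client that stores and returns cookies will pass; the test relies on most spam bots not bothering to do so.</p>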
<p>BTW, here&#8217;s my entire black list as of today:<br />
<em>levitra,viagra,cialis,porn,roulette,casino,hometown.aol.com,blogspot.com,[url= </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.hellqvist.com/janek/weblog/2006/12/16/filtering-guestbook-spam/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
