<rdf:RDF
    xmlns:s='http://snipsnap.org/rdf/snip-schema#'
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xml:base='http://community.moertel.com/ss/rdf'>
    <s:Snip rdf:about='http://community.moertel.com/ss/rdf#start/2004-04-21/1'
         s:name='start/2004-04-21/1'
         s:cUser='tmoertel'
         s:oUser='tmoertel'
         s:mUser='tmoertel'>
        <s:content>1 Cleaning referrer spam out of SnipSnap {anchor:Cleaning referrer spam out of SnipSnap}&#xA;In an earlier entry I posted some code that I was using, along with Apache Rewrite rules, to block referrer spam.  On occasion, however, some spam referrals do get through, and they end up in SnipSnap&apos;s database of backlinks.  That means the spammy backlinks are likely to end up displayed on the site and give Google juice to the spammers.&#xA;&#xA;To solve that problem, I wrote the following Perl script to clean spammy links from SnipSnap&apos;s database.  To use it, I just export my SnipSnap database as an XML file, run the file through the script (giving the script a regex that matches the spammy URLs), and then import the result back into SnipSnap.  Takes about a minute.&#xA;&#xA;Here&apos;s the script:&#xA;&#xA;{code:none}&#xA;\#!/usr/bin/perl&#xA;&#xA;use warnings;&#xA;use strict;&#xA;&#xA;my $target_pattern = shift;&#xA;&#xA;unless ($target_pattern) {&#xA;    require File::Basename;&#xA;    my $cmd = File::Basename::basename($0);&#xA;    print STDERR &quot;Usage: $cmd target-regexp [Input.snip]\\\\n&quot;;&#xA;    exit 1;&#xA;}&#xA;&#xA;undef $/;  # slurp mode&#xA;my $snip = &lt;&gt;;&#xA;&#xA;$snip =~ s{(?&lt;=&lt;backLinks&gt;)([^&lt;]+)}{scrub($1)}ges;&#xA;&#xA;print $snip;&#xA;&#xA;&#xA;sub scrub {&#xA;    join &apos;|&apos;, grep {!/$target_pattern/o} split /\\\\|/, $_[0];&#xA;}&#xA;&#xA;=head1 NAME&#xA;&#xA;clean-snipsnap-backlinks.pl&#xA;&#xA;=head1 SYNOPSIS&#xA;&#xA;B&lt;clean-snipsnap-backlinks.pl&gt; I&lt;target-regex&gt; I&lt;SnipSnapDb.snip&gt;&#xA;E&lt;gt&gt; I&lt;out.snip&gt;&#xA;&#xA;=head1 DESCRIPTION&#xA;&#xA;Removes backlinks that match the I&lt;target-regex&gt; from the input&#xA;SnipSnap database (in Snip format) and prints the cleaned-up database&#xA;to standard output.&#xA;&#xA;This filter is useful for removing spam and porn backlinks that&#xA;spammers create via web crawlers that provide bogus &quot;referer&quot;&#xA;information in HTTP requests.&#xA;&#xA;=head1 LICENSE&#xA;&#xA;This program is free software; you can redistribute it and/or modify&#xA;it under the terms of the GNU General Public License as published by&#xA;the Free Software Foundation; either version 2 of the License, or&#xA;(at your option) any later version.&#xA; &#xA;This program is distributed in the hope that it will be useful,&#xA;but WITHOUT ANY WARRANTY; without even the implied warranty of&#xA;MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the&#xA;GNU General Public License for more details.&#xA;&#xA;&#xA;=head1 AUTHOR&#xA;&#xA;Tom Moertel&#xA;http:\//community.moertel.com/&#xA;2004-04-11&#xA;{code}&#xA;</s:content>
        <s:mTime>2004-05-11 13:28:39.189</s:mTime>
        <s:cTime>2004-04-21 02:21:22.029</s:cTime>
        <s:comments
             rdf:type='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>
        <s:snipLinks>
            <rdf:Bag>
                <rdf:li rdf:resource='#snipsnap-index'/>
                <rdf:li rdf:resource='#tmoertel'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#space/start/2004-04-21/1'/>
                <rdf:li rdf:resource='#snipsnap-search'/>
                <rdf:li rdf:resource='#Code'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#TR Joining Postscript and PDF files'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#TR+Postscript+to+PNG'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#start/2004-06-18/1'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#start/2004-06-18/2'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#start/2004-12-22/1'/>
                <rdf:li rdf:resource='http://community.moertel.com/ss/rdf#'/>
            </rdf:Bag>
        </s:snipLinks>
        <s:attachments
             rdf:type='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>
    </s:Snip>
</rdf:RDF>
