Custom Spam Filters for complaints

January 22, 2013

To pull spam signals out of a webmail complaint feed, we can use a few small Perl libraries to quickly put together simple custom rules that classify and group the complaints into a usable signal.

#!/usr/bin/perl
use strict;
use warnings;

use lib qw{ /home/abuse/lib /abuse/AUP/lib };
use HE::Abuse::Spam::Rule;
use Email::ARF::Report;   # parses the ARF complaint reports read below
use Data::Dumper;

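# Load every *.rule file in the current directory. Each file is Perl source
# whose last expression evaluates to a rule object.
sub evaluate_rules {
   my @rules;

   for my $file ( glob '*.rule' ){
      my $fd;
      unless( open $fd, '<', $file ){
         warn "Could not open $file: $!";
         next;
      }

      # Slurp the whole file, then compile and run it.
      my $source = do { local $/; <$fd> };
      close $fd;

      my $r = eval $source;
      unless( $r ){
         warn "Could not load rule from $file: " . ( $@ || 'no rule returned' ) . "\n";
         next;
      }
      push( @rules, $r );
   }
   return \@rules;
}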

@ARGV = grep { -e $_ } @ARGV;

my $rules = evaluate_rules;

for( @ARGV ){
   # Three-argument open; test its return value to detect failure.
   my $fd;
   unless( open $fd, '<', $_ ){
       warn "Could not open $_: $!";
       next;
   }

   my $message = do { local $/; <$fd>; };
   close $fd;

   my $report = Email::ARF::Report->new( $message );

   my $email;

   my @report_parts = $report->as_email->parts;

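   # In an ARF report the third MIME part carries the original message.
   if( scalar @report_parts >= 3 ){
       $email = $report_parts[2];
   } else {
       # $! is not set by a parse failure, so name the file instead.
       die "Could not parse email in $_: expected at least 3 MIME parts\n";
   }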

   print "Evaluating [$_]\n";
   for my $rule ( @$rules ){
       $rule->evaluate( $email );
       # Guard against notes being unset when no rule has tagged the message.
       print join( ", ", @{ $email->{notes} || [] } ), "\n";
   }
}

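A rule file is just Perl that evaluates to a rule object; save it with a .rule extension in the working directory and the script above picks it up. You’ll see that a simple one can be banged out like this: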

# This is the Phone number spam rule
HE::Abuse::Spam::Rule->new(
    rule => {
        body => [ 'SENDING YOUR PHONE NUMBER', ],
    },
    rulename => 'PHONE_NUM_SPAM',
    #cb => \&phone_number_spam
);

# vim: filetype=perl syntax ts=4 expandtab tabstop=4 shiftwidth=4 autoindent smartindent nu:
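
HE::Abuse::Spam::Rule is a local library and its source is not shown here, so as a rough illustration of how a rule like this could classify messages and group them into a signal, here is a minimal self-contained sketch. Everything in it is an assumption rather than the real module: the Sketch::Rule package, the plain-hash "emails" with a body key (the real script passes Email::MIME parts), the case-sensitive substring match, and the idea that a matching rule pushes its name onto notes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for HE::Abuse::Spam::Rule. The real module's
# internals are not shown in this post, so all of this is assumed.
package Sketch::Rule;

sub new {
    my ( $class, %args ) = @_;
    return bless {
        rule     => $args{rule} || {},
        rulename => $args{rulename},
        cb       => $args{cb},
    }, $class;
}

# Push the rule name onto $email->{notes} when any body string matches,
# then run the optional callback. Returns 1 on a match, 0 otherwise.
sub evaluate {
    my ( $self, $email ) = @_;
    my $body = defined $email->{body} ? $email->{body} : '';

    for my $needle ( @{ $self->{rule}{body} || [] } ){
        next unless index( $body, $needle ) >= 0;
        push @{ $email->{notes} }, $self->{rulename};
        $self->{cb}->( $email ) if $self->{cb};
        return 1;
    }
    return 0;
}

package main;

my @rules = (
    Sketch::Rule->new(
        rule     => { body => [ 'SENDING YOUR PHONE NUMBER' ] },
        rulename => 'PHONE_NUM_SPAM',
    ),
);

# Made-up complaint bodies standing in for parsed ARF reports.
my @emails = (
    { body => 'Reply by SENDING YOUR PHONE NUMBER today' },
    { body => 'Minutes from the weekly meeting' },
    { body => 'Claim a prize by SENDING YOUR PHONE NUMBER' },
);

# Group the feed: count how many complaints each rule tagged.
my %tally;
for my $email ( @emails ){
    $_->evaluate( $email ) for @rules;
    $tally{$_}++ for @{ $email->{notes} || [] };
}

printf "%-16s %d\n", $_, $tally{$_} for sort keys %tally;
```

With the three sample bodies above, the tally reports PHONE_NUM_SPAM with a count of 2. Sorting that tally by count is one simple way to see which kind of spam dominates the complaint feed.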
