Monday, February 12, 2007

Processing procmail logs in bash

Well, the comments in the script should be self-explanatory. It was awesome fun writing this! See if you 'like' it ;)



#!/bin/bash
#
# Date: Monday 12th February 2007
#
# Processing procmail logs.
# It copies the original procmail log file to a
# temporary file called procmail after some grepping.
# After that, another temporary file procswapped is
# created and processed. Both files can be safely
# deleted afterwards.
#
# Only one drawback: it requires GNU grep 2.5 or above (for the -o option).

# This is the input:
# From theobald@bar-plate.com Sun Feb 11 17:04:36 2007
# Subject: [SPAM] [+++++++++++++++] Re:
# Folder: /home/tnowak/Maildir/.Spam/new/_DzB.U6zzFB.dingo 13416
# From geuzzld@eriksbikeshop.com Sun Feb 11 17:30:40 2007
# Subject: [SPAM] [+++++++++++++] Still waiting
# Folder: /home/tnowak/Maildir/.Spam/new/_SFC.wS0zFB.dingo 30131
# From commercialtalk.com@esoleau.com Sun Feb 11 17:49:00 2007
# Subject: [SPAM] [+++++++++++] She will love you more than any other guy
# Folder: /home/tnowak/Maildir/.Spam/new/_5QC.8j0zFB.dingo 2309

# TomekN had run this much:

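# -B1 keeps the From line above each [SPAM] subject. The second grep
# drops only the "--" group separators that -B1 inserts, not subjects
# that happen to contain "--".
grep -B1 '\[SPAM\]' .procmail.log | grep -v '^--$' > procmail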

# Here's the procmail file at this stage:
#
# From trutawan056@yahoo.com Sat Jan 27 22:06:05 2007
# Subject: [SPAM] [+++++++++++++++] =?windows-874?B?odLDqNG0t9PhvLm608PYp8PRocnS
# From aw-confirm@ebay.com Sat Jan 27 22:39:56 2007
# Subject: [SPAM] [+++++++] You're a Silver PowerSeller Now!
# From 863kurtis@lightningdezignz.com.au Sun Jan 28 01:56:00 2007
# Subject: [SPAM] [+++] Fwd: Too busy to go back to school,{} but need a Un
# From trutawan055@yahoo.com Sun Jan 28 03:47:48 2007
# Subject: [SPAM] [+++++++++++++++++] =?windows-874?B?odLD46rp4rfDyNG+t+wgtdS0te

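# For each From line: indent it by one space and stash it in the hold
# space (h), fetch the following Subject line (n), strip its leading
# spaces, append the stashed From line (G) and print the pair, so the
# Subject now comes first.
sed -ne '/^From/{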
s/^/ /
h
n
s/^ *//
G
p
}' procmail > procswapped

# Here's the procswapped file at this stage:
#
# Subject: [SPAM] [+++++++++++++++] =?windows-874?B?odLDqNG0t9PhvLm608PYp8PRocnS
# From trutawan056@yahoo.com Sat Jan 27 22:06:05 2007
# Subject: [SPAM] [+++++++] You're a Silver PowerSeller Now!
# From aw-confirm@ebay.com Sat Jan 27 22:39:56 2007
# Subject: [SPAM] [+++] Fwd: Too busy to go back to school,{} but need a Un
# From 863kurtis@lightningdezignz.com.au Sun Jan 28 01:56:00 2007
# Subject: [SPAM] [+++++++++++++++++] =?windows-874?B?odLD46rp4rfDyNG+t+wgtdS0te
# From trutawan055@yahoo.com Sun Jan 28 03:47:48 2007

i=0
while IFS= read -r line
do
subarray[$i]="$line"
((i++))
done < <(grep '^Subject' procswapped)

# I've read the Subject lines into an array called subarray
# Now I'll sort them according to the number of '+'s
# I've created a new array called plus, which has only the sorted
# '+' patterns, including the [ at the beginning and the ] at the end.
#
# Use sort -r if you want the list to be reversed.

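# A plain lexical sort does not reliably order these by length (it depends
# on how the locale ranks '+' against ']'), so sort on the match length.
plus=($(printf '%s\n' "${subarray[@]}" | grep -Eo '\[\++\]' | awk '{ print length($0), $0 }' | sort -nu | cut -d' ' -f2))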

# Now the final work. The array plus is formatted into a form sed
# will understand as the address, by escaping the proper characters using
# guess what, sed!

for (( i=0;i<${#plus[@]};i++ ))
do
sed -n "/$(echo "${plus[$i]}" | sed -n 's/\[\([^[]*\)\]/\\[\1\\]/p')/{
N
p
}" procswapped
done

# Here's the final output:
#
# Subject: [SPAM] [+++] Fwd: Too busy to go back to school,{} but need a Un
# From 863kurtis@lightningdezignz.com.au Sun Jan 28 01:56:00 2007
# Subject: [SPAM] [+++++++] You're a Silver PowerSeller Now!
# From aw-confirm@ebay.com Sat Jan 27 22:39:56 2007
# Subject: [SPAM] [+++++++++++++++] =?windows-874?B?odLDqNG0t9PhvLm608PYp8PRocnS
# From trutawan056@yahoo.com Sat Jan 27 22:06:05 2007
# Subject: [SPAM] [+++++++++++++++++] =?windows-874?B?odLD46rp4rfDyNG+t+wgtdS0te
# From trutawan055@yahoo.com Sun Jan 28 03:47:48 2007
#
# Sorted, based on the spam level indicated by the number of '+'s.
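
The same result can probably be had in a single pass, without the two temporary files. What follows is only a sketch under a few assumptions: sort_by_spam_level is a made-up helper name, each Subject line is assumed to come directly after its From line in the log, subjects are assumed not to contain tab characters, and sort is assumed to support -s for a stable sort (GNU and BSD sort both do).

```shell
#!/bin/bash
# Sketch only: sort_by_spam_level is a hypothetical helper, not part of
# the script above. It reads a procmail log on stdin and prints each
# [SPAM] Subject/From pair, ordered by the number of '+' signs.
sort_by_spam_level() {
    awk '
        # Strip leading whitespace so indented log lines match too.
        { sub(/^[ \t]+/, "") }

        # Remember the most recent From line.
        /^From / { from = $0; next }

        # For a spam subject, count the plus signs in its [+++] marker.
        /^Subject: \[SPAM\] \[[+]+\]/ {
            match($0, /\[[+]+\]/)
            level = RLENGTH - 2        # marker length minus the brackets
            printf "%d\t%s\t%s\n", level, $0, from
        }
    ' |
    # Numeric, stable sort on the plus count; add -r to reverse.
    sort -s -n -k1,1 |
    # Drop the count and put Subject and From back on separate lines.
    awk -F'\t' '{ print $2; print $3 }'
}
```

Run it as: sort_by_spam_level < .procmail.log
Counting the plus signs and sorting on that number means the order does not depend on how sort compares '+' with ']' in the current locale.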