Mailing Lists and Procmail

Published: June 11, 2011

I like having procmail sort my mail for me. In the case of mailing lists, the header of choice is the List-ID field. But there's a problem... notice how each example below is slightly different. I want to pull the first portion of the mailing list name (e.g. linux-kernel, kernelnewbies, cocci) and use that as the folder name:

List-ID: <linux-kernel.vger.kernel.org>
List-Id: Learn about the Linux kernel <kernelnewbies.kernelnewbies.org>
List-Id: cocci.diku.dk

To get started, let's state in English what we want to find: Dear procmail; please find the word that immediately precedes the first period in a line that begins with "List-Id:"

Finding these headers is easy with a regular expression... IF... you're allowed to use look ahead:

^List-Id:.*?( (?!.*<)|<)([^.]*)
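
As a quick sanity check, the look-ahead expression above can be tried in any engine that supports look-ahead. The sketch below uses Python's re module, which is a choice made purely for illustration and not part of the original setup. The helper name list_name is hypothetical, and re.IGNORECASE is added only because the first sample header spells the field "List-ID".

```python
import re

# The look-ahead expression from above; group 2 holds the list name.
LIST_ID = re.compile(r"^List-Id:.*?( (?!.*<)|<)([^.]*)", re.IGNORECASE)

# The three sample headers from the top of the post.
HEADERS = [
    "List-ID: <linux-kernel.vger.kernel.org>",
    "List-Id: Learn about the Linux kernel <kernelnewbies.kernelnewbies.org>",
    "List-Id: cocci.diku.dk",
]


def list_name(header):
    """Return the first label of the list ID, or None if the header does not match."""
    m = LIST_ID.match(header)
    return m.group(2) if m else None


if __name__ == "__main__":
    for h in HEADERS:
        print(list_name(h))
```

Run against the three sample headers, this should yield linux-kernel, kernelnewbies and cocci in that order.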

BUT, procmail doesn't do look ahead :( So let's try with procmail's regular expressions. Aside from look ahead/behind, there are two other major differences between procmail's regular expressions and the rest of the world. First, procmail uses \/ to mark where the portion of the match that gets copied into $MATCH begins. Second, the part of the regular expression to the left of the \/ uses non-greedy matching, so when you write .* procmail treats it the way other engines treat .*? instead. This is the feature that makes matching the three list headers I want to grab quite difficult.
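
To make the greedy versus non-greedy distinction concrete, here is a rough Python emulation. Python is an assumption here; procmail has its own regex engine, so this only approximates its behaviour. A capture group stands in for everything to the right of \/, and the left side is written with an explicit lazy quantifier to mimic what procmail does on its own. The sample string is made up.

```python
import re

# Made-up sample with two "<" characters, so greedy and lazy matching differ.
TEXT = "To: Alice <alice@example.com>, Bob <bob@example.com>"

# Greedy left side, as most engines treat ".*": it runs to the LAST "<".
greedy = re.search(r"^To:.*<([^>]*)", TEXT).group(1)

# Lazy left side, roughly how procmail treats whatever precedes \/:
# it stops at the FIRST "<".
lazy = re.search(r"^To:.*?<([^>]*)", TEXT).group(1)

if __name__ == "__main__":
    print(greedy)  # bob@example.com
    print(lazy)    # alice@example.com
```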

With this in mind:

Match the linux-kernel list:

^List-Id: *<\/[^.]*

Match linux-kernel and kernelnewbies (note the [^<]?):

^List-Id: .*<\/[^<]?[^.]*

Notice the extra [^<]? which tells procmail that we want $MATCH to start after the < character. This is what allows the rule to find kernelnewbies without pulling < into $MATCH. This is necessary because procmail isn't being greedy when it matches to the left side of \/.
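
The two intermediate patterns can be compared side by side with a small Python stand-in. Several things here are assumptions: Python is only emulating procmail, \/ is replaced by a capture group, the left-hand .* is made lazy by hand, and the second pattern is taken to read ^List-Id: .*<\/[^<]?[^.]* in full. The names FIRST, SECOND and extract are hypothetical.

```python
import re

HEADERS = {
    "linux-kernel": "List-ID: <linux-kernel.vger.kernel.org>",
    "kernelnewbies": "List-Id: Learn about the Linux kernel <kernelnewbies.kernelnewbies.org>",
    "cocci": "List-Id: cocci.diku.dk",
}

# Stand-in for:  ^List-Id: *<\/[^.]*
FIRST = re.compile(r"^List-Id: *<([^.]*)", re.IGNORECASE)

# Stand-in for (assumed full form):  ^List-Id: .*<\/[^<]?[^.]*
SECOND = re.compile(r"^List-Id: .*?<([^<]?[^.]*)", re.IGNORECASE)


def extract(pattern, header):
    """Return what would land in $MATCH, or None when the pattern does not match."""
    m = pattern.match(header)
    return m.group(1) if m else None


if __name__ == "__main__":
    for name, header in HEADERS.items():
        print(name, extract(FIRST, header), extract(SECOND, header))
```

In this emulation the first pattern only finds linux-kernel, the second finds linux-kernel and kernelnewbies, and neither matches the cocci header because it has no < at all.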

Now, our remaining problem is the cocci mailing list. This one really makes life difficult. I decided that using a single regular expression just isn't possible, so that means we'll need two. One to grab the cocci mailing list and one to grab everything else. Here's the completed procmail rule (note: I use Maildir and not mbox on my mailserver).

:0
* ^List-Id: \/[^.]+
{
    #list with <>
    #e.g. List-Id: Learn about the Linux kernel <kernelnewbies.kernelnewbies.org>
    #e.g. List-Id: <linux-kernel.vger.kernel.org>
    :0
    * MATCH ?? ^.*<\/[^<]+
    .MailingLists.$MATCH/

    #list without <>
    #e.g. List-Id: cocci.diku.dk
    :0
    .MailingLists.$MATCH/
}
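
For anyone who wants to check the logic without a mail server, the two-step idea behind the recipe can be sketched in Python. This is only an approximation under stated assumptions: Python's re module stands in for procmail's engine, capture groups stand in for \/, and the function name folder_for is hypothetical. It returns the folder string the recipe would deliver to and does not touch any mailbox.

```python
import re


def folder_for(header):
    """Mimic the recipe: grab text up to the first period, then drop anything up to "<"."""
    # Outer condition: everything after "List-Id: " up to the first period.
    outer = re.search(r"^List-Id: ([^.]+)", header, re.IGNORECASE)
    if outer is None:
        return None  # not a mailing-list header; the recipe would not fire

    match = outer.group(1)

    # Inner condition: if there is a "<", keep only what follows it.
    inner = re.search(r"^.*?<([^<]+)", match)
    if inner is not None:
        match = inner.group(1)

    # Otherwise fall through with the outer match unchanged (the cocci case).
    return f".MailingLists.{match}/"


if __name__ == "__main__":
    for h in (
        "List-ID: <linux-kernel.vger.kernel.org>",
        "List-Id: Learn about the Linux kernel <kernelnewbies.kernelnewbies.org>",
        "List-Id: cocci.diku.dk",
    ):
        print(folder_for(h))
```

The fall-through mirrors the second nested recipe: when the inner pattern finds no <, the value captured by the outer condition is used as the folder name as-is.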