Please document a working regexp example #423

Open
systemcrash opened this issue Nov 7, 2019 · 3 comments

@systemcrash

I checked the regexp module page, and could not make a working .conf file.

Specifically, I found this in the code:

reconf['MICROSOFT_SPAM'] = {
  -- https://technet.microsoft.com/en-us/library/dn205071(v=exchg.150).aspx
  re = 'X-Forefront-Antispam-Report=/SFV:SPM/H',
  score = 4.0,
  description = "Microsoft says the message is spam",
  group = 'upstream_spam_filters'
}

And wanted an expression like:

re = 'X-Forefront-Antispam-Report=/SFV:SPM/iH'

But the regexp page only describes the regexp syntax and the internal functions, not how to use them in a working rule.

Which internal function do we call to say "yup, definitely spam, drop this shit"? Why perform all those binary checks (internal functions) if the regexp itself is the check we need?

Please show an example (and document it) that can go in local.d/regexp.conf, ideally one that will immediately a) learn spam and reject, or b) drop or discard.

Today, with milter-regex, the syntax there is clear, e.g.:

discard
header /^X-Microsoft-Antispam$/i /.*BCL\:[1-9]*/i

discard
header /^X-Forefront-Antispam-Report$/i /.*SFV\:SPM.*/i
@jmptbl

jmptbl commented Apr 22, 2020

@systemcrash I needed to write a regexp rule recently, and also struggled to figure out the regexp module. Eventually I got something working. Below is the content of my local.d/regexp.conf file; I hope it helps.

"RE_SEXTORTION" = {
	re = '/your/{words} && /password/{words} && /buy/{words} && /bitcoin/{words}';
	score = 15.0;
}
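
For the header check from the original question, the same layout could presumably be reused. The following is only a sketch, not a tested rule: it combines the expression quoted earlier in this thread with the structure of the rule above. The symbol name LOCAL_MICROSOFT_SPAM is made up for illustration, and the score assumes that the final action is decided by comparing the total score with the configured action thresholds, with a reject threshold of 15 (check your own actions settings).

```
# local.d/regexp.conf
# Sketch only: LOCAL_MICROSOFT_SPAM is a hypothetical symbol name.
"LOCAL_MICROSOFT_SPAM" = {
	# Header regexp from the original question: case-insensitive match
	# on the X-Forefront-Antispam-Report header.
	re = 'X-Forefront-Antispam-Report=/SFV:SPM/iH';
	# Assumed value: enough on its own to reach a reject threshold of 15.
	score = 15.0;
	description = "Microsoft says the message is spam";
}
```

A rule like this only adds a symbol and a score; whether the message is then rejected or discarded depends on how the actions are configured, not on the regexp module itself.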

@systemcrash

systemcrash commented Apr 22, 2020 via email

@jmptbl

jmptbl commented Apr 22, 2020

It filters on the {words} type, which is a transformation on the message body documented as follows:

Unicode normalized (to NFKC) and lowercased words extracted from the text (excluding URLs), subject and From displayed name

The messages were sextortion-type emails that I was given as examples. They were obfuscated with unusual UTF-8 character sequences, so matching on {words} with the regexp patterns I gave seemed good enough, given the size and type of the user base in question.
