v2.0.0 #1

niftylettuce · 2020-04-24T01:59:45Z

When parsing tokens, the HTML needs to be pre-processed with sanitize-html before it is passed to striptags, so that blocks like <style>, <stylesheet>, <meta>, and <head> are removed.
Modify scanner.getPhishingResults to check against ~~OpenPhish~~ and PhishTank datasets.
Tokenize and stem other mail headers (e.g. to, from, cc, bcc, reply-to, in-reply-to).
Find a solution to the performance issue with classifier.train() in classifier.js, per NaturalNode/natural#520.
Headers should NOT get converted; they should be preserved for URL/Received-By purposes. Only content should be converted.
Get inspiration from ls /usr/share/spamassassin if needed
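
A rough sketch of what the header tokenization item could look like. Everything here is an assumption for illustration: the `tokenizeHeaders` helper and `HEADER_FIELDS` list are hypothetical, headers are assumed to arrive as a plain object with lowercase keys, the field-name prefix on each token is just one way to keep header tokens separate from body tokens, and stemming is left out entirely.

```javascript
// Hypothetical helper, not part of the current codebase.
// Header fields named in the task list above.
const HEADER_FIELDS = ['to', 'from', 'cc', 'bcc', 'reply-to', 'in-reply-to'];

// Turns selected header values into tokens prefixed with the field name,
// e.g. "from:example". Assumes `headers` is a plain object whose keys are
// lowercase header names and whose values are raw header strings.
function tokenizeHeaders(headers) {
  const tokens = [];
  for (const field of HEADER_FIELDS) {
    const value = headers[field];
    if (typeof value !== 'string') continue; // header missing or not a string
    // Split on anything that is not a letter or digit (ASCII only in this sketch).
    const words = value.toLowerCase().match(/[a-z0-9]+/g) || [];
    for (const word of words) tokens.push(`${field}:${word}`);
  }
  return tokens;
}
```

With this shape, a `from` header and a `to` header containing the same address produce different tokens, so the classifier could weigh them separately. Stemming would be applied to each word before the prefix is added.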


niftylettuce · 2020-05-04T13:24:31Z

niftylettuce · 2020-05-08T23:53:37Z

Phishing protection is too strict (e.g. SendGrid link trackers and other click trackers won't work).

niftylettuce · 2020-05-29T04:23:49Z

niftylettuce · 2020-06-18T08:44:58Z

niftylettuce · 2020-06-18T08:46:37Z

When ARF parses a message, strip out the replacement tokens to get a pure content-only tokens array.
We may want to use the PG approach of looking at the (n) most interesting words.
Gibberish detection with Wikimedia and Google AI datasets
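
A minimal sketch of the "most interesting words" idea, assuming "PG" refers to Paul Graham's "A Plan for Spam": score each token by how far its spam probability is from the neutral 0.5, keep the top n, and combine only those. The function names, the `Map` of per-token probabilities, and the deduplication of tokens are assumptions for illustration; the 0.4 default for unseen tokens and n = 15 are the values suggested in that essay, not something decided in this issue.

```javascript
// Hypothetical names throughout; this is not the existing classifier API.
const UNKNOWN_PROBABILITY = 0.4; // default for tokens never seen in training

// Returns the n tokens whose spam probability is farthest from 0.5.
// `spamProbability` is assumed to be a Map of token -> probability in (0, 1).
function interestingTokens(tokens, spamProbability, n = 15) {
  const unique = [...new Set(tokens)]; // assumption: count each token once
  return unique
    .map((token) => {
      const p = spamProbability.has(token)
        ? spamProbability.get(token)
        : UNKNOWN_PROBABILITY;
      return { token, p, interest: Math.abs(p - 0.5) };
    })
    .sort((a, b) => b.interest - a.interest)
    .slice(0, n);
}

// Combines the selected probabilities into one spam score:
// prod(p) / (prod(p) + prod(1 - p)).
function combine(selected) {
  let spam = 1;
  let ham = 1;
  for (const { p } of selected) {
    spam *= p;
    ham *= 1 - p;
  }
  return spam / (spam + ham);
}
```

Usage would be along the lines of `combine(interestingTokens(tokens, probabilities))`, with the result compared against a threshold. Limiting the product to a few strongly weighted tokens keeps long messages full of neutral words from diluting the score.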

niftylettuce · 2020-06-18T21:46:08Z

niftylettuce added the help wanted label Apr 24, 2020

niftylettuce pinned this issue Apr 24, 2020
