Filtering with blacklist and whitelist #19

sjehuda · 2025-04-15T08:11:41Z

Hide blacklisted items, unless they contain whitelisted keywords.

It is useful to block advertisements and other unwanted content based on keywords and URLs.
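For the URL side, here is a minimal sketch in Python, assuming a hypothetical `BLOCKED_HOSTS` set of domains. It compares the link's hostname instead of doing a plain substring test, because a substring such as `ads.` would also hit `loads.example.org`. A listed domain also blocks its subdomains.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist: hide items linking to these domains or their subdomains.
BLOCKED_HOSTS = {"ads.example.com", "tracker.example.net"}


def url_is_blocked(url: str, blocked_hosts=BLOCKED_HOSTS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked_hosts)


print(url_is_blocked("https://ads.example.com/banner?id=1"))  # True
print(url_is_blocked("https://cdn.ads.example.com/x.png"))    # True
print(url_is_blocked("https://loads.example.org/ads.html"))   # False
```

Matching on `"." + domain` keeps `badads.example.com` from being treated as a subdomain of `ads.example.com`.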

I use keyword-based filtering in a couple of my projects.

Reference: https://greasyfork.org/en/scripts/465932-newspaper-syndication-feed-reader

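      // entry is the feed item element; filterBlacklist, filterWhitelist,
      // keywordsBlacklist and keywordsWhitelist (comma-separated strings)
      // are assumed to be defined elsewhere in the userscript.
      let blacklisted = false,
          whitelisted = false,
          // Pad with spaces so keywords at the very start or end also match.
          entry_content = ' ' + entry.textContent.toLowerCase() + ' ';

      if (filterBlacklist && keywordsBlacklist) {
        // Any whitelisted keyword overrides the blacklist.
        if (filterWhitelist && keywordsWhitelist) {
          for (const item of keywordsWhitelist.split(',')) {
            const keyword = item.trim().toLowerCase();
            if (keyword.length && entry_content.includes(keyword)) {
              whitelisted = true;
              break;
            }
          }
        }
        if (!whitelisted) {
          // Blacklisted keywords must match whole words: a space before,
          // a space or punctuation mark after.
          search:
          for (const item of keywordsBlacklist.split(',')) {
            const keyword = item.trim().toLowerCase();
            if (!keyword.length) continue;
            for (const character of [' ', '’', '-', '"', ':', ';', ',', '.']) {
              if (entry_content.includes(' ' + keyword + character)) {
                blacklisted = true;
                break search; // leave both loops on the first match
              }
            }
          }
        }
      }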
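The same rule (whitelist first, then a whole-word blacklist) can be sketched in Python for use in a filter script. This is a port under assumptions, not the userscript's own code: keywords arrive as a comma-separated string like the userscript settings, and `re` word boundaries (`\b`) stand in for the hand-written list of trailing characters. The function names `parse_keywords` and `is_hidden` are hypothetical.

```python
import re


def parse_keywords(setting: str) -> list:
    """Split a comma-separated setting, dropping blanks and normalizing case."""
    return [k.strip().lower() for k in setting.split(",") if k.strip()]


def is_hidden(text: str, blacklist: str, whitelist: str = "") -> bool:
    """Hide text that contains a blacklisted word, unless a whitelisted keyword appears."""
    content = text.lower()
    # Whitelist: a plain substring match overrides the blacklist.
    if any(k in content for k in parse_keywords(whitelist)):
        return False
    # Blacklist: whole-word match; works best for keywords that begin and
    # end with a letter or digit, since \b needs a word character there.
    for k in parse_keywords(blacklist):
        if re.search(r"\b" + re.escape(k) + r"\b", content):
            return True
    return False


print(is_hidden("Sponsored: buy now", "sponsored, ads"))                        # True
print(is_hidden("Sponsored: open-source release", "sponsored", "open-source"))  # False
print(is_hidden("Downloads are up", "ads"))                                     # False
```

Unlike a plain substring test, `ads` does not match inside `downloads`, and a keyword at the start of the text still matches.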

Here is a Python script from a friend on IRC (probably the Liferea channel) that hides items by URL (e.g. advertisements). It reads an Atom feed on standard input and writes the filtered feed to standard output:

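    #!/usr/bin/env python3
    # Filter script: read an Atom feed on stdin, drop unwanted entries,
    # write the remaining feed to stdout.
    import sys
    import xml.etree.ElementTree as ET

    # Keep the default Atom namespace on output instead of "ns0:" prefixes.
    ET.register_namespace("", "http://www.w3.org/2005/Atom")
    # Parse bytes so the parser honors the encoding declared in the feed.
    tree = ET.parse(sys.stdin.buffer)
    root = tree.getroot()

    # Substrings that mark an entry for removal.
    keywordID = ['more']

    for node in root.findall('*'):
        children = node.findall('*')
        # Check the text of the node's first child element (for Atom entries
        # this is often <id>); skip children without text instead of crashing.
        if children and children[0].text:
            for keyword in keywordID:
                if keyword in children[0].text:
                    root.remove(node)
                    break

    # A named encoding makes ElementTree write bytes, so use the binary stream.
    tree.write(sys.stdout.buffer, encoding='UTF-8')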
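The script above only looks at each entry's first child element, so it depends on element order. A sketch that instead looks up each entry's id and link href values through the Atom namespace, so order does not matter. Assumptions: the input is Atom (not RSS), and `BLOCKED_URL_PARTS`, `entry_urls` and `filter_feed` are hypothetical names. As a filter script it would read `sys.stdin.buffer` and write `sys.stdout.buffer` like the script above.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM}
ET.register_namespace("", ATOM)  # serialize with the default namespace, no "ns0:"

# Hypothetical URL fragments that mark an entry as unwanted.
BLOCKED_URL_PARTS = ["/sponsored/", "utm_campaign=ad"]


def entry_urls(entry):
    """Yield the entry's <id> text and every <link href="..."> value."""
    entry_id = entry.findtext("atom:id", default="", namespaces=NS).strip()
    if entry_id:
        yield entry_id
    for link in entry.findall("atom:link", NS):
        href = link.get("href")
        if href:
            yield href


def filter_feed(data: bytes) -> bytes:
    """Remove entries with a blocked URL, wherever <id>/<link> sit in the entry."""
    root = ET.fromstring(data)
    for entry in root.findall("atom:entry", NS):
        if any(part in url for url in entry_urls(entry) for part in BLOCKED_URL_PARTS):
            root.remove(entry)
    return ET.tostring(root, encoding="utf-8")


sample = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>https://example.org/news/1</id><title>News</title></entry>
  <entry><title>Offer</title><link href="https://example.org/sponsored/2"/></entry>
</feed>"""
print(filter_feed(sample).decode("utf-8"))
```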

And another one, for Python 2, which removes entries whose id element contains "marketing":

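    #!/usr/bin/env python2
    # Filter script (Python 2): drop Atom entries whose <id> contains "marketing".
    import sys
    # Python 2 hack: make implicit str/unicode conversions use UTF-8,
    # so entries with non-ASCII text can be written to stdout.
    reload(sys).setdefaultencoding("utf-8")

    # simplexml ships with nbxmpp or with the older xmpppy ("xmpp") package.
    try:
        from nbxmpp import simplexml
    except ImportError:
        try:
            from xmpp import simplexml
        except ImportError:
            # Stop here instead of failing later with a NameError.
            sys.exit("simplexml not found")

    data = sys.stdin.read()
    xml = simplexml.XML2Node(data)

    for tag in xml.getTags("entry"):
        id_tag = tag.getTag("id")
        # Skip entries without an <id> instead of crashing on None.
        if id_tag is not None and "marketing" in id_tag.getData():
            xml.delChild(tag)

    # __str__(1) asks simplexml for indented ("fancy") output.
    sys.stdout.write(xml.__str__(1))
    sys.stdout.flush()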
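The script above needs Python 2 and simplexml from the nbxmpp or xmpp packages. Below is a sketch of a Python 3, standard-library-only filter that applies the rule from the top of this issue (hide blacklisted items unless they contain whitelisted keywords) to all text inside each Atom entry, including its id, so URL words count too. Assumptions: `BLACKLIST` and `WHITELIST` are hypothetical single-word keyword sets, and the feed is Atom. Matching works on whole words found with `re.findall(r"\w+", ...)`, so multi-word phrases are not supported here.

```python
import re
import sys
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # keep the default namespace on output

# Hypothetical keyword lists; single lower-case words only in this sketch.
BLACKLIST = {"marketing", "sponsored"}
WHITELIST = {"release"}


def words(entry):
    """Lower-cased set of words in all text inside the entry."""
    return set(re.findall(r"\w+", " ".join(entry.itertext()).lower()))


def filter_feed(data: bytes) -> bytes:
    """Drop blacklisted Atom entries unless they also contain a whitelisted word."""
    root = ET.fromstring(data)
    for entry in root.findall(f"{{{ATOM}}}entry"):
        found = words(entry)
        if (found & BLACKLIST) and not (found & WHITELIST):
            root.remove(entry)
    return ET.tostring(root, encoding="utf-8")


def main():
    """Filter-script entry point: Atom feed on stdin, filtered feed on stdout."""
    sys.stdout.buffer.write(filter_feed(sys.stdin.buffer.read()))


sample = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>https://example.org/marketing/1</id><title>Spring sale</title></entry>
  <entry><id>https://example.org/2</id><title>Marketing team release notes</title></entry>
  <entry><id>https://example.org/3</id><title>Weather</title></entry>
</feed>"""
print(filter_feed(sample).decode("utf-8"))
```

With the sample, the first entry is dropped (its id contains "marketing"), while the second is kept because "release" is whitelisted. To use it as a filter script, replace the sample lines with a call to `main()`.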
