neuro-rights/atac

Activism Tools Against Cybertorture

License: MIT

Summary

The current scraping algorithm is the following:
P ← starting URLs (primary queue)
S ← ∅ (secondary queue)
V ← ∅ (visited pages)
while P ≠ ∅ do
    Pick a page v from P, remove it from P, and download it
    V ← V ∪ {v} (mark as visited)
    N+(v) ← v’s out-links pointing to new pages (“new” means not in P, S or V)
    R ← N+(v)
    if |N+(v)| > t then
        R ← first t out-links in N+(v)
        S ← S ∪ {v} (revisit v later for its remaining out-links)
    end if
    P ← P ∪ R
    if P = ∅ then
        P ← S
        S ← ∅
    end if
end while

The algorithm is based on the paper at https://chato.cl/papers/castillo_06_controlling_queue_size_web_crawling.pdf
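The queue-limiting idea can be sketched in Python as follows. This is a minimal illustration, not the code used in atac.py: the function name `crawl`, the `fetch_links` callback (which stands in for downloading a page and extracting its out-links), and the returned visit order are all assumptions made for the example. Pages with more than `t` new out-links are parked in the secondary queue and revisited once the primary queue runs dry.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(
    start_urls: Iterable[str],
    fetch_links: Callable[[str], List[str]],  # hypothetical: returns a page's out-links
    t: int,
) -> List[str]:
    """Return pages in first-visit order, admitting at most t new links per download."""
    primary = deque(dict.fromkeys(start_urls))  # P
    secondary: deque = deque()                  # S
    queued: Set[str] = set(primary)             # every URL currently in P or S
    visited: Set[str] = set()                   # V
    order: List[str] = []

    while primary:
        v = primary.popleft()
        queued.discard(v)
        if v not in visited:  # pages coming back from S are already visited
            visited.add(v)
            order.append(v)

        # Out-links not in P, S or V, de-duplicated while keeping page order.
        new_links = [
            u for u in dict.fromkeys(fetch_links(v))
            if u not in queued and u not in visited
        ]
        if len(new_links) > t:
            new_links = new_links[:t]
            secondary.append(v)  # come back later for the remaining links
            queued.add(v)
        primary.extend(new_links)
        queued.update(new_links)

        if not primary:
            # Primary queue is exhausted: promote the parked pages.
            primary, secondary = secondary, deque()
    return order
```

With an in-memory link graph such as `{"a": ["b", "c", "d"], ...}` and `fetch_links=lambda u: graph.get(u, [])`, the function can be exercised without any network access; a real scraper would replace the callback with an HTTP fetch and HTML link extraction.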

Usage

Create a virtual environment

python3 -m venv env

Activate the virtual environment on POSIX (bash)

source env/bin/activate

or

Activate the virtual environment on Windows (PowerShell)

PS C:\> env\Scripts\Activate.ps1

Install dependencies

pip3 install -r requirements.txt

Create contacts by scraping a URL

python3 atac.py scrape -u url_to_scrape

Send email to the contacts created above

python3 atac.py email -p path_to_csv -m path_to_message -s subject

Send over IRC

python3 atac.py irc
