P ← starting URLs (primary queue)
S ← ∅ (secondary queue)
V ← ∅ (visited pages)
while P ≠ ∅ do
Pick a page v from P and download it
V ← V ∪ {v} (mark as visited)
N+(v) ← v’s out-links pointing to new pages (“new” means not in P, S or V)
if |N+(v)| > t then
R ← first t out-links of N+(v)
S ← S ∪ {v} (v has more new out-links than were queued)
else
R ← N+(v)
end if
P ← P ∪ R
if P = ∅ then
P ← S
S ← ∅
end if
end while
The algorithm above is based on https://chato.cl/papers/castillo_06_controlling_queue_size_web_crawling.pdf
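The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the code in atac.py: the function name crawl, the fetch_links callable (a stand-in for downloading a page and extracting its out-links), the FIFO order used to pick a page from P, and the t >= 1 guard are all assumptions made for the example. The guard is there because with t = 0 a page with new out-links would be moved between S and P forever.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start_urls: Iterable[str],
          fetch_links: Callable[[str], List[str]],
          t: int) -> List[str]:
    """Return pages in the order they were first downloaded.

    fetch_links is a hypothetical helper: given a URL, it returns that
    page's out-links in document order.
    """
    if t < 1:
        raise ValueError("t must be at least 1")

    primary = deque(dict.fromkeys(start_urls))  # P (de-duplicated, FIFO)
    in_primary = set(primary)                   # fast membership test for P
    secondary: List[str] = []                   # S
    in_secondary = set()                        # fast membership test for S
    visited = set()                             # V
    order: List[str] = []

    while primary:
        v = primary.popleft()
        in_primary.discard(v)
        if v not in visited:                    # pages from S are revisited
            visited.add(v)
            order.append(v)

        # N+(v): out-links to pages not in P, S or V, without duplicates.
        new_links: List[str] = []
        for u in fetch_links(v):
            known = (u in in_primary or u in in_secondary
                     or u in visited or u in new_links)
            if not known:
                new_links.append(u)

        if len(new_links) > t:
            new_links = new_links[:t]           # R: first t out-links
            secondary.append(v)                 # come back to v later
            in_secondary.add(v)

        for u in new_links:                     # P <- P U R
            primary.append(u)
            in_primary.add(u)

        if not primary:                         # P exhausted: swap in S
            primary = deque(secondary)
            in_primary = set(secondary)
            secondary = []
            in_secondary = set()

    return order


# Toy link graph standing in for the web (illustrative data only).
DEMO_GRAPH = {"a": ["b", "c", "d", "e"], "b": ["f"]}
print(crawl(["a"], lambda url: DEMO_GRAPH.get(url, []), t=2))
# prints ['a', 'b', 'c', 'f', 'd', 'e']
```

With t = 2, page a queues only b and c on its first visit and is parked in S. Once P runs dry, a is processed again and its remaining out-links d and e are queued, so each page adds at most t entries to P per visit.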
Create a virtual environment
python3 -m venv env
Activate the virtual environment on POSIX (bash)
source env/bin/activate
or
Activate the virtual environment on Windows (PowerShell)
PS C:\> env\Scripts\Activate.ps1
Install dependencies
pip3 install -r requirements.txt
Create contacts by scraping a URL
python3 atac.py scrape -u url_to_scrape
Send email to the contacts created above
python3 atac.py email -p path_to_csv -m path_to_message -s subject
Send via IRC
python3 atac.py irc