Zeno

State-of-the-art web crawler 🔱

Introduction

Zeno is a web crawler designed to run wide crawls or to archive a single web page. Its key concepts are portability, performance, and simplicity, with an emphasis on performance.

It relies heavily on the warc module to record traffic into WARC files.

The name Zeno comes from Zenodotus (Ζηνόδοτος), a Greek grammarian, literary critic, Homeric scholar, and the first librarian of the Library of Alexandria.

Installation

go install github.com/internetarchive/Zeno@latest

Quick Start

To archive a single web page:

Zeno get url https://www.france.fr

Zeno is highly configurable. To see all available configuration options, run Zeno -h or Zeno get -h.
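To archive several pages one after another, the single-page command from the Quick Start can be wrapped in a small shell function. This is only a minimal sketch, not a Zeno feature: the function name archive_urls and the ZENO_BIN variable are hypothetical names introduced here, and the sketch assumes that Zeno get url exits with a non-zero status on failure, which this README does not state. Zeno may offer its own way to handle multiple URLs, so check Zeno get -h first.

```shell
#!/bin/sh
# Hypothetical helper: runs "Zeno get url <URL>" once per URL read from stdin.
# ZENO_BIN is an assumed override for the binary name; it defaults to "Zeno".
ZENO_BIN="${ZENO_BIN:-Zeno}"

archive_urls() {
    failed=0
    while IFS= read -r url; do
        # Skip blank lines and lines starting with "#".
        case "$url" in
            ''|'#'*) continue ;;
        esac
        # Redirect stdin so the crawler cannot consume the URL list.
        if ! "$ZENO_BIN" get url "$url" </dev/null; then
            echo "failed: $url" >&2
            failed=1
        fi
    done
    # Return 1 if any URL failed, 0 otherwise.
    return "$failed"
}
```

With a file urls.txt holding one URL per line, the function could be called as archive_urls < urls.txt. Each URL is processed sequentially, and a failure on one URL does not stop the remaining ones.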

Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue.

Zeno is being developed and maintained by Corentin Barreau at the Internet Archive. The project has evolved into what it is today thanks to the invaluable contributions from the community. While we can't list everyone, special thanks to:

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.