State-of-the-art web crawler 🔱
Zeno is a web crawler designed to run wide crawls or simply to archive a single web page. Its key concepts are portability, performance, and simplicity, with an emphasis on performance.
It relies heavily on the warc module to record traffic into WARC files.
The name Zeno comes from Zenodotus (Ζηνόδοτος), a Greek grammarian, literary critic, Homeric scholar, and the first librarian of the Library of Alexandria.
To install Zeno:
go install github.com/internetarchive/Zeno@latest
To archive a single web page:
Zeno get url https://www.france.fr
Zeno is highly configurable, with many parameters that can be customized. To see all available configuration options, use Zeno -h and/or Zeno get -h.
Contributions are welcome! Please feel free to open issues and submit pull requests.
Zeno is developed and maintained by Corentin Barreau at the Internet Archive. The project has evolved into what it is today thanks to invaluable contributions from the community. While we can't list everyone, special thanks to:
- Jake LaFountain, Wayback Machine Software Engineer at the Internet Archive.
- Thomas Foubert, Wayback Machine Platform Engineer at the Internet Archive.
- yzqzss, Lead Developer of the Save The Web Project.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.