howie6879/ruia

Latest commit: 68a4502 by howie6879 · Aug 21, 2022

Ruia

🕸️ Async Python 3.6+ web scraping micro-framework based on asyncio.

⚡ Write less, run faster.

Overview

Ruia is an async web scraping micro-framework written with asyncio and aiohttp. It aims to make crawling URLs as convenient as possible.

Write less, run faster:

Features

  • Easy: Declarative programming
  • Fast: Powered by asyncio
  • Extensible: Middlewares and plugins
  • Powerful: JavaScript support
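
To make the "Fast: Powered by asyncio" point concrete, here is a minimal standard-library sketch of why an asyncio-based crawler can overlap its waiting time. It does not use Ruia's API: `fake_fetch`, `crawl`, and `timed_crawl` are hypothetical names, and the network request is simulated with `asyncio.sleep`. It needs Python 3.7+ for `asyncio.run`.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_fetch(url: str, delay: float = 0.1) -> str:
    # Hypothetical stand-in for a network request: sleeps instead of doing real I/O.
    await asyncio.sleep(delay)
    return "<html>{}</html>".format(url)


async def crawl(urls: List[str]) -> List[str]:
    # gather() runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


def timed_crawl(urls: List[str]) -> Tuple[List[str], float]:
    # Run the crawl to completion and report the wall-clock time it took.
    start = time.perf_counter()
    pages = asyncio.run(crawl(urls))
    return pages, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["https://example.com/{}".format(i) for i in range(10)]
    pages, elapsed = timed_crawl(urls)
    print("fetched {} pages in {:.2f}s".format(len(pages), elapsed))
```

Because the ten simulated requests wait at the same time, the total run time stays close to a single delay instead of the sum of all ten.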

Installation

# For Linux & Mac
pip install -U ruia[uvloop]

# For Windows
pip install -U ruia

# Latest development version (from GitHub)
pip install git+https://github.com/howie6879/ruia

Tutorials

  1. Overview
  2. Installation
  3. Define Data Items
  4. Spider Control
  5. Request & Response
  6. Customize Middleware
  7. Write a Plugin

TODO

  • Cache responses while debugging, to avoid hitting request limits: ruia-cache
  • Provide an easy way to debug the script, ruia-shell
  • Distributed crawling/scraping

Contribution

Ruia is still under development. Feel free to open issues and pull requests:

  • Report or fix bugs
  • Request or publish plugins
  • Write or fix documentation
  • Add test cases

Notice: We use black to format the code.

Thanks