This repository was archived by the owner on Nov 14, 2023. It is now read-only.

open-data-plan/crawler


crawler

Web crawler based on Puppeteer


Install

npm install @opd/crawler

Use

import Crawler from '@opd/crawler'
// or, with CommonJS:
// const Crawler = require('@opd/crawler').default

// options are described under API; parallel defaults to 5
const crawler = new Crawler({ parallel: 5 })

API

new Crawler(options)

Create a crawler instance.

options: crawler instance configuration

  • parallel: maximum number of concurrent crawlers (default: 5)
  • pageEvaluate: function evaluated on the current page (see Puppeteer's page.evaluate); passing extra arguments to it is not supported yet
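The pageEvaluate option follows Puppeteer's page.evaluate model: the function's source is serialized and run inside the page, so it can use browser globals such as document but cannot read variables from your Node.js code. Because extra arguments are not supported, any value it needs has to be written into the function itself. The options object below is a hypothetical example: the link-collecting logic, the MAX_LINKS limit and parallel: 3 are choices made for illustration, and it is assumed (not stated in this README) that the function's return value becomes the result field of each PageResult.

```javascript
// Runs inside the browser page, not in Node.js: it can use `document`,
// but it cannot reference variables defined outside the function.
const pageEvaluate = () => {
  // Inlined constant, because extra arguments cannot be passed in.
  const MAX_LINKS = 50
  const links = Array.from(document.querySelectorAll('a[href]'), (a) => a.href)
  return {
    title: document.title,
    links: links.slice(0, MAX_LINKS),
  }
}

const crawlerOptions = {
  parallel: 3, // at most 3 concurrent crawlers (default is 5)
  pageEvaluate, // return value assumed to become `result` in each PageResult
}
```

The object would then be passed as new Crawler(crawlerOptions).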

crawler.launch([options])

Launch the browser with puppeteer.launch; options, if given, are used as its launch options.

crawler.queue(urls)

Add urls to the crawler queue.

Note: urls are checked strictly; every url must start with http or https (matching https?).
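The README does not show the exact check, nor whether a url that fails it is silently dropped or rejected with an error. A minimal sketch, assuming the rule is the regular expression /^https?:\/\// applied case-sensitively, filters a list before it reaches crawler.queue; isCrawlableUrl and partitionUrls are hypothetical helpers, not part of @opd/crawler.

```javascript
// Hypothetical helper approximating the "must start with https?" rule.
// Requiring "://" after the scheme, and case sensitivity, are assumptions.
function isCrawlableUrl(url) {
  return typeof url === 'string' && /^https?:\/\//.test(url)
}

// Hypothetical helper: split urls into those that pass the check and those that do not.
function partitionUrls(urls) {
  const accepted = []
  const rejected = []
  for (const url of urls) {
    if (isCrawlableUrl(url)) {
      accepted.push(url)
    } else {
      rejected.push(url)
    }
  }
  return { accepted, rejected }
}
```

For example, partitionUrls(['https://example.com', 'example.com']) puts the first url in accepted and the second in rejected, so only accepted would be handed to crawler.queue.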

crawler.start([urls]): Promise<PageResult[]>

Start crawling the queued pages. If urls is provided, crawler.queue(urls) is called first.

const result = await crawler.start()
console.log(result)

// [
//   {
//     url, // page url
//     result // crawled result
//   }
// ]
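To make the start([urls]) contract concrete, here is a simplified, in-memory model of it: urls passed to start are queued first, at most parallel pages are crawled at once, and every crawled url yields one { url, result } entry. CrawlQueueSketch and its evaluatePage callback are hypothetical stand-ins for opening a Puppeteer page and running pageEvaluate; this is not the library's implementation, and keeping results in queue order is a choice of this sketch rather than a documented guarantee.

```javascript
// Hypothetical in-memory model of queue()/start(); not @opd/crawler's code.
// evaluatePage(url) stands in for opening a page and running pageEvaluate.
class CrawlQueueSketch {
  constructor(evaluatePage, parallel = 5) {
    this.evaluatePage = evaluatePage
    this.parallel = parallel // maximum number of pages crawled at once
    this.pending = [] // urls waiting to be crawled
  }

  queue(urls) {
    this.pending.push(...urls)
  }

  // Like start([urls]): queue any given urls first, then crawl the whole queue.
  async start(urls) {
    if (urls) this.queue(urls)
    const todo = this.pending.splice(0) // take and clear the queue
    const results = new Array(todo.length)
    let next = 0

    // Each worker claims the next unclaimed index until the queue is drained.
    const worker = async () => {
      while (next < todo.length) {
        const i = next++
        const result = await this.evaluatePage(todo[i])
        results[i] = { url: todo[i], result } // one PageResult per url
      }
    }

    const workerCount = Math.min(this.parallel, todo.length)
    await Promise.all(Array.from({ length: workerCount }, () => worker()))
    return results
  }
}
```

With parallel set to 2 and three queued urls, two pages are in flight at first and the third starts as soon as either finishes. A shared index counter is enough here because each worker's check-and-increment runs without an await in between, so no two workers can claim the same url.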

Note: if you call start before launch, the browser is launched automatically, but without any extra launch options.
