nberlette/dql
Web Scraping with Deno – DOM + GraphQL


DQL is a web scraping module for Deno and Deno Deploy that integrates the power of GraphQL Queries with the DOM tree of a remote webpage or HTML document fragment. This is a fork of DenoQL with some heavy refactoring and some additional features:

  • Compatibility with the Deno Deploy architecture
  • Ability to pass variables alongside all queries
  • New state-management class with additional methods
  • Modular project structure (as opposed to a mostly single-file design)
  • Improved types and schema structure

Note: This is a work-in-progress and there is still a lot to be done.

🚛 Junkyard Scraper


useQuery

The module's primary export is useQuery, which takes a GraphQL query string and resolves with the result:

import { useQuery } from "https://deno.land/x/dql/mod.ts";

const data = await useQuery(`query { ... }`);

QueryOptions

You can also provide a QueryOptions object as the second argument of useQuery, to further control the behavior of your query requests. All properties are optional.

const data = await useQuery(`query { ... }`, {
  concurrency: 8, // passed directly to PQueue initializer
  fetch_options: { // passed directly to Fetch API requests
    headers: {
      "Authorization": "Bearer ghp_a5025a80a24defd0a7d06b4fc215bb5635a167c6",
    },
  },
  variables: {}, // variables defined in your queries
  operationName: "", // when using multiple queries
});
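
Because every property is optional, an options object can be assembled in small steps. The sketch below shows one way to do that. `QueryOptionsSketch` is a local stand-in that mirrors only the four properties shown above (the module's exported type may differ), and `withBearerToken` is a hypothetical helper, not part of DQL.

```typescript
// Hypothetical local type: mirrors only the four properties shown above.
// The module's exported QueryOptions type may differ.
interface QueryOptionsSketch {
  concurrency?: number;
  fetch_options?: { headers?: Record<string, string> };
  variables?: Record<string, unknown>;
  operationName?: string;
}

// Hypothetical helper: returns a copy of `base` with a bearer token header
// added, leaving the input object untouched.
function withBearerToken(
  token: string,
  base: QueryOptionsSketch = {},
): QueryOptionsSketch {
  return {
    ...base,
    fetch_options: {
      ...base.fetch_options,
      headers: {
        ...base.fetch_options?.headers,
        Authorization: `Bearer ${token}`,
      },
    },
  };
}

// The result can be passed as the second argument of useQuery.
const options = withBearerToken("example-token", { concurrency: 4 });
```

In practice the token would usually be read from an environment variable rather than written into source.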

createServer

With Deno Deploy, you can deploy DQL with a GraphQL Playground in only 2 lines of code:

import { createServer } from "https://deno.land/x/dql/mod.ts";

createServer(80, { endpoint: "https://dql.deno.dev" });

🛝 Try the GraphQL Playground at dql.deno.dev
🦕 View the source code in the Deno Playground

Command Line Usage (CLI)

deno run -A --unstable https://deno.land/x/dql/serve.ts

Custom port (default is 8080)

deno run -A https://deno.land/x/dql/serve.ts --port 3000

Warning: you need to have the Deno CLI installed first.


💻 Examples

🚛 Junkyard Scraper · Deno Playground 🦕

import { useQuery } from "https://deno.land/x/dql/mod.ts";
import { serve } from "https://deno.land/std@0.147.0/http/server.ts";

serve(async (_req: Request) =>
  await useQuery(
    `
  query Junkyard (
    $url: String
    $itemSelector: String = "table > tbody > tr"
  ) {
    vehicles: page(url: $url) {
      totalCount: count(selector: $itemSelector)
      nodes: queryAll(selector: $itemSelector) {
        id: index
        vin:   text(selector: "td:nth-child(7)", trim: true)
        sku:   text(selector: "td:nth-child(6)", trim: true)
        year:  text(selector: "td:nth-child(1)", trim: true)
        model: text(selector: "td:nth-child(2) > .notranslate", trim: true)
        aisle: text(selector: "td:nth-child(3)", trim: true)
        store: text(selector: "td:nth-child(4)", trim: true)
        color: text(selector: "td:nth-child(5)", trim: true)
        date:  attr(selector: "td:nth-child(8)", name: "data-value")
        image: src(selector: "td > a > img")
      }
    }
  }`,
    {
      variables: {
        "url": "http://nvpap.deno.dev/action=getVehicles&makes=BMW",
      },
    },
  )
    .then((data) => JSON.stringify(data, null, 2))
    .then((json) =>
      new Response(json, {
        headers: { "content-type": "application/json;charset=utf-8" },
      })
    )
);
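
Every `text(...)` field in the query above comes back as text, so a consumer will often want to tidy the rows before using them. The following is a minimal sketch of that step. It assumes each entry in `nodes` carries the aliases from the query and that a field may be `null` when its selector matches nothing. The `VehicleNode` and `Vehicle` types and the `normalizeVehicles` helper are hypothetical and not part of DQL.

```typescript
// Assumed shape of one entry in `vehicles.nodes`, inferred from the aliases
// in the query above (only a subset of the fields is listed here).
interface VehicleNode {
  id: number;
  vin: string | null;
  year: string | null;
  model: string | null;
}

interface Vehicle {
  id: number;
  vin: string;
  year: number | null;
  model: string;
}

// Hypothetical helper: drops rows without a VIN and parses the year as a number.
function normalizeVehicles(nodes: VehicleNode[]): Vehicle[] {
  return nodes
    .filter((n): n is VehicleNode & { vin: string } => !!n.vin)
    .map((n) => {
      const year = Number.parseInt(n.year ?? "", 10);
      return {
        id: n.id,
        vin: n.vin,
        year: Number.isNaN(year) ? null : year,
        model: n.model ?? "",
      };
    });
}
```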

📝 HackerNews Scraper · Deno Playground 🦕

import { useQuery } from "https://deno.land/x/dql/mod.ts";
import { serve } from "https://deno.land/std@0.147.0/http/server.ts";

serve(async (_req: Request) =>
  await useQuery(`
  query HackerNews (
    $url: String = "http://news.ycombinator.com"
    $rowSelector: String = "tr.athing"
  ) {
    page(url: $url) {
      title
      totalCount: count(selector: $rowSelector)
      nodes: queryAll(selector: $rowSelector) {
        rank: text(selector: "td span.rank", trim: true)
        title: text(selector: "td.title a", trim: true)
        site: text(selector: "span.sitestr", trim: true)
        url: href(selector: "td.title a")
        attrs: next {
          score: text(selector: "span.score", trim: true)
          user: text(selector: "a.hnuser", trim: true)
          date: attr(selector: "span.age", name: "title")
        }
      }
    }
  }`)
    .then((data) => JSON.stringify(data, null, 2))
    .then((json) =>
      new Response(json, {
        headers: { "content-type": "application/json;charset=utf-8" },
      })
    )
);
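
The `attrs: next { ... }` selection nests the score, user, and date under each story, and `rank` and `score` arrive as text. The sketch below flattens one such node and extracts the numbers. It assumes that rank text looks like "1." and score text like "123 points", and that any field can be `null`. `StoryNode`, `firstInt`, and `flattenStory` are hypothetical names and not part of DQL.

```typescript
// Assumed shape of one entry in `page.nodes`, inferred from the query above.
interface StoryNode {
  rank: string | null;
  title: string | null;
  url: string | null;
  attrs: {
    score: string | null;
    user: string | null;
    date: string | null;
  } | null;
}

// Returns the first run of digits in `text` as a number, or null if none.
function firstInt(text: string | null | undefined): number | null {
  const match = text?.match(/\d+/);
  return match ? Number.parseInt(match[0], 10) : null;
}

// Hypothetical helper: lifts the nested `attrs` fields to the top level.
function flattenStory(node: StoryNode) {
  return {
    rank: firstInt(node.rank),
    title: node.title ?? "",
    url: node.url,
    score: firstInt(node.attrs?.score),
    user: node.attrs?.user ?? null,
  };
}
```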

License

MIT © Nicholas Berlette, based on DenoQL.