From 7908859e1903cd6add1c724053cd064e5136608f Mon Sep 17 00:00:00 2001
From: Plutonist
Date: Fri, 17 Feb 2017 13:49:54 +0800
Subject: [PATCH] Update example & readme & license

---
 License          | 14 +++++++
 Readme.md        | 97 ++++++++++++++++++++++++++++++++++++++++++++++++
 eh.crr => eh.crs |  0
 main/main.go     |  2 +-
 4 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 License
 rename eh.crr => eh.crs (100%)

diff --git a/License b/License
new file mode 100644
index 0000000..53f5889
--- /dev/null
+++ b/License
@@ -0,0 +1,14 @@
+Copyright (c) 2017 Plutonist
+All rights reserved.
+
+Redistribution and use in source and binary forms are permitted
+provided that the above copyright notice and this paragraph are
+duplicated in all such forms and that any documentation,
+advertising materials, and other materials related to such
+distribution and use acknowledge that the software was developed
+by the Plutonist. The name of the
+Plutonist may not be used to endorse or promote products derived
+from this software without specific prior written permission.
+THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
+IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
\ No newline at end of file
diff --git a/Readme.md b/Readme.md
index 8768ac0..92181d2 100644
--- a/Readme.md
+++ b/Readme.md
@@ -58,4 +58,101 @@ link: https://reactos.org/project-news/reactos-044-released
 title: FeFETs: How this new memory stacks up against existing non-volatile memory
 site: semiengineering.com
 link: http://semiengineering.com/what-are-fefets/
+```
+
+## Script Spec
+
+### Town
+
+Town is a lambda-like expression for storing a mutable or immutable string. Most of the time, it is used to store a URL.
+
+```
+page(@page=1, ext) = "https://news.ycombinator.com/news?p={@page}&ext={ext}"
+```
+
+When you need a town, use it as if you were calling a function:
+
+```
+news[]: page(ext="Hello World!") -> $("tr.athing")
+```
+
+You might have noticed that the `@page` parameter is not passed in this call. That is because it is a special parameter.
+
+In a town definition, an expression like `name="something"` means that the parameter `name` has the default value `"something"`.
+
+Incidentally, `@page` is a parameter that increases automatically when the current page has no more content.
+
+
+### Node
+
+Nodes form a tree that represents the structure of the data you are going to crawl.
+
+```
+news[]: page -> $("tr.athing")
+    title: $(".title a.storylink").text
+    site: $(".title span.sitestr").text
+    link: $(".title a.storylink").href
+```
+
+Like `yaml`, nodes express their hierarchy through indentation.
+
+#### Node Name
+
+Every node has a name. `title` is a field name, which represents plain string data. `news[]` is an array name, which represents a parent structure with multiple child items.
+
+#### Page
+
+Page indicates where to fetch the field data. It can be a town expression or a field reference.
+
+Field reference is an advanced usage of Node; you can find the details in [./eh.crs](./eh.crs).
+
+If a node has both a page and a fun, the page goes on the left of `->` and the fun on the right, that is, `page -> fun`.
+
+#### Fun
+
+Fun represents how the fetched data is processed.
+
+These are all the supported funs:
+
+| Name      | Parameters                       | Description                              |
+| --------- | -------------------------------- | ---------------------------------------- |
+| $         | (selector: string)               | CSS selector                             |
+| html      |                                  | inner HTML                               |
+| text      |                                  | inner text                               |
+| outerHTML |                                  | outer HTML                               |
+| attr      | (attr: string)                   | attribute value                          |
+| style     |                                  | style attribute value                    |
+| href      |                                  | href attribute value                     |
+| src       |                                  | src attribute value                      |
+| calc      | (prec: int)                      | calculate arithmetic expression          |
+| match     | (regexp: string)                 | match first sub-string via regular expression |
+| expand    | (regexp: string, target: string) | expand matched strings to target string  |
+
+
+
+## Author
+
+Plutonist
+
+> [impl.moe](impl.moe) · Github [@wspl](impl.moe)
+
+
+
+## License
+
+```
+Copyright (c) 2017 Plutonist
+All rights reserved.
+
+Redistribution and use in source and binary forms are permitted
+provided that the above copyright notice and this paragraph are
+duplicated in all such forms and that any documentation,
+advertising materials, and other materials related to such
+distribution and use acknowledge that the software was developed
+by the Plutonist. The name of the
+Plutonist may not be used to endorse or promote products derived
+from this software without specific prior written permission.
+THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
+IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 ```
\ No newline at end of file
diff --git a/eh.crr b/eh.crs
similarity index 100%
rename from eh.crr
rename to eh.crs
diff --git a/main/main.go b/main/main.go
index f2341e2..1d8eb46 100644
--- a/main/main.go
+++ b/main/main.go
@@ -5,7 +5,7 @@ import (
 )
 
 func main() {
-	//buf, _ := ioutil.ReadFile("./eh.crr")
+	//buf, _ := ioutil.ReadFile("./eh.crs")
 	//raw := string(buf)
 	//c := New(raw)
 	//c.Array("gallery").Each(func(c *Creeper) {