Use dynamic programming optimizations to improve computational performance #33

Open
kingname opened this issue Oct 13, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@kingname
Collaborator

No description provided.

@kingname kingname added the enhancement New feature or request label Oct 13, 2019
@Leechael

  1. Precompiling the regular expressions in utils.py ahead of time with re.compile speeds things up.
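  2. Changing the XPath that extracts all text to //body//text() may be another speed-up. I have also tried excluding iframe / script / style through the XPath; that is worth considering, but I have not tested it at scale.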
  3. For extracting the publication time from meta tags: if all meta tags are extracted first and then matched with a trie-based regexp, the speed-up is roughly 60x: (?:OriginalPublicationDate|Pub(?:Date|lishDate)|_pubtime|a(?:pub:time|rticle(?::published_time|_date_original))|date(?:Published|Update)|og:(?:published_time|release_date|time)|pub(?:date|li(?:cation_date|shdate)|time)|rnews:datePublished|sailthru\\.date|weibo:\\ article:create_at)
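Points 1 and 3 fit together: build the trie-style alternation once, compile it once with re.compile at import time, and reuse it for every meta tag. The sketch below is a minimal illustration under assumptions, not the project's code: trie_regex, META_TIME_NAMES and find_publish_time are hypothetical names (nothing here is taken from utils.py), the meta tags are assumed to be already extracted as (name/property, content) pairs, and the name list is just the alternatives spelled out by the regex in point 3. It has not been benchmarked, so it says nothing about the 60x figure. Note that Python's re module already caches recently compiled patterns, so most of the gain likely comes from matching all names with one pattern rather than from re.compile alone.

```python
import re

# Hypothetical name list: the alternatives spelled out by the regex in point 3.
META_TIME_NAMES = [
    "OriginalPublicationDate", "PubDate", "PublishDate", "_pubtime",
    "apub:time", "article:published_time", "article_date_original",
    "datePublished", "dateUpdate",
    "og:published_time", "og:release_date", "og:time",
    "pubdate", "publication_date", "publishdate", "pubtime",
    "rnews:datePublished", "sailthru.date", "weibo: article:create_at",
]


def trie_regex(words):
    """Hypothetical helper: one alternation whose branches share common prefixes."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # "" marks that a word ends at this node

    def to_pattern(node):
        ends_here = "" in node
        # One branch per next character; re.escape handles "." and " ".
        branches = [re.escape(ch) + to_pattern(child)
                    for ch, child in sorted(node.items()) if ch]
        if not branches:
            return ""
        if len(branches) == 1 and not ends_here:
            return branches[0]  # single path: just concatenate characters
        group = "(?:" + "|".join(branches) + ")"
        return group + "?" if ends_here else group  # optional if a word can stop here

    return to_pattern(trie)


# Compiled once at import time (point 1) instead of on every call.
META_TIME_RE = re.compile(trie_regex(META_TIME_NAMES))


def find_publish_time(meta_tags):
    """Return the content of the first meta tag whose key matches, else None.

    meta_tags is assumed to be an iterable of (name_or_property, content)
    pairs that were already pulled out of the page in a single pass.
    """
    for key, content in meta_tags:
        if key and META_TIME_RE.fullmatch(key):
            return content
    return None


if __name__ == "__main__":
    print(META_TIME_RE.pattern)
    print(find_publish_time([("og:title", "x"),
                             ("article:published_time", "2019-10-13")]))
```

Because sibling branches start with different characters, the engine can discard most alternatives at the first differing character instead of trying each full name in turn.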

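The filter in point 2 is expressed in XPath; with lxml it would look roughly like //body//text()[not(ancestor::script or ancestor::style or ancestor::iframe)] (a valid XPath 1.0 expression, but not benchmarked here). To keep the sketch dependency-free, the block below performs the same selection with the standard library's html.parser instead of XPath, so it illustrates which text is kept, not the XPath speed-up itself. BodyTextCollector, SKIP_TAGS and body_text are hypothetical names, not part of this project.

```python
from html.parser import HTMLParser

# Subtrees whose text should not count as article text (per point 2).
SKIP_TAGS = {"script", "style", "iframe"}


class BodyTextCollector(HTMLParser):
    """Collect non-blank text inside <body>, skipping script/style/iframe subtrees.

    Stdlib stand-in for the XPath filter; assumes reasonably well-formed HTML
    (every skipped tag is closed).
    """

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Unlike //body//text(), whitespace-only nodes are dropped and text is stripped.
        if self.in_body and not self.skip_depth and data.strip():
            self.texts.append(data.strip())


def body_text(html):
    """Hypothetical helper: list of visible text chunks from the page body."""
    parser = BodyTextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


if __name__ == "__main__":
    page = ("<html><head><title>T</title></head><body><p>Hello</p>"
            "<script>var x = 1;</script><div>World<style>p {}</style></div>"
            "<iframe src='a'></iframe></body></html>")
    print(body_text(page))  # ['Hello', 'World']
```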

2 participants