Changelog for jusText

3.0.1 (2024-05-09)

  • BUG FIX: Fix issue with new version of lxml #48.

3.0.0 (2021-10-21)

  • INCOMPATIBLE CHANGE: Dropped support for Python 3.4 and below.
  • BUG FIX: Don't join words separated only by <br> tag.
  • BUG FIX: List available stop-lists alphabetically.

2.2.0 (2016-03-06)

  • INCOMPATIBLE CHANGE: Stop words are case insensitive.
  • INCOMPATIBLE CHANGE: Dropped support for Python 3.2.
  • BUG FIX: Preserve new lines from original text in paragraphs.

2.1.1 (2014-05-27)

  • BUG FIX: Function decode_html now respects the errors parameter when falling back to default_encoding #9.

2.1.0 (2014-01-25)

  • FEATURE: Added XPath selector to the paragraphs. The XPath selector is also available in detailed output as the xpath attribute of the <p> tag #5.

2.0.0 (2013-08-26)

  • FEATURE: Added pluggable DOM preprocessor.
  • FEATURE: Added support for Python 3.2+.
  • INCOMPATIBLE CHANGE: Paragraphs are instances of justext.paragraph.Paragraph.
  • INCOMPATIBLE CHANGE: Script 'justext' removed in favour of command python -m justext.
  • FEATURE: It's possible to enter a URI as the input document in the CLI.
  • FEATURE: It is possible to pass a unicode string directly.

1.2.0 (2011-08-08)

  • FEATURE: Character counts are used instead of word counts where possible, so that the algorithm works well in the language-independent mode (without a stoplist) for languages where counting words is not easy (Japanese, Chinese, Thai, etc.).
  • BUG FIX: More robust parsing of meta tags containing the information about used charset.
  • BUG FIX: Corrected decoding of HTML entities &#128; to &#159;.

1.1.0 (2011-03-09)

  • First public release.