Releases: jgm/pandoc

pandoc 2.16.1

03 Nov 07:22
@jgm jgm
  • Docx reader: don’t let first line indents trigger block quotes (#7655). This fixes a regression introduced in pandoc 2.15.

  • Docx writer: use getTimestamp for modification times in reference.docx (#7654). This ensures that when SOURCE_DATE_EPOCH is set, the modification times of files taken from the reference.docx will be set deterministically, allowing for reproducible builds.

  • Lua subsystem (Albert Krewinkel):

    • Load module pandoc.path on startup (#7524). Previously the module always had to be loaded via require 'pandoc.path'.
    • Fix typo in SoftBreak constructor.
    • Re-add content property to Strikeout elements. Fixes a regression introduced in 2.15.
    • Be more forgiving when retrieving the Image caption property. Fixes a regression introduced in 2.15.
    • Display Attr values using their native Haskell representation.
    • Allow omitting the 2nd parameter in pandoc.Code constructor. Fixes a regression introduced in 2.15 which required users to always specify an Attr value when constructing a Code element.
    • Allow Citation values to be compared and shown. Comparisons of Citation values are performed in Haskell; values are equal if they represent the same Haskell value. Converting a Citation value to a string now yields its native Haskell string representation.
    • Restore List behavior of MetaList (#7650). Fixes a regression introduced in 2.16 which had MetaList elements lose the pandoc.List properties.
    • Restore content property on Header elements.
    • Ensure Block elements have all expected properties.
    • Ensure Inline elements have all expected properties.
  • Allow tasty-bench 0.3.x.
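The reproducible-builds fix in the Docx writer relies on the SOURCE_DATE_EPOCH convention. The following is a minimal Python sketch of that idea, not pandoc's Haskell code: `get_timestamp` and `write_entry` are hypothetical helpers that stamp a zip entry with the epoch when the variable is set, so two builds from the same input yield byte-identical archives.

```python
import io
import zipfile
from datetime import datetime, timezone


def get_timestamp(env):
    """Return SOURCE_DATE_EPOCH as a UTC datetime if set, else the current time."""
    epoch = env.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return datetime.now(tz=timezone.utc)


def write_entry(name, data, env):
    """Build a one-file zip archive whose entry carries a deterministic mtime."""
    ts = get_timestamp(env)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ZipInfo takes a (year, month, day, hour, minute, second) tuple.
        info = zipfile.ZipInfo(name, date_time=ts.timetuple()[:6])
        zf.writestr(info, data)
    return buf.getvalue()


# In a real build you would pass os.environ; a dict keeps the example pure.
env = {"SOURCE_DATE_EPOCH": "1700000000"}
first = write_entry("word/styles.xml", b"<w:styles/>", env)
second = write_entry("word/styles.xml", b"<w:styles/>", env)
print(first == second)
```

Without the variable, each call picks up the wall-clock time, which is exactly what made the reference.docx entries non-reproducible.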

pandoc 2.16

31 Oct 20:50
@jgm jgm
  • Switch back from HsYAML to yaml for parsing YAML metadata (#6084). HsYAML is around 20 times slower in parsing large YAML bibliographies. In addition, HsYAML is not being actively maintained. This sets us back in our attempts to free ourselves from C dependencies (#4535). But I don’t see a good alternative until a faster pure Haskell parser is available. Notes:

    • We’ve removed the FromYAML instances for all types that had them, since this is a HsYAML-specific typeclass [API change]. (The yaml package just uses From/ToJSON instead of having a dedicated From/ToYAML class.)
    • Unlike HsYAML (in the configuration we were using), yaml parses ‘Y’, ‘N’, ‘Yes’, ‘No’, ‘On’, ‘Off’ as boolean values. Users may need to quote these when they are meant to be interpreted as strings. Similarly, ‘null’ is parsed as a YAML null value (and will be treated as an empty string by pandoc rather than the string ‘null’). Quoting it will force it to be interpreted as a string.
    • Some tests had to be adjusted accordingly.
    • Pandoc now behaves in a more useful way when the YAML metadata contains escaping errors: instead of just failing silently and falling back to some other interpretation of the section, it raises a YAML parsing error.
  • Markdown writer: Ensure that special values are quoted in YAML metadata. These include “Y”, “yes”, “on”, and “off”, which are now (with yaml library) considered boolean values, as well as “null”.

  • Change JSON encodings of some types.

    • For LineEnding use lowercase constructors, e.g. crlf, native.
    • For HTMLSlideVariant use lowercase constructors.
    • For ReaderOptions use e.g. default-image-extension instead of readerDefaultImageExtension for field names.
    • For Extension, use e.g. tex_math_dollars instead of Ext_tex_math_dollars as constructor.
    • For Extensions, use an array of Extensions, instead of an object wrapping the tag Extensions and an integer. (The integer representation is not supposed to be part of the public API.)
    • For Opt, use field names like tab-stop instead of optTabStop.
  • Docx writer:

    • Add IDs to native_numbering test (Tristan Stenner).
    • Move “:” out of the caption bookmark (Tristan Stenner). This is needed so that native references to the figure are included as “As seen in Figure X, it is…” instead of “As seen in [Figure: X, it is…”
  • Lua (Albert Krewinkel, except as noted):

    • Use hslua module abstraction where possible.

    • Fix placement of tests for Block elements in pandoc module tests

    • Increase strictness when getting attribute keys

    • Re-add t and tag property to Attr values. Removal of these properties from Attr values was a regression.

    • Fix pandoc.utils.stringify regression. The pandoc.utils.stringify function returned empty strings when called with a string argument.

    • Fix a copy/paste bug in Lua marshalling code (John MacFarlane, #7639). This caused links to be changed to figures when Lua filters changed link properties.

    • Re-add content property to Link elements (#7647). This was a regression introduced in version 2.15.

    • Generate constants in module pandoc programmatically.

    • Marshal SimpleTable, ListAttributes, Citation, and Block values as userdata objects. Properties of Block values are marshalled lazily, which generally improves performance considerably. Script users may also notice the following differences:

      • Block element properties can no longer be accessed by numerical indexing of the .c field. The .c property now serves as an alias for .content, so some filters that used this undocumented method of property access may continue to work, while others will need to be updated to use proper property names.
      • The marshalled Block elements now have a show method, and a __tostring metamethod. Both return the Haskell string representation of the element.
      • Block values now have the Lua type userdata instead of table.
  • Add a short guide to pandoc’s sources (Albert Krewinkel).

  • Fix epub files in epub reader tests, so that they are valid according to epubcheck (#7586).

  • Allow time 1.13.

  • Require latest skylighting (0.12.1).

  • Fix build on GHC 9.2 (Joseph C. Sible).

  • Fix trypandoc so it builds with aeson > 2.
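The YAML quoting rule can be illustrated with a small Python sketch. `needs_quoting` and `yaml_scalar` are hypothetical helpers, and the word lists are an assumption based on the YAML 1.1 boolean and null spellings; the exact set recognized by the Haskell yaml library may differ. A real emitter must also quote strings containing characters such as `:` or `#`, which this sketch ignores.

```python
# Assumed YAML 1.1 boolean and null spellings (case variants as in the spec).
YAML11_BOOLS = {
    "y", "Y", "yes", "Yes", "YES", "n", "N", "no", "No", "NO",
    "true", "True", "TRUE", "false", "False", "FALSE",
    "on", "On", "ON", "off", "Off", "OFF",
}
YAML11_NULLS = {"~", "null", "Null", "NULL", ""}


def needs_quoting(value: str) -> bool:
    """True if a plain scalar would be read back as a boolean or null."""
    return value in YAML11_BOOLS or value in YAML11_NULLS


def yaml_scalar(value: str) -> str:
    """Render a metadata string, double-quoting boolean/null look-alikes."""
    if needs_quoting(value):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return value


for word in ["Yes", "off", "null", "Norway"]:
    print(f"answer: {yaml_scalar(word)}")
```

Quoting on output is the mirror image of the reader-side advice: a value meant as the string "Yes" or "null" must be quoted to survive a round trip.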

pandoc 2.15

24 Oct 02:18
@jgm jgm
  • Add --sandbox option (#5045).

    • Add sandbox feature. When this option is used, readers and writers only have access to input files (and other files specified directly on the command line). This restriction is enforced in the type system.
    • Filters, PDF production, custom writers are unaffected. This feature only insulates the actual readers and writers, not the pipeline around them in Text.Pandoc.App.
    • Note that when --sandbox is specified, readers won’t have access to the resource path, nor will anything have access to the user data directory.
  • --self-contained: Fix bug that caused everything to be made a data URI (#7635, #7367). We only need to use data URIs in certain cases, but due to a bug they were being used always.

  • Pandoc will now fall back to latin1 encoding for inputs that can’t be read as UTF-8. This is what it did previously for content fetched from the web and not marked as to content type. It makes sense to do the same for local files. In this case a NotUTF8Encoded warning will be issued, indicating that pandoc is interpreting the input as latin1.

  • Markdown reader:

    • Don’t parse links or bracketed spans as citations (#7632). Previously pandoc would parse [link to (@a)](url) as a citation; similarly [(@a)]{#ident}. This is undesirable. One should be able to use example references in citations, and even if @a is not defined as an example reference, [@a](url) should be a link containing an author-in-text citation rather than a normal citation followed by literal (url).
    • Fix interaction of --strip-comments and list parsing (#7521). Use of --strip-comments was causing tight lists to be rendered as loose (as if the comment were a blank line).
    • Fix parsing bug for math in bracketed spans and links (#7623). This affects math with unbalanced brackets (e.g. $(0,1]$) inside links, images, bracketed spans.
    • Fix code blocks using --preserve-tabs (#7573). Previously they did not behave as the equivalent input with spaces would.
  • DocBook reader:

    • Honor linenumbering attribute (Samuel Tardieu). The attribute DocBook linenumbering="numbered" on code blocks maps to the numberLines class internally.
  • LaTeX reader:

    • Implement siunitx v3 commands (#7614). We support \unit, \qty, \qtyrange, and \qtylist as synonyms of \si, \SI, \SIrange, and \SIlist.
    • Properly handle \^ followed by group closing (#7615).
    • Recognize that \vadjust sometimes takes “pre” (#7531).
    • Ignore (and gobble parameters of) CSLReferences environment (#7531). Otherwise we get the parameters as numbers in the output.
    • Restrict \endinput to current file (Simun Schuster).
  • RST reader: handle escaped colons in reference definitions (#7568).

  • HTML reader:

    • Handle empty tbody element in table (#7589).
  • Ipynb reader (Kolen Cheung):

    • Get cell output mime from raw_mimetype in addition to format. (format is what the spec calls for, but raw_mimetype is often used in practice; see jupyter/nbformat#229).
    • Add more formats that can be handled as “raw” cells.
    • Fix mime type for rst.
    • Support text/markdown, which is now a supported mime type for raw output (#7561).
  • RTF reader:

    • Support \binN for binary image data.
    • If doc begins with { … } only parse its contents. Some documents seem to have non-RTF (e.g. XML) material after the {\rtf1 ... } group.
    • Ignore \pgdsc group. Otherwise we get style names treated as text.
    • Better handling of \* and bookmarks. We now ensure that groups starting with \* never cause text to be added to the document. In addition, bookmarks now create a span between the start and end of the bookmark, rather than an empty span.
  • Docx reader:

    • Avoid blockquote when parent style has more indent (Milan Bracke). When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
    • Fix handling of empty fields (Milan Bracke). Some fields have only an instrText and no content. Pandoc didn’t understand these, so other fields were misinterpreted because a field appeared to be still open when it wasn’t.
    • Implement PAGEREF fields (Milan Bracke). These fields, often used in tables of contents, can be a hyperlink.
    • Fix handling of nested fields (Milan Bracke). Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for the end, which would be considered the end of the parent field.
    • Add a placeholder for Word diagrams instead of just omitting them (Ezwal).
  • Org reader:

    • Don’t parse a list as first item in a list item (#7557).
    • Allow an initial :PROPERTIES: drawer to add to metadata (#7520).
  • Docx writer:

    • Make the id used in native_numbering predictable (#7551). If the image has the id IMAGEID, then we use the id ref_IMAGEID for the figure number. This allows one to create a filter that adds a figure number with figure name, e.g. <w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t> </w:r></w:fldSimple>. If an image lacks an id, an id of the form ref_fig1 is used.
  • Ensure we have unique ids for wp:docPr and pic:cNvPr elements (#7527, #7503).

  • Handle SVG images (#4058). This change has several parts:

    • In Text.Pandoc.App, if the writer is docx, we fill the media bag and attempt to convert any SVG images to PNG, adding these to the media bag. The PNG backups have the same filenames as the SVG images, but with an added .png extension. If the conversion cannot be done (e.g. because rsvg-convert is not present), a warning is emitted.
    • In Text.Pandoc.Writers.Docx, we now use Word 2016’s syntax for including SVG images. If a PNG fallback is present in the media bag, we include a link to that too.
  • Powerpoint writer (Emily Bourke):

    • Add support for more layouts (#5097). Until now, four layouts were supported: “Title Slide” (used for the automatically generated metadata slide), “Section Header” (used for headings above slide level), “Two Column” (used when there’s a columns div), “Title and Content” (used for all other slides). We now support three additional layouts: “Comparison”, “Content with Caption”, and “Blank”. The manual describes the logic that determines which layout is used for a slide. Layouts may be customized in the reference doc.
    • Support specifying slide background images using a background-image attribute on the slide’s heading. Only the “stretch” mode is supported, and the background image is centred around the slide in the image’s larger axis, matching the observed default behaviour of PowerPoint.
    • Add support for incremental lists (through same methods as in other slide writers) (#5689).
    • Copy embedded fonts from reference doc.
    • Include all themes in output archive.
    • Fix list level numbering (#4828, #4663). In PowerPoint, the content of a top-level list is at the same level as the content of a top-level paragraph: the only difference is that a list style has been applied. Previously, the writer incremented the paragraph level on each list, turning what should be top-level lists into second-level lists.
    • Line up list continuation paragraphs. This commit changes the marL and indent values used for plain paragraphs and numbered lists, and changes the spacing defined in the reference doc master for bulleted lists. For paragraphs, there is now a left-indent taken from the otherStyle in the master. For numbered lists, the number is positioned where the text would be if this were a plain paragraph, and the text is indented to the next level. This means that continuation paragraphs line up nicely with numbered lists. Existing reference docs may need to be modified so that otherStyle and bodyStyle indent levels match, for this feature to work with them.
    • Consolidate text runs when possible (jgm). This slims down the output files by avoiding unnecessary text run elements.
    • Support footers in the reference doc. There is one behaviour which may not be immediately obvious: if the reference doc specifies a fixed date (i.e. not automatically updating), and there’s a date specified in the metadata for the document, the footer date is replaced by the metadata date.
    • Fix presentation rel numbering. Before now, the numbering of rIds was inconsistent when making the presentation XML and when making the presentation relationships XML.
    • Don’t add relationships unnecessarily. Before now, for any layouts added to the output from the default reference doc, the relationships were unconditionally added to the output. However, if there was already a layout in slideMaster1 at the same index then that results in duplicate relationships.
    • If slide level is 0, don’t insert a slide break between a heading and a following table, “columns” div, or paragraph starting with an image.
    • Fix capitalisation of notesMasterId.
    • Restructure tests.
  • Asciidoc writer:

    • Translate numberLines attribute to linesnum switch (Samuel Tardieu).
    • Improve escaping for -- in URLs (#7529).
  • LaTeX writer:

    • Make babel use more idiomatic (#7604, hseg). Use babel’s bidi implementation. Import babel languages individually instead of as package options. Move header-includes to after babel setup so it can be modified.
    • Use babel, not polyglossia, with xelatex. Previously polyglossia worked better with xelatex, but that is no longer the case, so we simplify the code...
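The latin1 fallback for local input files can be sketched in a few lines of Python. `decode_input` is hypothetical, and the returned string only names pandoc's NotUTF8Encoded warning; it is not pandoc's API. Because every byte sequence is valid Latin-1, the fallback cannot fail, though it may silently mis-decode input that is really in another 8-bit encoding such as cp1252.

```python
def decode_input(raw: bytes):
    """Decode bytes as UTF-8, falling back to Latin-1 with a warning tag."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this decode cannot fail.
        return raw.decode("latin-1"), "NotUTF8Encoded"


text, warning = decode_input(b"caf\xe9")  # 0xE9 alone is invalid UTF-8
print(text, warning)
```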

pandoc 2.14.2

21 Aug 16:55
@jgm jgm
  • Allow --slide-level=0 (#7476). When the slide level is set to 0, headings won’t be used at all in splitting the document into slides. Horizontal rules must be used to separate slides.

  • Add RTF reader (#3982). rtf is now supported as an input format as well as an output format. New module Text.Pandoc.Readers.RTF (exporting readRTF). [API change]

  • HTML reader: treat comments as blank when parsing (#7482).

  • Markdown reader:

    • Fix raw LaTeX injection issue (#7497). Using a code block containing \end{verbatim}, one could inject raw TeX into a LaTeX document even when raw_tex is disabled. Thanks to Augustin Laville for noticing the bug.
    • Multimarkdown sub- and superscripts (#5512, OCzarnecki). Added an extension short_subsuperscripts which modifies the behavior of subscript and superscript, allowing subscripts or superscripts containing only alphanumerics to end with a space character (e.g. x^2 = 4 or H~2 is combustible). This improves support for multimarkdown.
  • RST reader: Fix :literal: includes (#7513). These should create code blocks, not insert raw RST.

  • LaTeX reader:

    • Proper implicit grouping around environment macros.
    • Support \global before \def, \let, etc. (#7494).
    • Fix scope for LaTeX macros (#7494). They should by default scope over the group in which they are defined (except \gdef and \xdef, which are global). In addition, environments must be treated as groups.
    • Improve handling of plain TeX macro primitives (#7474). Fixed semantics for \let.
    • Implement \edef, \gdef, and \xdef.
  • Docx reader: Improve docx reader’s robustness in extracting images (#7511). The docx reader made some assumptions about how docx containers were laid out that were not always true, with the result that some images in documents did not get extracted.

  • LaTeX writer: Increase table column width precision (#7466, Peter Fabinski). In some cases, the rounding performed by the LaTeX table writer would introduce visible overrun outside the text area. This adds two more decimal places to the width values.

  • Powerpoint writer:

    • Include image title in description (#7352, Emily Bourke). The image title (i.e. ![alt text](link "title")) was previously ignored when writing to pptx. This commit includes it in PowerPoint’s description of the image, along with the link.
    • Select layouts from reference doc by name (Emily Bourke). Until now, users had to make sure that their reference doc contains layouts in a specific order: the first four layouts in the file had to have a specific structure. Now the layout selection uses the layout names rather than order: users must make sure their reference doc contains four layouts with specific names, and if a layout with the right name isn’t found pandoc will emit a warning and use the corresponding layout from the default reference doc as a fallback.
  • Docx writer: be sensitive to the native_numbering extension (#7499). Figure and table numbers are now only included if native_numbering is enabled. (By default it is disabled.) This is a behavior change with respect to 2.14.1, but the default behavior is now that of previous versions. The change was necessary to avoid incompatibilities between pandoc’s native numbering and third-party cross reference filters like pandoc-crossref.

  • RTF writer:

    • Omit \bin in \pict. According to the spec, this is not needed or wanted when the data is in hexadecimal format, as here.
    • Emit ``` for section headings.
  • RTF template: specify font family for fixed-width font f1. According to the spec, this is mandatory.

  • LaTeX writer: Use ulem for underline (#7351). ulem is conditionally included already when the strikeout variable is set, so we set this when there is underlined text, and use \uline instead of \underline. This fixes wrapping for underlined text.

  • Text.Pandoc.Citeproc:

    • Revise citeproc code to fit new citeproc 0.5 API (thanks to Benjamin Bray). Linkification of URLs in the bibliography is now done in the citeproc library, depending on the setting of an option. We set that option depending on the value of the metadata field link-bibliography (defaulting to true, for consistency with earlier behavior). If a DOI, PMID, PMCID, or URL field is present but not explicitly rendered, the title (or if no title, the whole entry) is hyperlinked. These changes implement the recommendations from the draft CSL v1.0.2 spec (Appendix VI): https://github.com/citation-style-language/documentation/blob/master/specification.rst#appendix-vi-links
    • Avoid odd handling of quotes. Recent citeproc changes allow us to ignore Quoted elements; citeproc now uses its own method for representing quoted things, and only localizes and flipflops quotes it adds itself. Convert Quoted in bib entries to special Spans before passing them off to citeproc. This ensures that we get proper localization and flipflopping if, e.g., quotes are used in titles (jgm/citeproc#87).
    • Removed quote localization from citeproc processing. This is now done in citeproc itself.
  • Text.Pandoc.Logging: Add PowerpointTemplateWarning log message type [API change] (Emily Bourke).

  • Text.Pandoc.Extension: Add Ext_short_subsuperscripts constructor to Extension [API change] (OCzarnecki).

  • Various sample.lua editorial fixes (#7493, #7487, William Lupton).

  • Bump base-compat version so we get compatibility with base 4.12.

  • Use Prelude from base-compat for ghc 8.4 too.

  • Add haskell-language-server to shell.nix (#7496, Emily Bourke).

  • Tests.Helpers: export testGolden and use it in RTF reader. This gives a diff output on failure.

  • Remove obsolete and incorrect sentence in --slide-level docs.

  • Add internal module Text.Pandoc.Network.HTTP, exporting urlEncode.

  • Text.Pandoc.Parsing: parseFromString: preserve at least the source directory (#7464). Previously we just set the source name to “chunk” when parsing from strings, to avoid misleading source positions. This had the side effect that rebase_relative_paths would break inside sections that were parsed as strings. So, now we use “ORIGINAL_SOURCE_PATH_chunk” instead of just “chunk”.

  • Text.Pandoc.MIME: use image/x-xcf instead of application/x-xcf (#7454).

  • Don’t compare cdLine in OOXML golden tests (Emily Bourke). The cdLine field gives the line of the file some CData was found on, which reflects irrelevant formatting differences.

  • Provide more detailed XML diff in tests (Emily Bourke).

  • OOXML tests: silence warnings. These can make the test output confusing, making people think tests are failing when they’re passing.

  • INSTALL.md: Add GitLab CI/CD example (#7448, Veratyr).

  • MANUAL.txt

    • Clarifications (William Lupton).
    • Add a note on security risks of include directives.
  • Document use of the ‘underline’ class (#7492, #7484, William Lupton).

  • Add a FAQ about the “Cannot allocate memory” error on M1 macs.

  • Use texmath 0.12.3.1.

  • Use released citeproc 0.5.

  • Remove dependency on HTTP package (#7456, mt_caret).
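The effect of --slide-level=0 can be sketched in Python on a flat list of blocks. This is a simplification, not pandoc's slide-splitting algorithm: `split_slides` is hypothetical, blocks are plain (kind, level, text) tuples, and headings above the slide level (which pandoc turns into section title slides) are treated the same as slide-level headings.

```python
from typing import List, Tuple

# Hypothetical block representation: (kind, heading level or 0, text).
Block = Tuple[str, int, str]


def split_slides(blocks: List[Block], slide_level: int) -> List[List[str]]:
    """Split blocks into slides at horizontal rules and at headings whose
    level is <= slide_level. With slide_level 0 no heading qualifies, so
    only horizontal rules separate slides."""
    slides: List[List[str]] = []
    current: List[str] = []
    for kind, level, text in blocks:
        starts_slide = kind == "hr" or (kind == "header" and level <= slide_level)
        if starts_slide and current:
            slides.append(current)
            current = []
        if kind != "hr":  # the rule itself is only a separator
            current.append(text)
    if current:
        slides.append(current)
    return slides


deck = [("header", 1, "Intro"), ("para", 0, "a"), ("header", 1, "Next"),
        ("para", 0, "b"), ("hr", 0, ""), ("para", 0, "c")]
print(split_slides(deck, 1))
print(split_slides(deck, 0))
```

At level 1 the two headings each open a slide; at level 0 they stay inside the first slide and only the rule splits the deck.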

pandoc 2.14.1

19 Jul 05:42
@jgm jgm
  • Text.Pandoc.ImageSize: Add Tiff constructor for ImageType (#7405) [Minor API change]. This allows pandoc to get size information from tiff images.

  • Markdown reader: don’t try to read contents in self-closing HTML tag. Previously we had problems parsing raw HTML with self-closing tags like <col/>. The problem was that pandoc would look for a closing tag to close the markdown contents, but the closing tag had, in effect, already been parsed by htmlTag.

  • LaTeX reader:

    • Avoid trailing hyphen in translating languages (#7447). Previously \foreignlanguage{english} turned into <span lang="en-">. The same issue affected Arabic.
    • Support \cline in LaTeX tables (#7442).
    • Improved parsing of raw LaTeX from Text streams (rawLaTeXParser, used to read LaTeX in Markdown files, #7434). We now use source positions from the token stream to tell us how much of the text stream to consume. Getting this to work required a few other changes to make token source positions accurate.
  • DocBook reader:

    • Handle images with imageobjectco elements (#7440).
    • Add support for citerefentry (#7437, Jan Tojnar).
  • RST reader: fix regression with code includes (#7436). With the recent changes to include infrastructure, included code blocks were getting an extra newline.

  • HTML reader:

    • Recognize data-external when reading HTML img tags (#7429, Michael Hoffmann). Preserve all attributes in img tags. If attributes have a data- prefix, it will be stripped. In particular, this preserves a data-external attribute as an external attribute in the pandoc AST.
    • Add col, colgroup to ‘closes’ definitions
  • HTML writer:

    • Remove duplicated alt text in HTML output (Aner Lucero).
    • Remove aria-hidden when explicit alt text is provided (Aner Lucero).
    • Set boolean values for reveal.js variables.
  • Docx writer:

    • Add table numbering for captioned tables. The numbers are added using fields, so that Word can create a list of tables that will update automatically.
    • Support figure numbers. These are set up in such a way that they will work with Word’s automatic table of figures (#7392).
  • Markdown writer: put space between Plain and following fenced Div (#4465).

  • EPUB writer: Don’t incorporate externally linked images in EPUB documents (#7430, Michael Hoffmann). Just as it is possible to avoid incorporating an image in EPUB by passing data-external="1" to a raw HTML snippet, this makes the same possible for native Images, by looking for an associated external attribute.

  • Text.Pandoc.PDF:

    • Fix svgIn path error (#7431). We were duplicating the temp directory; this didn’t cause problems on macOS or linux because there we use absolute paths for the temp directory. But on Windows it caused errors converting SVG files.
    • convertImage: normalize paths (#7431). This will avoid paths on Windows with mixed path separators.
  • Text.Pandoc.Class: Always use / when adding directory to image destination with extractMedia, even on Windows.

  • Text.Pandoc.Citeproc:

    • Allow $ characters in bibtex keys (#7409).
    • Set proper initial source name in parsing BibTeX (for better error messages.)
    • Revamp note citation handling (#7394). Use latest citeproc, which uses a Span with a class rather than a Note for notes. This helps us distinguish between user notes and citation notes. Don’t put citations at the beginning of a note in parentheses. Fix small bug in handling of citations in notes, which led to commas at the end of sentences in some cases.
    • Cleanup and efficiency improvement in deNote.
    • Improve punctuation moving with --citeproc. Previously, using --citeproc could cause punctuation to move in quotes even when there are no citations. This has been changed; punctuation moving is now limited to citations. In addition, we only move footnotes around punctuation if the style is a note style, even if notes-after-punctuation is true.
  • Use citeproc 0.10. This helps improve note citations (see above) and eliminates double hyperlinks in author-in-text citations. Author-only citations are no longer hyperlinked. See jgm/citeproc#77. It also fixes moving of punctuation inside quotes to conform to the CSL spec: only comma and period are moved, not question mark or exclamation point.

  • Text.Pandoc.Error: fix line calculations in reporting parsec errors. Also remove a spurious initial newline in the error report.

  • Use doctemplates 0.4.1, which gives us better support for boolean variable values. Previously $if(foo)$ would evaluate to true for variables with boolean false values, because it cared only about the string rendering (#7402).

  • Require commonmark-pandoc >= 0.2.2.1. This fixes task lists with multiple paragraphs.

  • Use skylighting 0.11.

  • CSS in HTML template: reset overflow-wrap on code blocks (Mauro Bieg, #7423).

  • LaTeX template: Revert change in PR #7295: “move title, author, date up to top of preamble.” The change caused problems for people who used, in the title or author fields, LaTeX commands defined later in the preamble (#7422).

  • Add doc/faqs.md. This is imported from the website; in the future the website version will be drawn from here. Added a FAQ on the use of \AtEndPreamble for cases when the contents of header-includes need to refer to definitions that come later in the preamble. See #7422.

  • Upgrade Debian 10 AMI for build-arm.sh.

  • CircleCI: change to using xcode 11.1.0 (macOS 10.14.4). We previously built on 10.13, but 10.13 no longer gets security updates and CircleCI is deprecating it.
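The citeproc 0.10 rule about which punctuation moves inside quotes can be illustrated with a string-level Python sketch. This is only an illustration: citeproc works on the document AST, applies the move only around citations, and only when the locale asks for it (CSL's punctuation-in-quote option). `move_punct_inside` is hypothetical and handles only the right double quotation mark.

```python
import re

# Right double quotation mark, as produced by smart quotes.
CLOSE = "\u201d"


def move_punct_inside(text: str) -> str:
    """Move a comma or period that follows a closing quote inside it.
    Question marks and exclamation points are deliberately left alone."""
    return re.sub(CLOSE + r"([.,])", r"\1" + CLOSE, text)


print(move_punct_inside("He said \u201chi\u201d."))
print(move_punct_inside("Is it \u201chi\u201d?"))
```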

pandoc 2.14.0.3

22 Jun 22:01
@jgm jgm
  • Text.Pandoc.MediaBag insertMediaBag: ensure we get a sane mediaPath for URLs (#7391). In earlier 2.14.x versions, we’d get incorrect paths for resources downloaded from URLs when the media are extracted (including in PDF production).
  • Text.Pandoc.Parsing: improve emailAddress (#7398). Previously the parser would accept characters in domains that are illegal in domains, and this sometimes caused it to gobble bits of the following text.
  • txt2tags reader: modify the email address parser so it still includes form parameters, even after the change to emailAddress in Text.Pandoc.Parsing.
  • Text.Pandoc.Readers.Metadata: Fix regression with comment-only YAML metadata blocks (#7400).
  • reveal.js writer and template: better handling of options. Previously it was impossible to specify false values for options that default to true (e.g. center); setting the option to false just caused the portion of the template setting the option to be omitted. Now we prepopulate all the variables with their default values, including them all unconditionally and allowing them to be overridden.
  • Markdown writer: Fix regression in code blocks with attributes (#7397). Code blocks with a single class but nonempty attributes were having their attributes dropped as a result of #7242.
  • LaTeX writer:
    • Add strut at end of minipage if it contains line breaks. Without them, the last line is not as tall as it should be in some cases.
    • Always use a minipage for cells with line breaks, when width information is available (#7393). Otherwise the way we treat them can lead to content that overflows a cell.
    • Use \strut instead of ~ before \\ in empty line.
  • Use lts-18.0 stack resolver.
  • Require skylighting 0.10.5.2 (adding support for Swift).
  • Require commonmark 0.2.1.
  • Rephrase section on unsafe HTML in manual.
  • Create SECURITY.md
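The reveal.js option fix amounts to "prepopulate every default, let the user override, then emit everything". Previously a template conditional dropped an option set to false, so reveal.js fell back to its own default of true. A minimal Python sketch of the new approach follows; `REVEAL_DEFAULTS`, `reveal_options`, and `render_config` are hypothetical, and the three defaults shown are an assumed subset for illustration, not pandoc's template variables.

```python
# Assumed subset of reveal.js defaults, for illustration only.
REVEAL_DEFAULTS = {"center": True, "controls": True, "progress": True}


def reveal_options(user_vars):
    """Prepopulate every option with its default, then apply user overrides."""
    options = dict(REVEAL_DEFAULTS)
    options.update(user_vars)
    return options


def render_config(options):
    """Emit every option unconditionally, so False survives as 'false'."""
    return ",\n".join(
        f"{key}: {str(value).lower()}" for key, value in sorted(options.items())
    )


print(render_config(reveal_options({"center": False})))
```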

pandoc 2.14.0.2

13 Jun 17:21
@jgm jgm
  • Fix MediaBag regressions (#7345). In the 2.14 release --extract-media stopped working as before; there could be mismatches between the paths in the rendered document and the extracted media. This patch makes several changes that restore the earlier behavior (while keeping the same API). The mediaPath in 2.14 was always constructed from the SHA1 hash of the media contents. Now, we preserve the original path unless it’s an absolute path or contains .. segments (in that case we use a path based on the SHA1 hash of the contents).

    In Text.Pandoc.MediaBag, mediaDirectory and mediaItems now use the mediaPath, rather than the mediabag key, for the first component of the tuple. This makes more sense, I think, and fits with the documentation of these functions; eventually, though, we should rework the API so that mediaItems returns both the keys and the MediaItems.

    In Text.Pandoc.Class.IO, rewriting of source paths in extractMedia has been fixed.

    In Text.Pandoc.Class.PandocMonad, fillMediaBag has been modified so that it doesn’t modify image paths (that was part of the problem in #7345).

    We now do path normalization (e.g. \ separators on Windows) in writing the media.

  • Text.Pandoc.PDF:

    • Fix regression in 2.14 for generation of PDFs with SVGs (#7344).
    • Only print relevant part of environment on --verbose. Since --verbose output might be put in an issue, we want to avoid spilling out secrets in environment variables.
  • Markdown reader: fix pipe table regression in 2.11.4 (#7343). Previously pipe tables with empty headers (that is, a header line with all empty cells) would be rendered as headerless tables. This broke in 2.11.4. The fix here is to produce an AST with an empty table head when a pipe table has all empty header cells.

  • LaTeX reader: don’t allow optional * on symbol control sequences (#7340). Generally we allow optional starred variants of LaTeX commands (since many allow them, and if we don’t accept these explicitly, ignoring the star usually gives acceptable results). But we don’t want to do this for \(*\) and similar cases.

  • Docx reader: handle absolute URIs in Relationship Target (#7374).

  • Docx writer: fix handling of empty table headers (Albert Krewinkel, #7369). A table header which does not contain any cells is now treated as an empty header.

  • LaTeX writer: Fix regression in table header position (#7347). In recent versions the table headers were no longer bottom-aligned (if more than one line). This patch fixes that by using minipages for table headers in non-simple tables.

  • CommonMark writer:

    • Do not use simple class for fenced-divs (Jan Tojnar, amends #7242.)
    • Do not throw away attributes when Ext_attributes is enabled. Ext_attributes covers at least the following: Ext_fenced_code_attributes, Ext_header_attributes, Ext_inline_code_attributes, Ext_link_attributes.
  • Markdown writer:

    • Allow pipe_tables to be disabled for commonmark formats (commonmark_x, gfm) (#7375).
    • Re-use functions from Text.Pandoc.Markdown.Inline (Jan Tojnar).
  • DocBook writer: Remove non-existent admonitions (Jan Tojnar). attention, error and hint are reStructuredText-specific.

  • HTML writer: Don’t omit width attribute on div (#7342).

  • Text.Pandoc.MIME, extensionFromMimeType: add a few special cases. When we do a reverse lookup in the MIME table, we just get the last match, so when the same mime type is associated with several different extensions, we sometimes got weird results, e.g. .vs for text/plain. These special cases help us get the most standard extensions for mime types like text/plain.
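A small Python analogue shows why a plain reverse lookup picks odd extensions and how special cases fix it. The table excerpt, the special-case values and the function names are assumptions for illustration, not pandoc's actual MIME table.

```python
from typing import Dict, Optional

# Tiny excerpt of an extension -> MIME type table (illustrative, not pandoc's).
MIME_TYPES: Dict[str, str] = {
    "txt": "text/plain",
    "vs": "text/plain",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
}

# Preferred extensions for MIME types shared by several extensions (assumed).
SPECIAL_CASES: Dict[str, str] = {
    "text/plain": "txt",
    "image/jpeg": "jpg",
}


def naive_extension(mime: str) -> Optional[str]:
    # Reverse lookup: later entries overwrite earlier ones, so the last match wins.
    reverse = {m: ext for ext, m in MIME_TYPES.items()}
    return reverse.get(mime)


def extension_from_mime_type(mime: str) -> Optional[str]:
    # Consult the special cases first, then fall back to the reverse lookup.
    return SPECIAL_CASES.get(mime) or naive_extension(mime)
```

With this table, `naive_extension("text/plain")` yields `vs`, while the special-cased lookup yields `txt`.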

  • Lua utils: fix handling of table headers in from_simple_table (Albert Krewinkel, #7369). Passing an empty list of header cells now results in an empty table header.

  • Text.Pandoc.Citeproc:

    • Avoid duplicate classes and attributes on references div.
    • Fix regression in citeproc processing (#7376). If inline references are used (in the metadata references field), we should still only include in the bibliography items that are actually cited (unless nocite is used).
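The selection rule restored by #7376 can be sketched in Python. `select_references` is a hypothetical helper, and using `"*"` to stand for pandoc's `@*` nocite value is a simplification.

```python
from typing import Dict, List


def select_references(
    references: List[Dict[str, str]], cited: List[str], nocite: List[str]
) -> List[Dict[str, str]]:
    """Return the references that belong in the bibliography (sketch)."""
    # Assumption: "*" in nocite stands for pandoc's "@*" (include everything).
    if "*" in nocite:
        return list(references)
    wanted = set(cited) | set(nocite)
    # Inline references that were never cited (or nocited) are left out.
    return [ref for ref in references if ref["id"] in wanted]
```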
  • Require citeproc 0.4.0.1. This fixes a bug which led to doubled “et al.” in some (rare) circumstances.

  • MANUAL.txt:

    • Mention GladTeX for EPUB export (Sebastian Humenda). This updates the manual and the web site about the GladTeX usage.
    • More details and a useful link for YAML syntax.
  • CONTRIBUTING.md: update modules overview (Albert Krewinkel).

  • using-the-pandoc-api.md: switch from String to Text (Albert Krewinkel).

pandoc 2.14.0.1

01 Jun 13:31
@jgm jgm
  • Commonmark reader: Fix regression in 2.14 with YAML metadata block parsing, which could cause the document body to be omitted after metadata (#7339).

  • HTML reader: fix column width regression in 2.14 (#7334). Column widths specified with a style attribute were off by a factor of 100.

  • Markdown reader: in rebasePaths, check for both Windows and Posix absolute paths. Previously Windows pandoc was treating /foo/bar.jpg as non-absolute.

  • Text.Pandoc.Logging: In rendering LoadedResource, use relative paths.

  • Docx writer: fix regression on captions (#7328). The “Table Caption” style was no longer getting applied. (It was overwritten by “Compact.”)

  • Use commonmark-extensions 0.2.1.2

pandoc 2.14

29 May 05:17
@jgm jgm
  • Change reader types, allowing better tracking of source positions [API change]. Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn’t report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn’t resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752).

  • Add rebase_relative_paths extension (#3752). When enabled, this extension rewrites relative image and link paths by prepending the (relative) directory of the containing file. This behavior is useful when your input sources are split into multiple files, across several directories, with files referring to images stored in the same directory. The extension can be enabled for all markdown and commonmark-based formats.
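A rough Python sketch of the rewriting this extension performs. The function names are hypothetical, and leaving URLs and fragment-only links (`#sec`) untouched is an assumption; absolute paths in both POSIX and Windows form are left alone, in line with the 2.14.0.1 rebasePaths fix.

```python
import ntpath
import posixpath
import re

# Hypothetical check for a URI scheme prefix such as "https:" or "mailto:".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_absolute(path: str) -> bool:
    # Treat both POSIX (/foo) and Windows (C:\foo) forms as absolute,
    # regardless of the platform the code runs on.
    return posixpath.isabs(path) or ntpath.isabs(path)


def rebase(containing_file: str, target: str) -> str:
    """Prefix a relative link/image target with its file's directory."""
    if is_absolute(target) or target.startswith("#") or _SCHEME.match(target):
        return target
    return posixpath.join(posixpath.dirname(containing_file), target)
```

For example, an image `images/fig.png` referenced from `chapters/ch1.md` becomes `chapters/images/fig.png`.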

  • Add Text.Pandoc.Sources (exported module), with a Sources type and a ToSources class. A Sources wraps a list of (SourcePos, Text) pairs [API change]. A parsec Stream instance is provided for Sources. The module also exports versions of parsec’s satisfy and other Char parsers that track source positions accurately from a Sources stream (or any instance of the new UpdateSourcePos class).
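As a rough Python analogue of the idea behind Sources (the Haskell type is not reproduced here; `SourcePos` and `chars_with_positions` are hypothetical names), each input keeps its own name and position counter, so a consumer can tell which file a character came from. Parsec's tab-stop column handling is omitted.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class SourcePos:
    name: str
    line: int
    column: int


def chars_with_positions(
    sources: List[Tuple[str, str]]
) -> Iterator[Tuple[SourcePos, str]]:
    """Yield each character with the file, line and column it came from."""
    for name, text in sources:
        # Assumption: every file starts at line 1, column 1.
        line, column = 1, 1
        for ch in text:
            yield SourcePos(name, line, column), ch
            if ch == "\n":
                line, column = line + 1, 1
            else:
                column += 1
```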

  • Text.Pandoc.Parsing

    • Export the modified Char parsers defined in Text.Pandoc.Sources instead of the ones parsec provides. Modified parsers to use a Sources as stream [API change].
    • Improve include file functions [API change]. Remove old insertIncludedFileF. Give insertIncludedFile a more general type, allowing it to be used where insertIncludedFileF was.
    • Add parameter to the citeKey parser from Text.Pandoc.Parsing, which controls whether the @{..} syntax is allowed [API change].
  • Text.Pandoc.Error: Modified the constructor PandocParsecError to take a Sources rather than a Text as first argument, so parse error locations can be accurately reported.

  • Fix source position reporting for YAML bibliographies (#7273).

  • Issue error message when reader or writer format is malformed (#7231). Previously we exited with an error status but (due to a bug) no message.

  • Smarter smart quotes (#7216, #2103). Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks.
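The effect can be approximated with a simple context-based heuristic. This is not pandoc's parser-based implementation, and `curl_double_quotes` is a hypothetical name. Because each quote is classified only by the character before it, an unclosed leading `"` still becomes a left quote.

```python
def curl_double_quotes(text: str) -> str:
    """Replace straight double quotes with curly ones (heuristic sketch)."""
    out = []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
        elif i == 0 or text[i - 1].isspace() or text[i - 1] in "([{":
            out.append("\u201c")  # left quote: start of text or after space/bracket
        else:
            out.append("\u201d")  # right quote: follows a word or punctuation
    return "".join(out)
```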

  • Markdown reader:

    • Use MetaInlines not MetaBlocks for multimarkdown metadata fields. This gives better results in converting to e.g. pandoc markdown.
    • Implement curly-brace syntax for Markdown citation keys (#6026). The change provides a way to use citation keys that contain special characters not usable with the standard citation key syntax. Example: @{foo_bar{x}'} for the key foo_bar{x}'. It also allows separating citation keys from immediately following text, e.g. @{foo}A.
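A minimal Python sketch of reading a braced citation key: the key is everything between the outermost balanced braces. `parse_braced_key` is a hypothetical helper; escapes and other details of pandoc's actual parser are not modeled.

```python
from typing import Optional, Tuple


def parse_braced_key(s: str) -> Optional[Tuple[str, str]]:
    """Split '@{key}rest' into (key, rest), honoring nested braces."""
    if not s.startswith("@{"):
        return None
    depth = 0
    for i in range(1, len(s)):
        if s[i] == "{":
            depth += 1
        elif s[i] == "}":
            depth -= 1
            if depth == 0:
                # The key excludes the outer braces; text after them is left over.
                return s[2:i], s[i + 1:]
    return None  # unbalanced braces: not a braced key
```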
  • RST reader:

    • Seek include files in the directory of the file containing the include directive, as RST requires (#6632).
    • Use insertIncludedFile from Text.Pandoc.Parsing instead of reproducing much of its code.
  • Org reader: Resolve org includes relative to the directory containing the file containing the INCLUDE directive (#5501).

  • ODT reader: Treat tabs as spaces (#7185, niszet).

  • Docx reader:

    • Add handling of vml image objects (#7257, mbrackeantidot).
    • Support new table features (Emily Bourke, #6316): column spans, row spans, multiple header rows, table description (parsed as a simple caption), captions, column widths.
  • LaTeX reader:

    • Improved siunitx support (#6658, #6620).
    • Better support for \xspace (#7299).
    • Improve parsing of \def macros. We previously set “verbatim mode” even for parsing the initial \def; this caused problems for \def nested inside another \def.
    • Implement \newif.
  • ConTeXt writer: improve ordered lists (#5016, Denis Maier). Change ordered list from itemize to enumerate. Add new itemgroup for ordered lists. Remove manual insertion of width attributes. Use tabular figures in ordered list enumerators.

  • HTML reader:

    • Don’t fail on unmatched closing “script” tag (Albert Krewinkel, #7282).
    • Keep h1 tags as normal headers (#2293, Albert Krewinkel). The tags <title> and <h1 class="title"> often contain the same information, so the latter was dropped from the document. However, as this can lead to loss of information, the heading is now always retained. Use --shift-heading-level-by=-1 to turn the <h1> into the document title, or a filter to restore the previous behavior.
    • Handle relative lengths (e.g. 2*) in HTML column widths (#4063). See https://www.w3.org/TR/html4/types.html#h-6.6.
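A sketch of HTML 4 multi-length semantics: percentage columns are allocated first, and the remainder is split among `n*` columns in proportion to `n` (a bare `*` counts as `1*`). Pandoc's exact arithmetic may differ; `column_fractions` is a hypothetical name, and pixel widths are rejected to keep the example short.

```python
from typing import List


def column_fractions(widths: List[str]) -> List[float]:
    """Convert HTML width specs like '25%' or '2*' to fractions of the table."""
    fixed = 0.0
    relative_total = 0.0
    for w in widths:
        if w.endswith("%"):
            fixed += float(w[:-1]) / 100
        elif w.endswith("*"):
            relative_total += float(w[:-1] or "1")
        else:
            raise ValueError(f"width {w!r} not handled in this sketch")
    remaining = max(0.0, 1.0 - fixed)
    result = []
    for w in widths:
        if w.endswith("%"):
            result.append(float(w[:-1]) / 100)
        else:
            share = float(w[:-1] or "1")
            # Relative columns split whatever the percentage columns left over.
            result.append(remaining * share / relative_total if relative_total else 0.0)
    return result
```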
  • DocBook/JATS readers:

    • Fix mathml regression caused by the switch in XML libraries (#7173).
    • Fix “phrase” in DocBook: take classes from “role” not “class” (#7195).
  • DocBook reader: ensure that first and last names are separated (#6541).

  • Jira reader (Albert Krewinkel, #7218):

    • Support “smart” links: [alias|https://example.com|smart-card] syntax.
    • Allow spaces and most unicode characters in attachment links.
    • No longer require a newline character after {noformat}.
    • Only allow URI path segment characters in bare links.
    • The file: schema is no longer allowed in bare links; these rarely make sense.
  • Plain writer: handle superscript unicode minus (#7276).

  • LaTeX writer:

    • Better handling of line breaks in simple tables (#7272). Now we also handle the case where they’re embedded in other elements, e.g. spans.
    • For beamer output, support exampleblock and alertblock (#7278). A block will be rendered as an exampleblock if the heading has class example and an alertblock if it has class alert.
    • Separate successive quote chars with thin space (#6958, Albert Krewinkel). Successive quote characters are separated with a thin space to improve readability and to prevent unwanted ligatures. Detection of these quotes had sometimes failed if the second quote was nested in a span element.
  • EPUB Writer: Fix belongs-to-collection XML id choice (#7267, nuew). The epub writer previously used the same XML id for both the book identifier and the epub collection. This caused an error in epubcheck.

  • BibTeX/BibLaTeX writer: Handle annote field (#7266).

  • ZimWiki writer: allow links and emphasis in headers (#6605, Albert Krewinkel).

  • ConTeXt writer:

    • Support blank lines in line blocks (#6564, Albert Krewinkel, thanks to @denismaier).
    • Use span identifiers as reference anchors (#7246, Albert Krewinkel).
  • HTML writer:

    • Keep attributes from code nested below pre tag (#7221, Albert Krewinkel). If a code block is defined with <pre><code class="language-x">…</code></pre>, where the <pre> element has no attributes, then the attributes from the <code> element are used instead. Any leading language- prefix in the code’s class attribute is dropped to improve syntax highlighting.
    • Ensure headings only have valid attribs in HTML4 (#5944, Albert Krewinkel).
    • Parse <header> as a Div (Albert Krewinkel).
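The <pre>/<code> attribute rule (#7221) can be sketched as follows, with attributes modeled as plain dicts. `block_attributes` is a hypothetical helper, not pandoc's API.

```python
PREFIX = "language-"


def block_attributes(pre_attrs: dict, code_attrs: dict) -> dict:
    """Pick code block attributes from <pre>, falling back to <code>."""
    # Use the <code> attributes only when <pre> has none of its own.
    attrs = dict(pre_attrs) if pre_attrs else dict(code_attrs)
    classes = attrs.get("class", "").split()
    # Drop a leading "language-" prefix so "language-haskell" becomes "haskell".
    stripped = [c[len(PREFIX):] if c.startswith(PREFIX) else c for c in classes]
    if stripped:
        attrs["class"] = " ".join(stripped)
    return attrs
```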
  • Org writer:

  • JATS writer (Albert Krewinkel):

    • Use either styled-content or named-content for spans (#7211). If the element has a content-type attribute, or at least one class, then that value is used as content-type and the span is put inside a <named-content> element. Otherwise a <styled-content> element is used instead.
    • Reduce unnecessary use of <p> elements for wrapping (#7227). The <p> element is used for wrapping in cases where the contents would otherwise not be allowed in a certain context. Unnecessary wrapping is avoided, especially around quotes (<disp-quote> elements).
    • Convert spans to <named-content> elements (#7211). Spans with attributes are converted to <named-content> elements instead of being wrapped with <milestone-start/> and <milestone-end/> elements. Milestone elements are not allowed in documents using the articleauthoring tag set, so this change ensures the creation of valid documents.
    • Add footnote number as label in backmatter (#7210). Footnotes in the backmatter are given the footnote’s number as a label. The articleauthoring output is unaffected by this change, as footnotes are placed inline there.
    • Escape disallowed chars in identifiers. XML identifiers must start with an underscore or letter, and can contain only a limited set of punctuation characters. Any IDs not adhering to these rules are rewritten by writing the offending characters as Uxxxx, where xxxx is the character’s hex code.
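A simplified Python version of the identifier escaping. It assumes an ASCII-only allowed set (letters, digits, `_`, `-`, `.`) and uppercase four-digit hex; real XML names also permit many non-ASCII letters, and the exact hex formatting pandoc uses is an assumption. `escape_xml_id` is a hypothetical name.

```python
def escape_xml_id(ident: str) -> str:
    """Rewrite characters not allowed in an XML id as Uxxxx (sketch)."""

    def allowed(ch: str, first: bool) -> bool:
        if (ch.isascii() and ch.isalpha()) or ch == "_":
            return True
        # Digits, hyphens and periods are fine anywhere except the first position.
        return not first and ch.isascii() and (ch.isdigit() or ch in "-.")

    return "".join(
        ch if allowed(ch, i == 0) else "U%04X" % ord(ch)
        for i, ch in enumerate(ident)
    )
```

Since the replacement starts with the letter `U`, an escaped leading character also yields a valid first character.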
  • Jira writer: use {color} when span has a color attribute (Albert Krewinkel, tarleb/jira-wiki-markup#10).

  • Docx writer:

    • Autoset table width if no column has an explicit width (Albert Krewinkel).
    • Extract Table handling into separate module (Albert Krewinkel).
    • Support colspans and rowspans in tables (Albert Krewinkel, #6315).
    • Support multirow table headers (Albert Krewinkel).
    • Improve integration ...

pandoc 2.13

21 Mar 04:54
@jgm jgm
  • Support yaml_metadata_block extension for commonmark, gfm (#6537). This support is a bit more limited than with pandoc’s markdown. The YAML block must be the first thing in the input, and the leaf nodes are parsed in isolation from the rest of the document. So, for example, you can’t use reference links if the references are defined later in the document.
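The "must be the first thing in the input" rule can be illustrated with a small splitter that finds the block without parsing the YAML. `split_yaml_front_matter` is a hypothetical name. Accepting `---` or `...` as the closing line follows pandoc's Markdown convention, and the body after the block is kept intact.

```python
from typing import Optional, Tuple


def split_yaml_front_matter(text: str) -> Tuple[Optional[str], str]:
    """Return (yaml_text, body); yaml_text is None if there is no leading block."""
    lines = text.splitlines(keepends=True)
    # The block is only recognized at the very start of the input.
    if not lines or lines[0].rstrip("\r\n") != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].rstrip("\r\n") in ("---", "..."):
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # No closing delimiter: treat the whole input as body.
    return None, text
```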

  • Fix fallback to default partials when custom templates are used. If the directory containing a template does not contain the partial, it should be sought in the default templates, but this was not working properly (#7164).

  • Handle nocite better with --biblatex and --natbib (#4585). Previously the nocite metadata field was ignored with these formats. Now it populates a nocite-ids template variable and causes a \nocite command to be issued.

  • Text.Pandoc.Citeproc: apply fixLinks correctly (#7130). This is code that incorporates a prefix like https://doi.org/ into a following link when appropriate.

  • Text.Pandoc.Shared:

    • Remove backslashEscapes, escapeStringUsing [API change]. Replace these inefficient association list lookups with more efficient escaping functions in the writers that used them (for a 10-25% performance boost in org, haddock, rtf, texinfo writers).
    • Remove ToString, ToText typeclasses [API change]. These were needed for the transition from String to Text, but they are no longer used and may clash with other things.
    • Simplify compactDL.
  • Text.Pandoc.Parsing:

    • Change type of readWithM so that it is no longer polymorphic [API change]. The ToText class has been removed, and now that we’ve completed the transition to Text we no longer need this to operate on Strings.
    • Remove F type synonym [API change]. Muse and Org were defining their own F anyway.
  • Text.Pandoc.Readers.Metadata:

    • Export yamlMetaBlock [API change].
    • Make yamlBsToMeta, yamlBsToRefs polymorphic on the parser state [API change].
  • Markdown reader: Fix regression with tex_math_backslash (#7155).

  • MediaWiki reader: Allow block-level content in notes (ref) (#7145).

  • Jira reader (Albert Krewinkel):

    • Fixed parsing of autolinks (i.e., of bare URLs in the text). Previously an autolink would take up the rest of a line, as spaces were allowed characters in these items.
    • Emoji character sequences no longer cause parsing failures. This was due to missing backtracking when emoji parsing fails.
    • Mark divs created from panels with class “panel”.
  • RST reader: fix logic for ending comments (#7134). Previously comments sometimes got extended too far.

  • DocBook writer: include Header attributes as XML attributes on section (Erik Rask). Attributes with key names that are not allowed as XML attributes are dropped, as are attributes with invalid values and xml:id (DocBook 5) and id (DocBook 4).

  • Docx writer:

    • Make nsid in abstractNum deterministic. Previously we assigned a random number, but we don’t need random values, so now we just assign a value based on the list marker.
    • Use integral values for w:tblW (#7141).
  • Jira writer (Albert Krewinkel):

    • Block quotes are only rendered as bq. if they do not contain a linebreak.
    • Improve div/panel handling. Include div attributes in panels, always render divs with class panel as panels, and avoid nesting of panels.
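A sketch of the block quote rule. The changelog does not say what replaces `bq.` for quotes containing line breaks; this example assumes Jira's `{quote}...{quote}` markup, and `render_jira_blockquote` is a hypothetical name.

```python
def render_jira_blockquote(text: str) -> str:
    """Render a block quote in Jira wiki markup (sketch)."""
    if "\n" not in text:
        # bq. only covers a single line, so it is used only without line breaks.
        return "bq. " + text
    # Assumption: multi-line quotes fall back to the {quote} macro.
    return "{quote}\n" + text + "\n{quote}"
```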
  • HTML writer: Add warnings on duplicate attribute values. This prevents emitting invalid HTML. Ultimately it would be good to prevent this in the types themselves, but this is better for now.
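A minimal sketch of the duplicate-attribute check, using Python's `warnings` module in place of pandoc's logging. Keeping the first occurrence and dropping later ones is an assumption, and `render_attributes` is a hypothetical helper.

```python
import html
import warnings
from typing import List, Tuple


def render_attributes(attrs: List[Tuple[str, str]]) -> str:
    """Render key="value" pairs, warning about and skipping duplicate keys."""
    seen = set()
    parts = []
    for key, value in attrs:
        if key in seen:
            # Emitting the attribute twice would produce invalid HTML.
            warnings.warn(f"Ignoring duplicate attribute {key}={value!r}")
            continue
        seen.add(key)
        parts.append(f'{key}="{html.escape(value, quote=True)}"')
    return " ".join(parts)
```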

  • Org writer: Prevent unintended creation of ordered list items (#7132, Albert Krewinkel). Adjust line wrapping if default wrapping would cause a line to be read as an ordered list item.

  • JATS templates: support ‘equal-contrib’ attrib for authors (Albert Krewinkel). Authors who contributed equally to a paper may be marked with equal-contrib.

  • reveal.js template: replace JS comment with HTML (#7154, Florian Kohrt).

  • Text.Pandoc.Logging: Add DuplicateAttribute constructor to LogMessage. [API change]

  • Use -j4 for linux release build. This speeds up the build dramatically on arm.

  • cabal.project: remove ghcoptions. Move flags to top level, so they can be set differently on the command line.

  • Require latest texmath, skylighting, citeproc, jira-wiki-markup. (The latest skylighting fixes a bad bug with Haskell syntax highlighting.) Narrow version bounds for texmath, skylighting, and citeproc, since the test output depends on them.

  • Use doclayout 0.3.0.2. This significantly reduces the time and memory needed to compile pandoc.

  • Use foldl' instead of foldl everywhere.

  • Update bounds for random (#7156, Alexey Kuleshevich).

  • Remove uses of some partial functions.

  • Don’t bake in a larger stack size for the executable.

  • Test improvements:

    • Use getExecutablePath from base, avoiding the dependency on executable-path.
    • Factor out setupEnvironment in Helpers, to avoid code duplication.
    • Fix finding of data files by setting the pandoc_datadir environment variable when we shell out to pandoc. This avoids the need to use --data-dir for the tests, which caused problems finding pandoc.lua when compiling without the embed_data_files flag (#7163).
  • Benchmark improvements:

    • Build +RTS -A8m -RTS into default ghc-options for benchmark. This is necessary to get accurate benchmark results; otherwise we are largely measuring garbage collection, some of it unrelated to the current benchmark.
    • Allow specifying BASELINE file in ‘make bench’ for comparison (otherwise the latest benchmark is chosen by default).
    • Force readFile in benchmarks early (Bodigrim).
  • CONTRIBUTING: suggest using a cabal.project.local file (#7153, Albert Krewinkel).

  • Add ghcid-test to Makefile. This loads the test suite in ghcid.