Releases: jgm/pandoc

pandoc 2.8.0.1

27 Nov 16:48
@jgm jgm
  • List pdf in --list-output-formats.
  • EPUB writer: Fix regression with --css (#5937). In 2.8 --css would not have an effect on EPUB output.
  • RST writer: Use grid tables for one-column tables, since simple tables clash with heading syntax in this case (#5936).
  • Add unexported module Text.Pandoc.Readers.Metadata (see #5914).
  • Use doctemplates 0.7.2, which adds the nowrap filter to templates.
  • Update default man template using nowrap for .TH heading (#5929).
  • HTML templates: Add support for toc-title variable (#5930, Alexandre Franke).
  • Remove grffile (LaTeX package) requirement in MANUAL.txt (#5927, Ian Max Andolina).
  • Use skylighting 0.8.3.

pandoc 2.8

22 Nov 17:46
@jgm jgm
  • Improvements in templates system (from doctemplates):

    • Pandoc templates now support a number of new features that have been added in doctemplates: notably, elseif, it, partials, filters, and syntax to control nesting and reflowing of text. These changes make pandoc more suitable out of the box for generating plain-text documents from data in YAML metadata. It can create enumerated lists and even tabular structures.
    • We now use templates parameterized on doclayout Doc types. The main impact of this change is better reflowing of content interpolated into templates. Previously, interpolated variables were rendered independently and interpolated as strings, which could lead to overly long lines. Now the templates are interpolated as Doc values, which may include breaking spaces, and reflowing occurs after template interpolation rather than before.
    • Remove code from the LaTeX, Docbook, and JATS writers that looked in the template for strings to determine whether it is a book or an article, or whether csquotes is used. This was always kludgy and unreliable.
    • Change template code to use new API for doctemplates.
  • Add --defaults/-d option. This adds the ability to specify a collection of default values for options in a YAML file. For example, one might define a set of defaults for letters, and then do pandoc -d letter myletter.md -o myletter.pdf. See the documentation of this feature in MANUAL.txt.

  • Raise error on unsupported extensions (#4338).

  • The --list-extensions[=FORMAT] option now lists only extensions that affect the given FORMAT.

  • Add -L option as shortcut for --lua-filter.

  • Add --shift-heading-level-by option and deprecate --base-heading-level (#5615). The new option does everything the old one does, but also allows negative shifts. It also promotes the document metadata (if not null) to a level-1 heading with a +1 shift, and demotes an initial level-1 heading to document metadata with a -1 shift. This supports converting documents that use an initial level-1 heading for the document title.
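
The title promotion and demotion described for --shift-heading-level-by can be pictured with a small Python model. This is only a sketch, not pandoc's code: the Block type and the shift_headings function are hypothetical, only shifts of +1 and -1 get the title treatment described above, and two details are assumptions of the sketch (clearing the title after promoting it, and turning a heading pushed below level 1 into a paragraph).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a document block."""
    kind: str       # "heading" or "para"
    text: str
    level: int = 0  # only meaningful for headings


def shift_headings(title: Optional[str], blocks: List[Block],
                   shift: int) -> Tuple[Optional[str], List[Block]]:
    """Toy model of shifting heading levels by `shift`."""
    out: List[Block] = []
    rest = list(blocks)
    if shift == -1 and rest and rest[0].kind == "heading" and rest[0].level == 1:
        # An initial level-1 heading is demoted to the document title.
        title = rest[0].text
        rest = rest[1:]
    elif shift == 1 and title:
        # A non-empty title is promoted to a level-1 heading.
        out.append(Block("heading", title, 1))
        title = None  # assumption: the title is cleared once promoted
    for block in rest:
        if block.kind != "heading":
            out.append(block)
            continue
        new_level = block.level + shift
        if new_level < 1:
            # Assumption: a heading pushed below level 1 becomes a paragraph.
            out.append(Block("para", block.text))
        else:
            out.append(Block("heading", block.text, new_level))
    return title, out
```

With a -1 shift, a document that opens with a level-1 "Title" heading followed by level-2 sections ends up with "Title" as its title and the sections at level 1.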

  • Allow --metadata-file to be used repeatedly to include multiple metadata files (Owen McGrath, #5702). Values in files specified first will be overridden by those in later files.

  • --ascii now uses numerical hex character references (#5718).
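
As a rough illustration of what a numerical hex character reference looks like, the following Python sketch replaces each non-ASCII character with its hexadecimal reference. The function name to_ascii_refs is hypothetical, and the uppercase hex digits are an assumption of the sketch, not something the note above specifies.

```python
def to_ascii_refs(text: str) -> str:
    """Replace every non-ASCII character with a hex character reference.

    ASCII characters pass through unchanged; no other escaping is done.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"  # hex digit case is an assumption
        for ch in text
    )
```

For example, the é in "café" (U+00E9) would come out as `&#xE9;` under this sketch.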

  • Allow PDF output to stdout (#5751). PDF output now behaves like other binary formats: it will not be output to the terminal, but can be sent to stdout using either -o - or a pipe. The intermediate format will be determined based on the setting of --pdf-engine.

  • Make some writers sensitive to ‘unlisted’ class on headings (#1762). If this is present on a heading with the ‘unnumbered’ class, the heading won’t appear in the TOC. This class has no effect if ‘unnumbered’ is not also specified. This affects HTML-based writers (including slide shows and EPUB), LaTeX (including beamer), RTF, and PowerPoint. Other writers do not yet support unlisted.

  • Fix gfm_auto_identifiers behavior with emojis (#5813). Note that we also now use emoji names for emojis when ascii_identifiers is enabled.

  • When --ipynb-output is used with the default “best” format, strip ANSI escape codes for non-ipynb output (#5633). These cause problems in many formats, including LaTeX.

  • Don’t look for template files remotely for remote input (#5579). Previously pandoc would look for the template at a remote URL when a URL was used for the input file, instead of taking it from the data directory.

  • Allow combining -Vheader-includes and --include-in-header (#5904). Previously header-includes set as a variable would be clobbered by material included using --include-in-header.

  • Change merge behavior for metadata. Previously, if a document contained two YAML metadata blocks that set the same field, the conflict would be resolved in favor of the first. Now it is resolved in favor of the second (due to a change in pandoc-types). This makes the behavior more uniform with other things in pandoc (such as reference links and --metadata-file).
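
The new precedence rule (later definitions win) can be sketched in Python with plain dictionaries standing in for metadata blocks. The merge_metadata helper is hypothetical and only models top-level fields; how nested values combine is not covered here.

```python
from typing import Any, Dict, Iterable


def merge_metadata(blocks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge metadata blocks in document order; later blocks win on conflict."""
    merged: Dict[str, Any] = {}
    for block in blocks:
        merged.update(block)  # a later value replaces an earlier one
    return merged
```

Merging `{"title": "A", "author": "X"}` followed by `{"title": "B"}` keeps the author and resolves the title to "B", whereas the old behavior would have kept "A".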

  • Don’t add a newline to fragment output if there’s already one.

  • Change exit codes and document in MANUAL.txt:

    • PandocAppError was 1, is now 4
    • PandocOptionError was 2, is now 6
    • PandocMakePDFError was 65, is now 66
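
Wrapper scripts that check pandoc's exit status may need updating. A minimal Python sketch of the renumbering listed above follows; the table contains only the three changes named here, and the helper name translate_old_exit_code is hypothetical. Codes not in the table are returned unchanged, which is an assumption of the sketch.

```python
# (old code, new code) for each error type listed in the release notes above.
EXIT_CODE_CHANGES = {
    "PandocAppError": (1, 4),
    "PandocOptionError": (2, 6),
    "PandocMakePDFError": (65, 66),
}


def translate_old_exit_code(code: int) -> int:
    """Map a pre-2.8 exit code to its 2.8 value, if it is one of the listed changes."""
    for old, new in EXIT_CODE_CHANGES.values():
        if code == old:
            return new
    return code  # assumption: anything not listed is left as-is
```
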
  • Switch to new pandoc-types and use Text instead of String [API change]. (Christian Despres, #5884).

  • HTML reader:

    • Better handling of <q> with cite attribute (#5798, Ole Martin Ruud). If a <q> tag has a cite attribute, we interpret it as a Quoted element with an inner Span.
    • Add support for HTML <samp> element (#5792, Amogh Rathore). The <samp> element is parsed as Code with class sample.
    • Add support for HTML <var> element (#5799, Amogh Rathore). The <var> element is parsed as Code with class variable.
    • Add support for <mark> elements (Florian B, #5797). Parse <mark> elements from HTML as Spans with class mark.
    • Add support for <kbd> elements, parsing them as Span with class kbd (Daniele D’Orazio, #5796).
    • Add support for <dfn>, parsing this as a Span with class dfn (#5882, Florian Beeres).
  • Markdown reader:

    • Headers: don’t parse content over newline boundary (#5714).
    • Handle inline code more eagerly within lists (Brian Leung, #5627).
    • Removed some needless lookaheads.
    • Don’t parse footnote body unless extension enabled.
    • Fix small super/subscript issue (#5878). Superscripts and subscripts cannot contain spaces, but newlines were previously allowed (unintentionally). This led to bad interactions in some cases with footnotes. With this change newlines are also not allowed inside super/subscripts.
    • Use take1WhileP for str, table row. This yields a small but measurable performance improvement.
  • LaTeX reader:

    • Fix parsing of optional arguments that contain braced text (#5740).
    • Don’t try to parse includes if raw_tex is set (#5673). When the raw_tex extension is set, we just carry through \usepackage, \input, etc. verbatim as raw LaTeX.
    • Properly handle optional arguments for macros (#5682).
    • Fix \\ in \parbox inside a table cell (#5711).
    • Improve withRaw so it can handle cases where the token string is modified by a parser (e.g. accent when it only takes part of a Word token) (#5686). This fixes a bug that caused the ends of certain documents to be dropped.
    • Handle \passthrough macro used by latex writer (#5659).
    • Support tex \tt command (#5654).
    • Search for image with list of extensions like latex does, if an extension is not provided (#4933).
    • Handle \looseness command values better (#4439).
    • Add mbox and hbox handling (Vasily Alferov, #5586). When +raw_tex is enabled, these are passed through literally. Otherwise, they are handled in a way that emulates LaTeX’s behavior.
    • Properly handle \providecommand and \provideenvironment (#5635). They are now ignored if the corresponding command or environment is already defined.
    • Support epigraph command in LaTeX Reader (oquechy, #3523).
    • Ensure that expanded macros in raw LaTeX end with a space if the original did (#4442).
    • Treat ly environment from lilypond as verbatim (Urs Liska, #5671).
    • Add tikzcd to list of special environments (Eigil Rischel). This allows it to be processed by filters, in the same way that one can do for tikzpicture.
  • Roff reader:

    • Better support for while.
    • More improvements in parsing conditionals.
    • Fix problem parsing comments before macro.
    • Improve handling of groups.
    • Better parsing of groups (#5410). We now allow groups where the closing \\} isn’t at the beginning of a line.
  • RST reader:

    • Keep name property in imgAttr (Brian Leung, #5619).
    • Fixed parsing of indented blocks (#5753). We were requiring consistent indentation, but this isn’t required by RST.
    • Use title, not admonition-title, for admonition title. This puts RST reader into alignment with docbook reader.
    • Don’t strip final underscore from absolute URI (#5763).
    • Avoid spurious warning when resolving links to internal anchors ending with _ (#5763).
  • Org reader:

    • Accept ATTR_LATEX in block attributes (Albert Krewinkel, #5648). Attributes for LaTeX output are accepted as valid block attributes; however, their values are ignored.
    • Modify handling of example blocks (Brian Leung, #5717).
    • Allow the -i switch to ignore leading spaces (Brian Leung).
    • Handle awkwardly-aligned code blocks within lists (Brian Leung). Code blocks in Org lists must have their #+BEGIN_ aligned in a reasonable way, but their other components can be positioned otherwise.
    • Fix parsing of empty comment lines (#5856, Albert Krewinkel). Comment lines in Org-mode can be completely empty.
  • Muse reader (Alexander Krotov):

    • Add RTL support (#5551).
    • Do not allow closing asterisks to be followed by *.
    • Do not split series of asterisks into symbols and emphasis (#5821).
    • Do not terminate emphasis on * not followed by space.
  • DokuWiki reader:

    • Parse markup inside monospace ('') (#5916, Alexander Krotov).
  • Docx reader:

    • Move style-parsing-specific code to a new unexported module, Text.Pandoc.Readers.Docx.Parse.Styles (Nikolay Yakimov).
    • Move StyleMap to docx writer (Nikolay Yakimov).
    • Only use LTR when it is overriding BiDi setting (#5723, Jesse Rosenthal). The left-to-right direction setting in docx is used in the spec only for overriding an explicit right-to-left setting. We only process ...

pandoc 2.7.3

12 Jun 06:07
@jgm jgm
  • Add jira (Atlassian’s Jira wiki markup) as output format (#2497, Albert Krewinkel).

  • Add tex_math_dollars to multimarkdownExtensions (#5512). This form is now supported in multimarkdown, in addition to tex_math_double_backslash.

  • Fix --self-contained so it works when output format has extensions. Previously if you used --self-contained with html-smart or html+smart, it wouldn’t work.

  • Add template variable curdir with working directory from which pandoc is run (#5464).

  • Markdown reader: don’t create implicit reference for empty header (#5549).

  • Muse reader: allow images inside link descriptions (Alexander Krotov).

  • HTML reader: epub related fixes.

    • With epub extensions, check for epub:type in addition to type.
    • Fix problem with noteref parsing which caused block-level content to be eaten with the noteref.
    • Rename pAnyTag to pAny.
    • Refactor note resolution.
    • Trim definition list terms (Alexander Krotov).
  • LaTeX reader:

    • Add braces when resolving \DeclareMathOperator (#5441). These seem to be needed for xelatex but not pdflatex.
    • Allow newlines in \mintinline.
    • Pass through unknown listings language as class (#5540). Previously if the language was not in the list of languages supported by listings, it would not be added as a class, so highlighting would not be triggered.
    • rawLaTeXInline: Include trailing {}s in raw latex commands (#5439). This change affects the markdown reader and other readers that allow raw LaTeX. Previously, trailing {} would be included for unknown commands, but not for known commands. However, they are sometimes used to avoid a trailing space after the command. The chances that a {} after a LaTeX command is not part of the command are very small.
  • MediaWiki reader: handle multiple attributes in table row (#5471, chinapedia).

  • Docx reader: Add support for w:rtl (#5545). Elements with this property are put into Span inlines with dir="rtl".

  • DocBook reader: Issue IgnoredElement warnings.

  • Org reader (Albert Krewinkel):

    • Fix planning elements in headers level 3 and higher (#5494). Planning info is now always placed before the subtree contents. Previously, the planning info was placed after the content if the header’s subtree was converted to a list, which happens with headers of level 3 and higher per default.
    • Omit, but warn about unknown export options. Unknown export options are properly ignored and omitted from the output.
    • Prefer plain symbols over math symbols (#5483). Symbols like \alpha are output plain and unemphasized, not as math.
    • Recognize emphasis after TODO/DONE keyword (#5484).
  • FB2 reader:

    • Skip unknown elements rather than throwing errors (#5560). Sometimes custom elements are used (e.g. id element inside author); previously the reader would halt with an error. Now it skips the element and issues an IgnoredElement warning.
    • Parse notes (#5493, Alexander Krotov).
    • Internal improvements (Alexander Krotov).
  • OpenDocument writer: Roll back automatic figure/table numbering (#5474). This was added in pandoc 2.7.2, but it makes it impossible to use pandoc-crossref. So this has been rolled back for now, until we find a good solution to make this behavior optional (or a creative way to let pandoc-crossref and this feature coexist).

  • New module Text.Pandoc.Writers.Jira, exporting writeJira [API change] (Albert Krewinkel).

  • EPUB writer:

    • Don’t include ‘landmarks’ if there aren’t any. Previously we could get an empty ol element, which caused validation errors with epubcheck.
    • Ensure unique ids for stylesheets in content.opf (#5463).
    • Make stylesheet link compatible with kindlegen (#5466, Eric Schrijver). Pandoc omitted type="text/css" from both <style> and <link rel="stylesheet"> elements in all templates, which is valid according to the spec. However, Amazon’s kindlegen software relies on this attribute on <link> elements when detecting stylesheets to include.
  • HTML writer:

    • Output video and audio elements depending on file extension of the image path (Mauro Bieg).
    • Emit empty alt tag in figures (#5518, Mauro Bieg). The same text is already in the <figcaption>, and screen readers would otherwise read it twice; see #4737.
    • Don’t add variation selector if it’s already there. This fixes round-trip failures.
    • Prevent gratuitous emojification on iOS (#5469). iOS chooses to render a number of Unicode entities, including ‘↩’, as big colorful emoji. This can be defeated by appending Unicode ‘VARIATION SELECTOR-15’/‘VARIATION SELECTOR-16’. So we now append this character when escaping strings, for both ‘↩’ and ‘↔’. If other characters prove problematic, they can simply be added to needsVariationSelector.
    • Add class="heading" to level 7+ Headers rendered as <p> elements (#5457).
  • RST writer: treat Span with no attributes as transparent (#5446). Previously an Emph inside a Span was being treated as nested markup and ignored. With this patch, the Span is just ignored.

  • LaTeX writer:

    • Include inline code attributes with --listings (#5420).
    • Don’t produce columns environment unless beamer (#5485).
    • Fix footnote in image caption. Regression: the fix for #4683 broke this case.
    • Don’t highlight code in headings (#5574). This causes compilation errors.
    • Use \mbox to get proper behavior inside \sout (#5529).
  • EPUB writer: Fix document section assignments (#5546). For example, introduction should go in bodymatter, not frontmatter, and epigraph, conclusion, and afterword should go in bodymatter, not backmatter. For the full list of assignments, see the manual.

  • Markdown writer:

    • Add backslashes to avoid unwanted interpretation of definition list terms as other kinds of block (#554).
    • Ensure the code fence is long enough (#5519). Previously too few backticks were used when the code block contained an indented line of backticks. (Ditto tildes.)
    • Handle labels with integer names (Jesse Rosenthal, #5495). Previously if labels had integer names, it could produce a conflict with auto-labeled reference links. Now we test for a conflict and find the next available integer. This involves adding a new state variable stPrevRefs to keep track of refs used in other document parts when using --reference-location=block|section.
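
The code-fence item above comes down to choosing a fence longer than any run of fence characters inside the block. A conservative Python sketch of that idea follows. The fence_for helper is hypothetical and counts runs anywhere in the code, which is stricter than necessary and not a description of the Markdown writer's actual logic.

```python
import re


def fence_for(code: str, char: str = "`") -> str:
    """Return a fence of `char` longer than any run of `char` in `code` (minimum 3)."""
    runs = re.findall(re.escape(char) + "+", code)
    longest = max((len(run) for run in runs), default=0)
    return char * max(3, longest + 1)
```

A block containing an indented line of three backticks would get a four-backtick fence; the same function handles tildes when called with `char="~"`.
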
  • Textile writer: fix closing tag for math output (Albert Krewinkel). Opening and closing tag for math output match now.

  • Org writer: always indent src blocks content by 2 spaces (#5440, Albert Krewinkel). Emacs always uses two spaces when indenting the content of src blocks, e.g., when exiting a C-c ' edit-buffer. Pandoc used to indent contents by the space-equivalent of one tab, but now always uses two spaces, too.

  • Asciidoc writer:

    • Use `+...+` form for inline code. The old `a__b__c` yields emphasis inside code in asciidoc. To get a pure literal code span, use `+a__b__c+`.
    • Use proper smart quotes with asciidoctor (#5487). Asciidoctor has a different format for smart quotes.
    • Use doubled ## when necessary for spans (#5566).
    • Ensure correct nesting of strong/emph (#5565): strong must be the outer element.
  • JATS writer:

    • Wrap elements with p when needed (#5570). The JATS spec restricts what elements can go inside fn and list-item. So we wrap other elements inside <p specific-use="wrapper"> when needed.
    • Properly handle footnotes (#5511) according to “best practice.” (Group them at the end in <fn-group> and use <xref> elements to link them.)
    • Fix citations with PMID so they validate (#5481). This includes an update to data/jats.csl.
    • Ensure validity of <pub-date> by parsing the date and extracting year, month, and day, as expected. Also add an iso-8601-date attribute automatically.
    • Don’t use <break> element for LineBreak. It is only allowed in a few special contexts, and not in <p> elements.
    • Don’t make <string-name> a child of <string>, which is illegal.
  • FB2 writer:

    • Do not wrap note references into <sup> and brackets (Alexander Krotov). Existing FB2 readers, such as FBReader, already display links with type="note" as a superscript.
    • Use genre metadata field (#5478).
  • Muse writer: do not escape empty line after <br> (Alexander Krotov).

  • Add unicode code point in “Missing character” warning (#5538). If the character isn’t in the console font, the message is pretty useless, so we show the code point for anything non-ASCII.

  • Lua: add Version type to simplify comparisons (Albert Krewinkel). Version specifiers like PANDOC_VERSION and PANDOC_API_VERSION are turned into Version objects. The objects simplify version-appropriate comparisons while maintaining backward-compatibility. A function pandoc.types.Version is added as part of the newly introduced module pandoc.types, allowing users to create version objects in scripts.

  • pandoc lua module (Albert Krewinkel):

    • Fix deletion of nonexistent attributes (#5569).
    • Better tests for Attr and AttributeList.
  • pandoc.mediabag lua module (Albert Krewinkel):

    • Add function delete for deleting a single item.
    • Add function empty for removing all entries.
    • Add function items for iterating over mediabag.
  • Text.Pandoc.Class: Fix handling of file: URL scheme in downloadOrRead (#5517, Mauro Bieg). Previously file:/ URLs were handled wrongly and pandoc attempted to make HTTP requests, which failed.

  • Text.Pandoc.MIME: add mediaCategory [API change] (Mauro Bieg).
    ...


pandoc 2.7.2

06 Apr 05:04
@jgm jgm
  • Add XWiki writer (#1800, Derek Chen-Becker). Add Text.Pandoc.Writers.XWiki, exporting writeXWiki [API change].

  • Dokuwiki Reader: parse single curly brace (#5416, Mauro Bieg).

  • Vimwiki reader: improve handling of internal links (#5414). We no longer append .html to link targets, and we add a title wikilink. This mirrors behavior of other wiki readers. Generally the .html extension is not wanted. It may be important for output to HTML in certain circumstances, but it can always be added using a filter that matches on links with title wikilink.

    If your workflow requires the current behavior, here is a lua filter that will add the .html extension:

    function Link(el)
      if el.title == 'wikilink' then
        el.target = el.target .. ".html"
      end
      return el
    end
  • ipynb reader:

    • Use format ipynb for raw cell where no format given.
    • Avoid introducing spurious .0 on integers in metadata.
  • Markdown reader: fenced div takes priority over setext header.

  • HTML reader: read data-foo attribute into foo (#5392). The HTML writer adds the data- prefix for HTML5 for nonstandard attributes. But the attributes are represented in the AST without the data- prefix, so we should strip this when reading HTML.

  • LaTeX reader: Improve autolink detection (#5340).

  • PowerPoint writer (Jesse Rosenthal):

    • Expand builtin reference doc to model all layouts. The previous built-in reference doc had only title and content layouts. Add in a section-header slide and a two-content slide, so users can more easily modify it to build their own templates.
    • Always open up in slide view. When editing a template/reference-doc, the user might be in Master view, but when producing a slide show, it is assumed that slide view will be desired.
    • Remove handoutsMasterList from template presentation.xml
    • Fix numerous errors in templating (#5402). Previously, some templates produced by Office 365 (MacOS) would not render with --reference-doc correctly. We now apply correct shapes for content, and build shape trees correctly.
    • Make default placeholder type for template lookup.
    • Apply speaker notes to metadata slide if applicable.
    • Test for speaker notes after breaking header.
    • Correctly handle notes after section-title header. Previously, if notes came after a section-title header (ie, a level-1 header in a slide-level=2 presentation), they would go on the next slide. This keeps them on the slide with the header.
    • Internal improvements.
  • ipynb writer:

    • Use format ipynb for raw cell where no format given. According to nbformat docs, this is supposed to render in every format. We don’t do that, but we at least preserve it as a raw block in markdown, so you can round-trip.
    • Consolidate adjacent raw blocks. Sometimes pandoc creates two HTML blocks, e.g. one for the open tag and one for a close tag. If these aren’t consolidated, only one will show up in output cell.
    • Fixed carry-over of nbformat from metadata.
    • Preserve nbformat_minor if it’s given. This helps with round-tripping.
  • LaTeX writer:

    • Avoid inadvertently creating ?` or !` ligatures (#5407). These are upside down ? and !, resp.
    • Fix footnotes in table caption and cells (#5367). This fixes a bug wherein footnotes appeared in the wrong order, and with duplicate numbers, when in table captions and cells. We now use regular \footnote commands, even in the table caption and the minipages containing cells. Apparently longtable knows how to handle this.
  • HTML writer: Don’t add data- prefix to RDFa attributes (#5403).

  • JATS writer: Ensure that plain strings go inside <pub-id> tag (#5397).

  • Markdown writer:

    • Better rendering of numbers (#5398). If the number is integral, we render it as an integral not a float.
    • Proper rendering of empty map in YAML metadata (#5398). Should be {}, not empty string.
    • Properly escape attributes in Markdown writer (#5369).
    • Be sure implicit figures work in list contexts (#5368). Previously they would sometimes not work: e.g., when they occurred in final paragraphs in lists that were originally parsed as Plain and converted later using PlainToPara.
  • Docx writer: Use w:br without attributes for line breaks (#5377). We previously added the attribute type="textWrapping", but this causes problems on Word Online.

  • LaTeX template (Andrew Dunning):

    • Ensure correct heading/table order (#5365). Improve workaround (#1658) for tables following headings. The new solution works whether or not the indent variable is enabled.
    • Remove subparagraph variable. The default is now to use run-in style for level 4 and 5 headings (\paragraph and \subparagraph). To get the previous default behavior (where these were formatted as blocks, like \subsubsection), set the block-headings variable.
    • Add pandoc to PDF metadata (#5388).
    • Group graphics-related code (#5389).
    • Move \setstretch after front matter (#5179). Ensures that \maketitle, \tableofcontents, and so forth are not affected by changes to line spacing.
  • Update data/jats.csl to avoid commas between name-part elements (#5397).

  • Add support for golang (go) with --listings (#5427).

  • Text.Pandoc.Shared - improve metaToJSON behavior with numbers. We now do a better job marshalling numbers from MetaString or MetaInlines into JSON Number.

  • Text.Pandoc.Writers.Shared: metaValueToJSON: use Number Values for integers. Pandoc’s MetaValue doesn’t have a distinguished number type, so numbers are put in MetaStrings. If the MetaString consists entirely of digits, we convert it to a Number. We should probably consider adding a MetaNumber constructor to MetaValue, for better round-tripping with JSON etc. This change aids round-tripping in ipynb metadata fields, like toc_depth.
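
The digits-only rule described above can be illustrated in Python. This is a sketch of the idea, not the Haskell implementation: meta_string_to_json is a hypothetical name, and restricting the check to ASCII digits is an assumption of the sketch.

```python
import json
from typing import Union


def meta_string_to_json(value: str) -> Union[int, str]:
    """Turn a string made up entirely of digits into a number; leave others as strings."""
    if value.isascii() and value.isdigit():  # ASCII-only check is an assumption
        return int(value)
    return value


# A field such as toc_depth then serializes as a JSON number rather than a string.
print(json.dumps({"toc_depth": meta_string_to_json("3")}))
```

Values like "3.5" or "v2" contain non-digit characters and stay strings under this rule.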

  • Text.Pandoc.Class: fetchItem: don’t treat UNC paths as protocol-relative URLs (#5127). These are paths beginning //?/UNC/....

  • Text.Pandoc.ImageSize: Improve pdfSize so it handles a wider range of PDFs (#4322, with help from Richard Davis).

  • Text.Pandoc.Pretty: avoid stack overflow by using strict sum (#5401).

  • Fix harmless error in file-scope code (#5422).

  • MANUAL.txt:

    • Improve ‘header’ and ‘heading’ usage (#5423, Andrew Dunning). The term ‘header’ was being used where ‘heading’ is more appropriate.
    • Add paragraph on options affecting markdown in ipynb.
  • stack.yaml - remove -Wmissing-home-modules. This seems to cause problems with stack ghci. Remove RTS options.

  • Add ghc-options to cabal.project.

  • appveyor.yml - use ghc 8.6.4. Fixes segfault issues on Windows (#5037).

  • linux build process: Remove clone of pandoc-citeproc (#5366). It wasn’t being used; cabal.project specifies the version to use.

pandoc 2.7.1

14 Mar 16:30
@jgm jgm
  • Add tectonic as an option for --pdf-engine (#5345, Cormac Relf). Runs tectonic on STDIN instead of a temporary .tex file, so that it looks in the working directory for \include and \input like the rest of the engines. Allows overriding the output directory args with --pdf-engine-opt=--outdir --pdf-engine-opt="$DIR".

  • Allow -o/--output to be used with --print-default-data-file, --print-highlighting-style, --print-default-template. Note that -o must occur BEFORE the --print* command on the command line (this is documented, #5357).

  • LaTeX reader:

    • Support \underline, \ul, \uline (#5359, Paul Tilley). These are parsed as a Span with class underline, as with other readers.
    • Ensure that \Footcite and \Footcites get put in a note.
  • ipynb reader:

    • Remove sensitivity to raw_html, raw_tex extensions. We now include every output format. Pruning is handled by --ipynb-output.
    • Better handling of cell metadata. We now include even complex cell metadata in the Div’s attributes (as JSON, in complex cases, or as plain strings in simple cases).
  • ipynb writer:

    • Recurse into native divs for output cell data (#5354).
    • Render cell metadata fields from div attributes.
  • Docx writer: avoid extra copy of abstractNum and num elements in numbering.xml. This caused pandoc-produced docx files to be uneditable using Word Online (#5358).

  • Markdown writer: improve handling of raw blocks/inline. We now emit raw content using raw_attribute when no more direct method is available. Use of raw_attribute can be forced by disabling raw_html and raw_tex.

  • LaTeX writer: Add classes for frontmatter support (#5353, Andrew Dunning) and remove frontmatter from scrreprt.

  • LaTeX template:

    • Improve readability (#5363, Andrew Dunning).
    • Robust section numbering removal (#5351, Andrew Dunning). Ensures that section numbering does not reappear with custom section levels. See https://tex.stackexchange.com/questions/473653/.
    • Better handling of front/main/backmatter (#5348). In pandoc 2.7 we assumed that every class with chapters would accept \frontmatter, \mainmatter, and \backmatter. This is not so (e.g. report does not). So pandoc 2.7 breaks on report class by including an unsupported command. Instead of the book-class variable, we use two variables, has-chapters and has-frontmatter, and set these intelligently in the writer.
  • Text.Pandoc.Shared: Improve filterIpynbOutput. Ensure that images are prioritized over text. best should include everything for ipynb.

  • Tests.Old: specify --data-dir=../data to ensure tests can find data files even if they haven’t been installed. Remove old pandoc_datadir environment variable, which hasn’t done anything for a long time.

  • MANUAL.txt: Add recommendation to use raw_attribute with ipynb (#5354).

  • Use cmark-gfm-hs 0.1.8 (note that 0.1.7 is buggy).

  • Use latest pandoc-citeproc, texmath.

pandoc 2.7

03 Mar 19:50
@jgm jgm
  • Use XDG data directory for user data directory (#3582). Instead of $HOME/.pandoc, the default user data directory is now $XDG_DATA_HOME/pandoc, where XDG_DATA_HOME defaults to $HOME/.local/share but can be overridden by setting the environment variable. If this directory is missing, then $HOME/.pandoc is searched instead, for backwards compatibility. However, we recommend moving local pandoc data files from $HOME/.pandoc to $HOME/.local/share/pandoc. On Windows the default user data directory remains the same.
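
The lookup order described above can be sketched in Python for non-Windows systems. The user_data_dir function and its injectable exists parameter are hypothetical, added only so the logic can be exercised without touching the real filesystem.

```python
from pathlib import Path
from typing import Callable, Mapping


def user_data_dir(env: Mapping[str, str],
                  exists: Callable[[Path], bool] = Path.is_dir) -> Path:
    """Model of the default user data directory lookup (non-Windows)."""
    home = Path(env["HOME"])
    # XDG_DATA_HOME defaults to $HOME/.local/share when unset or empty.
    xdg_data_home = Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")
    xdg_dir = xdg_data_home / "pandoc"
    if not exists(xdg_dir):
        # Backwards compatibility: fall back to the legacy location.
        return home / ".pandoc"
    return xdg_dir
```

Calling `user_data_dir(os.environ)` would then report which directory this model expects to be searched on the current machine.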

  • Slide show formats behavior change: content under headers less than slide level is no longer ignored, but included in the title slide (for HTML slide shows) or in a slide after the title slide (for beamer). This change makes possible 2D reveal.js slideshows with content in the top slide on each stack (#4317, #5237).

  • Add command line option --ipynb-output=all|none|best (#5339). Output cells in ipynb notebooks often contain several different versions of an output, with different MIME types, e.g. an HTML table and a plain-text fallback. Specifying --ipynb-output=best (the default) ensures that the best version for the output format is used. all includes all versions, and none suppresses them all, leaving output cells empty.
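
The three modes can be pictured with a small Python model in which an output cell is a mapping from MIME type to content. Everything here is illustrative: the filter_outputs function is hypothetical, and the PREFERENCE table is an invented ranking, not pandoc's actual per-format ordering.

```python
from typing import Dict, List

# Hypothetical ranking of MIME types per output format (assumption for illustration).
PREFERENCE: Dict[str, List[str]] = {
    "html": ["text/html", "image/png", "text/plain"],
    "latex": ["text/latex", "image/png", "text/plain"],
}


def filter_outputs(versions: Dict[str, str], mode: str, fmt: str) -> Dict[str, str]:
    """Select which versions of an output cell to keep for output format `fmt`."""
    if mode == "all":
        return dict(versions)
    if mode == "none":
        return {}
    if mode != "best":
        raise ValueError(f"unknown mode: {mode}")
    # "best": keep the first available MIME type in the format's ranking.
    for mime in PREFERENCE.get(fmt, ["text/plain"]):
        if mime in versions:
            return {mime: versions[mime]}
    return {}
```

For a cell holding both an HTML table and a plain-text fallback, "best" keeps the HTML version for html output and the plain-text one for a format that has no richer match.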

  • asciidoctor is now an output format separate from asciidoc, to accommodate some minor implementation-specific differences (currently just in the treatment of display math).

  • Add latexmk as an option for --pdf-engine (#3195). Note that you can use --pdf-engine-opt=-outdir=bar to specify a persistent temp directory.

  • Markdown reader:

    • Improve tight/loose list handling (#5285). Previously the algorithm allowed list items with a mix of Para and Plain, which is never wanted.
    • Add newline when parsing blocks in YAML (#5271). Otherwise last block gets parsed as a Plain rather than a Para. This is a regression in pandoc 2.x. This patch restores pandoc 1.19 behavior.
    • Make yamlToMeta respect extensions (#5272, Mauro Bieg). This adds a ReaderOptions parameter to yamlToMeta [API change].
    • Fix bug parsing fenced code blocks (#5304). Previously parsing would break if the code block contained a string of backticks of sufficient length followed by something other than end of line.
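
The distinction behind the fenced code block fix can be sketched in Python. This is an illustration of the rule only, under the assumption that a closing fence is a line made up solely of fence characters, at least as many as the opening fence; `is_closing_fence` is a hypothetical helper and it ignores indentation limits.

```python
def is_closing_fence(line, fence_char, open_len):
    """Return True if `line` closes a fence opened with `open_len` fence chars."""
    stripped = line.strip()
    # Anything other than fence characters on the line means it is content,
    # even if it starts with a long enough run of backticks.
    if not stripped or set(stripped) != {fence_char}:
        return False
    return len(stripped) >= open_len
```

A line such as "``` not the end" therefore stays part of the code block instead of ending it.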
  • LaTeX reader: don’t let \egroup match {. braced now actually requires nested braces. Otherwise some legitimate command and environment definitions can break.

  • Docx reader (Jesse Rosenthal):

    • Rename getDocumentPath as getDocumentXmlPath.
    • Use field notation for setting ReaderEnv.
    • Figure out document.xml path once at the beginning of parsing, and add it to the environment, so we can avoid repeated lookups.
    • Dynamically determine main document xml path (#5277). The desktop Word program places the main document file in word/document.xml, but the online version of Word places it in word/document2.xml. This file path is actually stated in the root _rels/.rels file, in the Relationship element with an http://../officedocument type.
    • Fix paths in archive to prevent Windows failure (#5277). Some paths in archives are absolute (have an opening slash) which, for reasons unknown, produces a failure in the test suite on MS Windows. This fixes that by removing the leading slash if it exists.
    • Add comments to aid code readability.
    • Trim space inside the last inline (#5273).
    • Unwrap sdt elements in footnotes and comments (#5302).
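
The path lookup described in the Docx reader notes above can be sketched in Python. This is not the reader's Haskell code: `main_document_path` is a hypothetical function, and because the note abbreviates the relationship type URI, the sketch matches on its final path segment instead of the full URI.

```python
import xml.etree.ElementTree as ET


def main_document_path(rels_xml, default="word/document.xml"):
    """Find the main document part named in the root _rels/.rels file."""
    root = ET.fromstring(rels_xml)
    for el in root.iter():
        # Compare local names so the sketch does not depend on the namespace.
        if el.tag.rsplit("}", 1)[-1] != "Relationship":
            continue
        # Assumption: the main document relationship is the one whose type
        # ends in "/officedocument" (compared case-insensitively).
        if el.get("Type", "").lower().endswith("/officedocument"):
            # Some targets are absolute; drop the leading slash so the path
            # matches the entry name inside the archive.
            return el.get("Target", default).lstrip("/")
    return default
```

Doing this lookup once and storing the result mirrors the note about adding the path to the reader environment to avoid repeated lookups.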
  • Muse reader (Alexander Krotov):

    • Test that block level markup does not break <verbatim>.
    • Add secondary note support.
  • ipynb reader: handle images referring to attachments. Previously we didn’t strip off the attachment: prefix, so even though the attachment was available in the mediabag, pandoc couldn’t find it.

  • JATS reader:

    • Fix parsing of figures (#5321). This ensures that a figure containing a single image is parsed as a pandoc “implicit figure” (i.e., a Para with a single Image whose title attribute begins with fig:). More complex figures will still be parsed as divs.
    • Support fig-group block element (#5317).
    • Handle citations with multiple references (#5310). The rid attribute can have a space-separated list of ids.
  • AsciiDoc Writer: Add writeAsciiDoctor [API change, Tarik Graba]. Handle display math appropriately for Asciidoctor.

  • JATS writer: wrap figure caption in <p> to fix validation (#5290, Mauro Bieg).

  • HTML writer:

    • Implement WAI-ARIA roles for (end)notes, citations, and bibliography (#4213). Note that doc-biblioref is only used when link-citations produces links, since it belongs on links.
    • Include content (including speaker notes) in title slides (#4317, #5237).
  • ipynb writer:

    • Ensure final newline.
    • Only include metadata under jupyter field.
    • Don’t create attachments for images with absolute URIs, including data: URIs (#5303).
    • Keep plain text fallbacks in output even if a richer format is included (#5293). We don’t know what output format will be needed. See the --ipynb-output command line option for a way to control what formats are included in the output.
  • Markdown writer: use markdown="1" when appropriate for Divs: when native_divs and markdown_in_html_blocks are disabled but raw_html and markdown_attribute are enabled.

  • LaTeX writer:

    • Use right fold for escapeString. This is more elegant than the explicit recursive code we were using.
    • Avoid {} after control sequences when escaping. \ldots{}. doesn’t behave as well as \ldots. with the latex ellipsis package. This patch causes pandoc to avoid emitting the {} when it is not necessary. Now \ldots and other control sequences used in escaping will be followed by either a {}, a space, or nothing, depending on context.
    • For beamer, include contents under headers superordinate to slidelevel (#4317). Currently we keep the fancy title slide, and add a new slide with the same title and whatever content was under the header.
  • Powerpoint writer (Jesse Rosenthal): support underlines. Use span with single class “underline” as in docx writer.

  • Muse writer: escape secondary notes (Alexander Krotov).

  • FB2 writer: add section identifiers support (#5229, John KetzerX).

  • Make --fail-if-warnings work for PDF output (#5343).

  • Lua filters (Albert Krewinkel):

    • Load module pandoc before calling init.lua (#5287). The file init.lua in pandoc’s data directory is run as part of pandoc’s Lua initialization process. Previously, the pandoc module was loaded in init.lua, and the structure for marshaling was set up after. This allowed simple patching of element marshaling, but made using init.lua more difficult. Now, all required modules are loaded before calling init.lua. The file can be used entirely for user customization. Patching marshaling functions, while discouraged, is still possible via the debug module.
    • All Lua modules bundled with pandoc, i.e., pandoc.List, pandoc.mediabag, pandoc.utils, and text are re-exported from the pandoc module. They are assigned to the fields List, mediabag, utils, and text, respectively.
  • Text.Pandoc.Lua (Albert Krewinkel):

    • Split StackInstances into smaller Marshaling modules.
    • Get CommonState from Lua global. This allows more control over the common state from within Lua scripts.
  • LaTeX template:

    • Support the subject metadata variable (#5289, Pascal Wagler).
    • Add \frontmatter, \mainmatter, \backmatter for book classes (#5306).
  • epub3 template: Add titlepage class to section (#5269).

  • HTML5 template: Add ARIA role doc-toc for table of contents (#4213).

  • Make --metadata-file use pandoc-markdown (#5279, #5272, Mauro Bieg).

  • Text.Pandoc.Shared:

    • Remove withTempDir [API change].
    • Add new exported function defaultUserDataDirs [API change].
    • Add filterIpynbOutput [API change].
    • compactify: Avoid lists with a mix of Plain and Para elements (#5285).
  • Text.Pandoc.Translations: reorder alphabetically and remove Author (#5334, Mauro Bieg).

  • Text.Pandoc.Extensions:

    • More carefully groom ipynb default extensions.
    • Add all_symbols_escapable to githubMarkdownExtensions.
  • Text.Pandoc.PDF:

    • Use system temp directory when possible (#1192). Previously we created temp dirs in the working directory, partly (a) because there were problems using the system temp directory on Windows, when their pathnames included tildes, and partly (b) because programs like epstopdf.pl would not be allowed to write to directories outside the working directory in restricted mode. We now (a) use the system temp dir except when the path includes tildes, and (b) set TEXMFOUTPUT when creating the PDF, so that subsidiary programs can use the system temp directory. This addresses problems that occurred when pandoc was used in a synced directory (such as Dropbox).
    • Change types of subsidiary functions to PandocIO, to allow warnings to be threaded through (#5343).
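
The temp directory policy in the Text.Pandoc.PDF note can be sketched in Python. This is an illustration under stated assumptions, not pandoc's code: both function names are hypothetical, and the fallback to the working directory when the path contains a tilde is inferred from the description of the previous behavior.

```python
import tempfile


def choose_temp_parent(system_tmp=None, working_dir="."):
    """Pick the directory in which to create the PDF build directory."""
    if system_tmp is None:
        system_tmp = tempfile.gettempdir()
    # (a) Paths containing tildes (e.g. Windows short names) caused problems,
    # so fall back to the working directory in that case.
    if "~" in system_tmp:
        return working_dir
    return system_tmp


def pdf_engine_env(base_env, temp_dir):
    """Return a copy of the environment with TEXMFOUTPUT set."""
    env = dict(base_env)
    # (b) Lets subsidiary programs write into the temp directory.
    env["TEXMFOUTPUT"] = temp_dir
    return env
```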
  • Text.Pandoc.MIME: add WebP (#5267, Mauro Bieg).

  • Tests: avoid calling findPandoc multiple times.

  • Old tests: remove need for temp files by using pipeProcess.

  • Added simple ipynb reader/writer tests (#5274).

  • Rearrange --help output in a more rational way, with common options at the beginning and options grouped by function (#5336).

  • trypandoc: Add JATS and other missing formats (Arfon Smith, #5291).

  • Add missing copyright notices and remove license boilerplate (#4592, Albert Krewinkel).

  • Use latest basement/foundation on 32bi...


pandoc 2.6

31 Jan 07:53
@jgm jgm
  • Support ipynb (Jupyter notebook) as input and output format.

    • Add ipynb as input and output format (extension .ipynb).
    • Added Text.Pandoc.Readers.Ipynb [API change].
    • Added Text.Pandoc.Writers.Ipynb [API change].
    • Add PandocIpynbDecodingError constructor to Text.Pandoc.Error.Error [API change].
    • Depend on ipynb library.
    • Note: there is no template for ipynb.
  • Add DokuWiki reader (#1792, Alexander Krotov). This adds Text.Pandoc.Readers.DokuWiki [API change], and adds dokuwiki as an input format.

  • Implement task lists (#3051, Mauro Bieg). Added task_lists extension. Task lists are supported from markdown and gfm input. They should work, to some degree, in all output formats, though in most formats you’ll get a bullet list with a unicode character for the box. In HTML, you get checkboxes and in LaTeX/PDF output, a box is used as the list marker. API changes:

    • Added constructor Ext_task_lists to Extension.
    • Added taskListItemFromAscii and taskListItemToAscii to Text.Pandoc.Shared.
  • Allow some command line options to take URL in addition to FILE. --include-in-header, --include-before-body, --include-after-body.

  • HTML reader:

    • Handle empty start attribute (see #5162).
    • Treat textarea as a verbatim environment (#5241) and preserve spacing.
  • RST reader:

    • Change treatment of number-lines directive (Brian Leung, #5207). Directives of this type without numeric inputs should not have a startFrom attribute; with a blank value, the writers can produce extra whitespace.
    • Removed superfluous sourceCode class on code blocks (#5047).
    • Handle sourcecode directive as synonym for code (#5204).
  • Markdown reader:

    • Remove sourceCode class for literate Haskell code blocks (#5047). Reverse order of literate and haskell classes on code blocks when parsing literate Haskell, so haskell is first.
    • Treat <textarea> as a verbatim environment (#5241).
  • Org reader:

    • Handle minlevel option differently (#5190, Brian Leung). When minlevel exceeds the original minimum level observed in the file to be included, every heading should be shifted rightward.
    • Allow for case of :minlevel == 0 (#5190).
    • Fix treatment of links to images (#5191, Albert Krewinkel). Links with descriptions which are pointing to images are no longer parsed as inline images, but as links.
    • Add support for #+SELECT_TAGS (Brian Leung).
    • Separate filtering logic from conversion function (Brian Leung).
  • TWiki reader: Fix performance issue with underscores (#3921).

  • MediaWiki reader: use _ instead of - in auto-identifiers (#4731). We may still not be matching mediawiki’s algorithm exactly.

  • LaTeX reader:

    • Remove sourceCode class for literate Haskell code blocks (#5047). Reverse order of literate and haskell classes on code blocks when parsing literate Haskell, so haskell is first.
    • Support \DeclareMathOperator (#5149).
    • Support \inputminted (#5103).
    • Support \endinput (#5233).
    • Allow includes with dots like cc_by_4.0. Previously the .0 was interpreted as a file extension, leading pandoc not to add .tex (and thus not to find the file). The new behavior matches tex more closely.
  • Man reader:

    • Use mapLeft from Shared instead of defining own.
  • Docx reader (Jesse Rosenthal):

    • Handle level overrides (#5134).
  • Docx writer:

    • Support custom properties (#3024, #5252, Agustín Martín Barbero). Also supports additional core properties: subject, lang, category, description.
    • Make Level into a real type, instead of an alias for a tuple (Jesse Rosenthal).
  • ICML writer (Mauro Bieg):

    • Support custom-styles (#5137, see #2106).
    • Support unnumbered headers (#5140).
  • Texinfo writer: Use header identifier for anchor if present (#4731). Previously we were overwriting an existing identifier with a new one.

  • Org writer: Preserve line-numbering for example and code blocks (Brian Leung).

  • Man/Ms writers: Don’t escape - as \-. The \- gets rendered in HTML and PDF as a unicode minus sign.

  • Ms writer: Ensure we have a newline after .EN in display math (#5251).

  • RST writer: Don’t wrap simple table header lines (#5128).

  • Asciidoc writer: Shorter delimiters for tables, blockquotes (#4364). This matches asciidoctor reference docs.

  • Dokuwiki writer: Remove automatic : prefix before internal image links (#5183, Damien Clochard). This prevented users from making relative image links.

  • Zimwiki writer: remove automatic colon prefix before internal images (#5183, Damien Clochard).

  • MediaWiki writer: fix caption, use ‘thumb’ instead of ‘frame’ (#5105). Captions used to have the word ‘caption’ prepended; this has been removed. Also, ‘thumb’ is used instead of ‘frame’ to allow images to be resized.

  • reveal.js writer:

    • Ensure that we don’t get > 2 levels of section nesting, even with slide level > 2 (#5168).
    • If slide level == N but there is no N-level header, make sure the next header with level > N gets treated as a slide and put in a section, rather than remaining loose (#5168).
  • Markdown writer:

    • Make plain RawBlocks pass through in plain output.
    • Include needed whitespace after HTML figure (#5121). We use HTML for a figure in markdown dialects that can’t represent it natively.
  • Commonmark writer:

    • Fix handling of SoftBreak with hard_line_breaks (#5195).
    • Implement --toc (writerTableOfContents) in commonmark/gfm writers (#5172).
  • EPUB writer:

    • Ensure that picture transforms are done on metadata too.
    • Small fixes to nav.xhtml: Add ‘landmarks’ id attribute to the landmarks nav. Replace old default CSS removing numbers from ol.toc li with new rules that match nav#toc ol, nav#landmarks ol. We keep the toc class on ol for backwards compatibility.
  • LaTeX writer:

    • Make raw content marked beamer pass through in beamer output (pandoc/lua-filters#40).
    • Beamer: avoid duplicated fragile property in some cases (#5208).
    • Add # special characters for listings (#4939). This character needs special handling in \lstinline.
  • RTF writer: use toTableOfContents from Shared to replace old duplicated code.

  • Pptx writer:

    • Support custom properties. Also supports additional core properties: subject, category, description (#5252, Agustín Martín Barbero).
    • Use toTableOfContents from Shared to replace old duplicated code.
  • ODT writer (Agustín Martín Barbero):

    • Fix typo in custom properties (#2839).
    • Improve standard properties, including the following core properties: generator (Pandoc/VERSION), description, subject, keywords, initial-creator (from authors), creation-date (actual creation date) (#5252).
  • Custom writers:

    • Allow ‘-’ in filenames for custom lua writers (#5187).
    • sample.lua: add SingleQuoted, DoubleQuoted (#5104).
    • sample.lua: Add a missing > (MichaWiedenmann).
  • reveal.js template: Add zoomKey config (#4249).

  • HTML5 template: Remove unnecessary type="text/css" on style and link for HTML5 (#5146).

  • LaTeX template (Andrew Dunning, except where noted):

    • Prevent fontspec from scaling mainfont to match the default font, Latin Modern. A main font set to 12pt could previously appear between 11pt to 13pt depending on its design. To return to the earlier rendering, use -V mainfontoptions="Scale=MatchLowercase" (#5212, #5218).
    • Display monospaced fonts without TeX ligatures when using --pdf-engine=lualatex. It now matches the behaviour of other engines (#5212, #5218).
    • Remove the deprecated romanfont variable. The functionality of mainfont is identical (#5218).
    • Render \subtitle with the standard document classes. Previously, subtitle only appeared when using the KOMA-Script classes or Beamer (#5213, #5244).
    • Use Babel instead of Polyglossia for LuaLaTeX. This avoids several language selection problems, notably with retaining French spacing conventions when switching to a verbatim environment or another language; and in printing Greek text without hyphenation (#5193).
    • Use the xurl package if available, improving the appearance of URLs by allowing them to break at additional points (#5193).
    • Use bookmark if available to correct heading levels in PDF bookmarks: see the KOMA-Script 3.26 release notes (#5193).
    • Require the xcolor package to avoid a possible error when using additional packages alongside footnotes in tables (#5193, closes #4861).
    • Remove obsolete fixltx2e package, which has no functionality with TeX Live 2015 or later (#5193).
    • Allow multiple fontfamilies.options (#5193, closes #5194).
    • Restrict institute variable to Beamer (#5219).
    • Use footnotehyper package if available to make footnotes in tables compatible with hyperref (#5234).
    • Number parts and chapters in book classes only if the numbersections variable is set, for consistency with other output formats. To return to the previous behaviour, use -V numbersections -V secnumdepth=0 (#5235).
    • Reindent file (#5193).
    • Use built-in parskip handling with KOMA-Script classes (#5143, Enno).
    • Set default listings language for lua, assembler (#5227, John MacFarlane). Otherwise we get an e...

pandoc 2.5

27 Nov 18:00
@jgm jgm
  • Text.Pandoc.App: split into several unexported submodules (Albert Krewinkel): Text.Pandoc.App.FormatHeuristics, Text.Pandoc.App.Opt, Text.Pandoc.App.CommandLineOptions, Text.Pandoc.App.OutputSettings. This is motivated partly by the desire to reduce recompilations when something is modified, since App previously depended on virtually every other module.

  • Text.Pandoc.Extensions

    • Semantically, gfm_auto_identifiers is now a modifier of auto_identifiers; for identifiers to be set, auto_identifiers must be turned on, and then the type of identifier produced depends on whether gfm_auto_identifiers and ascii_identifiers are set. Accordingly, auto_identifiers is now added to githubMarkdownExtensions (#5057).
    • Remove ascii_identifiers from githubMarkdownExtensions. GitHub doesn’t seem to strip non-ascii characters any more.
  • Text.Pandoc.Lua.Module.Utils (Albert Krewinkel)

    • Test AST object equality via Haskell (#5092). Equality of Lua objects representing pandoc AST elements is tested by unmarshalling the objects and comparing the result in Haskell. A new function equals which performs this test has been added to the pandoc.utils module.
    • Improve stringify. Meta value strings (MetaString) and booleans (MetaBool) are now converted to the literal string and the lowercase boolean name, respectively. Previously, all values of these types were converted to the empty string.
  • Text.Pandoc.Parsing: Remove Functor and Applicative constraints where Monad already exists (Alexander Krotov).

  • Text.Pandoc.Pretty: Don’t render BreakingSpace at end of line or beginning of line (#5050).

  • Text.Pandoc.Readers.Markdown

    • Fix parsing of citations, quotes, and underline emphasis after symbols. Starting with pandoc 2.4, citations, quoted inlines, and underline emphasis were no longer recognized after certain symbols, like parentheses (#5099, #5053).
    • In pandoc 2.4, a soft break after an abbreviation would be relocated before it to allow for insertion of a nonbreaking space after the abbreviation. This behavior is here reverted. A soft break after an abbreviation will remain, and no nonbreaking space will be added. Those who care about this issue should take care not to end lines with an abbreviation, or to insert nonbreaking spaces manually.
  • Text.Pandoc.Readers.FB2: Do not throw error for unknown elements in <body> (Alexander Krotov). Some libraries include custom elements in their FB2 files.

  • Text.Pandoc.Readers.HTML

    • Allow tfoot before body rows (#5079).
    • Parse <small> as a Span with class “small” (#5080).
    • Allow thead containing a row with td rather than th (#5014).
  • Text.Pandoc.Readers.LaTeX

    • Cleaned up handling of dimension arguments. Allow decimal points, preceding space.
    • Don’t allow arguments for verbatim, etc.
    • Allow space before bracketed options.
    • Allow optional arguments after \\ in tables.
    • Improve parsing of \tiny, \scriptsize, etc. Parse as raw, but know that these font changing commands take no arguments.
  • Text.Pandoc.Readers.Muse

    • Trim whitespace before parsing grid table cells (Alexander Krotov).
    • Add grid tables support (Alexander Krotov).
  • Text.Pandoc.Shared

    • For bibliography match Div with id refs, not class references. This was a mismatch between pandoc’s docx, epub, latex, and markdown writers and the behavior of pandoc-citeproc, which actually looks for a div with id refs rather than one with class references.
    • Exactly match GitHub’s identifier generating algorithm (#5057).
    • Add parameter for Extensions to uniqueIdent and inlineListToIdentifier (#5057). [API change] This allows these functions to be sensitive to the settings of Ext_gfm_auto_identifiers and Ext_ascii_identifiers, and allows us to use uniqueIdent in the CommonMark reader, replacing custom code. It also means that gfm_auto_identifiers can now be used in all formats.
  • Text.Pandoc.Writers.AsciiDoc

    • Use .+ as list markers to support nested ordered lists (#5087).
    • Support list number styles (#5089).
    • Render Spans using [#id .class]#contents# (#5080).
  • Text.Pandoc.Writers.CommonMark

    • Respect --ascii (#5043, quasicomputational).
    • Make sure --ascii affects quotes, super/subscript.
  • Text.Pandoc.Writers.Docx

    • Fix bookmarks to headers with long titles (#5091). Word has a 40 character limit for bookmark names. In addition, bookmarks must begin with a letter. Since pandoc’s auto-generated identifiers may not respect these constraints, some internal links did not work. With this change, pandoc uses a bookmark name based on the SHA1 hash of the identifier when the identifier isn’t a legal bookmark name.
    • Add bookmarks to code blocks (Nikolay Yakimov).
    • Add bookmarks to images (Nikolay Yakimov).
    • Refactor common bookmark creation code into a function (Nikolay Yakimov).
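
The bookmark constraints described above can be sketched in Python. The 40-character limit, the leading-letter rule and the use of SHA1 come from the note; the exact naming scheme (a letter prefix followed by a truncated hex digest) and both function names are assumptions made for this illustration.

```python
import hashlib

MAX_BOOKMARK_LEN = 40  # Word's limit, as stated in the note above


def is_legal_bookmark_name(ident):
    """Check the two constraints mentioned: length and leading letter."""
    return 0 < len(ident) <= MAX_BOOKMARK_LEN and ident[0].isalpha()


def bookmark_name(ident):
    """Return `ident` if it is legal, else a name derived from its SHA1 hash."""
    if is_legal_bookmark_name(ident):
        return ident
    digest = hashlib.sha1(ident.encode("utf-8")).hexdigest()
    # A SHA1 hex digest is 40 characters and may start with a digit, so
    # prefix a letter and truncate to stay within the limit (assumed scheme).
    return ("X" + digest)[:MAX_BOOKMARK_LEN]
```

Because the derived name depends only on the identifier, a link and its target independently arrive at the same bookmark.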
  • Text.Pandoc.Writers.EPUB: Handle calibre metadata (#5098). Nodes of the form

      <meta name="calibre:series" content="Classics on War and Politics"/>
    

    are now included from an epub XML metadata file. You can also include this information in your YAML metadata, like so:

      calibre:
       series: Classics on War and Politics
    

    In addition, ibooks-specific metadata can now be included via an XML file. (Previously, it could only be included via YAML metadata, see #2693.)

  • Text.Pandoc.Writers.HTML: Use plain " instead of &quot; outside of attributes.

  • Text.Pandoc.Writers.ICML: Consolidate adjacent strings, inc. spaces. This avoids splitting up the output unnecessarily into separate elements.

  • Text.Pandoc.Writers.LaTeX: Don’t emit [<+->] unless beamer output, even if writerIncremental is True (#5072).

  • Text.Pandoc.Writers.Muse (Alexander Krotov).

    • Output tables as grid tables if they have multi-line cells.
    • Indent simple tables only on the top level.
    • Output tables with one column as grid tables.
    • Add support for --reference-location.
    • Internal improvements.
  • Text.Pandoc.Writers.OpenDocument: Fix list indentation (Nils Carlson, #5095). This was a regression in pandoc 2.4.

  • Text.Pandoc.Writers.RTF: Fix warnings for skipped raw inlines.

  • Text.Pandoc.Writers.Texinfo: Add blank line before @menu section (#5055).

  • Text.Pandoc.XML: in toHtml5Entities, prefer shorter entities when there are several choices for a particular character.

  • data/abbreviations

    • Add additional abbreviations (Andrew Dunning). Many of these are borrowed from the Chicago Manual of Style 10.42, ‘Scholarly abbreviations’.
  • Templates

    • Asciidoc template: add :lang: to title header if lang is set in metadata (#5088).
  • pandoc.cabal: Add cabal flag derive_json_via_th (Albert Krewinkel). Disabling the flag will cause derivation of ToJSON and FromJSON instances via GHC Generics instead of Template Haskell. The flag is enabled by default, as deriving via Generics can be slow (see #4083).

  • trypandoc:

    • Tweaked drop-down lists.
    • Put link to site in footer.
    • Preselect output format.
    • Update on change of in or out format.
    • Add man input format.
  • MANUAL.txt:

    • Fix outdated description of latex_macros extension.
    • Clarified placement of bibliography.
    • Added “A note on security.”
    • Fix note on curly brace syntax for locators.
    • Document new explicit syntax for citeproc locators.
    • Remove confusing cross-links for some extensions.
    • Don’t put pandoc in code ticks in heading.
    • Document that --ascii works for gfm and commonmark too.
    • Add man to --from options.
  • doc/customizing-pandoc.md: various improvements (Mauro Bieg).

pandoc 2.4

04 Nov 05:13
@jgm jgm

pandoc (2.4)

[new features]

  • New input format man (Yan Pashkovsky, John MacFarlane).

[behavior changes]

  • --ascii is now implemented in the writers, not in Text.Pandoc.App, via the new writerPreferAscii field in WriterOptions. Now the write* functions for Docbook, HTML, ICML, JATS, LaTeX, Ms, Markdown, and OPML are sensitive to writerPreferAscii. Previously the to-ascii translation was done in Text.Pandoc.App, and thus not available to those using the writer functions directly.

  • --ascii now works with Markdown output. HTML5 character reference entities are used.

  • --ascii now works with LaTeX output. 100% ASCII output can’t be guaranteed, but the writer will use commands like \"{a} and \l whenever possible, to avoid emitting a non-ASCII character.

  • For HTML5 output, --ascii now uses HTML5 character reference entities rather than numerical entities.

  • Improved detection of format based on extension (in Text.Pandoc.App). We now ensure that if someone tries to convert a file for a format that has a pandoc writer but not a reader, it won’t just default to markdown.

  • Add viz. to abbreviations file (#5007, Nick Fleisher).

  • AsciiDoc writer: always use single-line section headers, instead of the old underline style (#5038). Previously the single-line style would be used if --atx-headers was specified, but now it is always used.

  • RST writer: Use simple tables when possible (#4750).

  • CommonMark (and gfm) writer: Add plain text fallbacks. (#4528, quasicomputational). Previously, the writer would unconditionally emit HTML output for subscripts, superscripts, strikeouts (if the strikeout extension is disabled) and small caps, even with raw_html disabled. Now there are plain-text (and, where possible, fancy Unicode) fallbacks for all of these corresponding (mostly) to the Markdown fallbacks, and the HTML output is only used when raw_html is enabled.

  • Powerpoint writer: support raw openxml (Jesse Rosenthal, #4976). This allows raw openxml blocks and inlines to be used in the pptx writer. Caveats: (1) It’s up to the user to write well-formed openxml. The chance of corruption, especially with such a brittle format as pptx, is high. (2) Because of the tricky way that blocks map onto shapes, if you are using a raw block, it should be the only block on a slide (otherwise other text might end up overlapping it). (3) The pptx ooxml namespace abbreviations are different from the docx ooxml namespaces. Again, it’s up to the user to get it right. An unzipped document and the ooxml specification should be consulted.

  • With --katex in HTML formats, do not use the autorenderer (#4946). We no longer surround formulas with \(..\) or \[..\]. Instead, we tell katex to convert the contents of span elements with class “math”. Since math has already been identified, this avoids wasted time parsing for LaTeX delimiters. Note, however, that this may yield unexpected results if you have span elements with class “math” that don’t contain LaTeX math. Also, use latest version of KaTeX by default (0.9.0).

  • The man writer now produces ASCII-only output, using groff escapes, for portability.

  • ODT writer:

    • Add title, author and date to metadata; any remaining metadata fields are added as meta:user-defined tags.
    • Implement table caption numbering (#4949, Nils Carlson). Captioned tables are numbered and labeled with format “Table 1: caption”, where “Table” is replaced by a translation, depending on the value of lang in metadata. Uncaptioned tables are not enumerated.
    • OpenDocument writer: Implement figure numbering in captions (#4944, Nils Carlson). Figure captions are now numbered 1, 2, 3, … The format in the caption is “Figure 1: caption” and so on (where “Figure” is replaced by a translation, depending on the value of lang in the metadata). Captioned figures are numbered consecutively and uncaptioned figures are not enumerated. This is necessary in order for LibreOffice to generate an Illustration Index (Table of Figures) for included figures.
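
The numbering rule for captions (captioned items are numbered consecutively, uncaptioned ones are skipped) can be sketched in Python. `number_captions` is a hypothetical helper; the label is passed in directly, whereas the writer is described as choosing a translation based on lang.

```python
def number_captions(captions, label="Table"):
    """Prefix each non-empty caption with a running number.

    `captions` holds one entry per table or figure, with None (or an empty
    string) for uncaptioned items, which do not consume a number.
    """
    numbered = []
    count = 0
    for caption in captions:
        if caption:
            count += 1
            numbered.append(f"{label} {count}: {caption}")
        else:
            numbered.append(None)
    return numbered
```

Calling it with `label="Figure"` gives the figure variant of the same rule.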
  • RST reader: Pass through fields in unknown directives as div attributes (#4715). Support class and name attributes for all directives.

  • Org reader: Add partial support for #+EXCLUDE_TAGS option. (#4284, Brian Leung). Headers with the corresponding tags should not appear in the output.

  • Log warnings about missing title attributes now include a suggestion about how to fix the problem (#4909).

  • Lua filter changes (Albert Krewinkel):

    • Report traceback when an error occurs. A proper Lua traceback is added if either loading of a file or execution of a filter function fails. This should be of help to authors of Lua filters who need to debug their code.

    • Allow access to pandoc state (#5015). Lua filters and custom writers now have read-only access to most fields of pandoc’s internal state via the global variable PANDOC_STATE.

    • Push ListAttributes via constructor (Albert Krewinkel). This ensures that ListAttributes, as present in OrderedList elements, have additional accessors (viz. start, style, and delimiter).

    • Rename ReaderOptions fields, use snake_case. Snake case is used in most variable names; using camelCase for these fields was an oversight. A metatable is added to ensure that the old field names remain functional.

    • Iterate over AST element fields when using pairs. This makes it possible to iterate over all field names of an AST element by using a generic for loop with pairs:

      for field_name, field_content in pairs(element) do
      ... 
      end
      

      Raw table fields of AST elements should be considered an implementation detail and might change in the future. Accessing element properties should always happen through the fields listed in the Lua filter docs.

      Note that the iterator currently excludes the t/tag field.

    • Ensure that MetaList elements behave like Lists. Methods usable on Lists can also be used on MetaList objects.

    • Fix MetaList constructor (Albert Krewinkel). Passing a MetaList object to the constructor pandoc.MetaList now returns the passed list as a MetaList. This is consistent with the constructor behavior when passed an (untagged) list.

  • Custom writers: Custom writers have access to the global variable PANDOC_DOCUMENT (Albert Krewinkel, #4957). The variable contains a userdata wrapper around the full pandoc AST and exposes two fields, meta and blocks. The field content is only marshaled on demand, so the performance of scripts not accessing the fields remains unaffected.

[API changes]

  • Text.Pandoc.Options: add writerPreferAscii to WriterOptions.

  • Text.Pandoc.Shared:

    • Export splitSentences. This was previously duplicated in the Man and Ms writers.
    • Add ToString typeclass (Alexander Krotov).
  • New exported module Text.Pandoc.Filter (Albert Krewinkel).

  • Text.Pandoc.Parsing

    • Generalize gridTableWith to any Char Stream (Alexander Krotov).
    • Generalize readWithM from [Char] to any Char Stream that is a ToString instance (Alexander Krotov).

  • Text.Pandoc.XML: add toHtml5Entities.

  • New exported module Text.Pandoc.Readers.Man (Yan Pashkovsky, John MacFarlane).

  • Text.Pandoc.Writers.Shared

    • Add exported functions toSuperscript and toSubscript (quasicomputational, #4528).
    • Remove exported functions metaValueToInlines, metaValueToString. Add new exported functions lookupMetaBool, lookupMetaBlocks, lookupMetaInlines, lookupMetaString. Use these whenever possible for uniformity in writers (Mauro Bieg, #4907). (Note that removed function metaValueToInlines was in previous released versions.)
    • Add metaValueToString.
  • Text.Pandoc.Lua

    • Expose more useful internals (Albert Krewinkel):

      • runFilterFile to run a Lua filter from file;
      • data type Global and its constructors; and
      • setGlobals to add globals to a Lua environment.

      This module also contains Pushable and Peekable instances required to get pandoc’s data types to and from Lua. Low-level Lua operations remain hidden in Text.Pandoc.Lua.

    • Rename runPandocLua to runLua (Albert Krewinkel).

    • Remove runLuaFilter, merging this into Text.Pandoc.Filter.Lua’s apply (Albert Krewinkel).
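
The idea behind the lookupMeta* helpers listed under Text.Pandoc.Writers.Shared (one uniform way for writers to read a metadata value) can be sketched in Python against pandoc's JSON representation of metadata. The function names lookup_meta_string and lookup_meta_bool and the helper stringify are hypothetical, and the cases handled here are a deliberate simplification; the Haskell functions may handle more metadata value types.

```python
def stringify(inlines):
    """Flatten a list of JSON inline nodes to plain text (Str and Space only)."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def lookup_meta_string(key, meta):
    """Return the value for key as a string, or "" if absent or unsupported."""
    value = meta.get(key)
    if value is None:
        return ""
    if value["t"] == "MetaString":
        return value["c"]
    if value["t"] == "MetaInlines":
        return stringify(value["c"])
    return ""


def lookup_meta_bool(key, meta):
    """Return True only if key holds an explicit true boolean (simplified)."""
    value = meta.get(key)
    return value is not None and value["t"] == "MetaBool" and value["c"] is True


meta = {
    "title": {"t": "MetaInlines",
              "c": [{"t": "Str", "c": "Release"},
                    {"t": "Space"},
                    {"t": "Str", "c": "notes"}]},
    "lang": {"t": "MetaString", "c": "en"},
    "draft": {"t": "MetaBool", "c": True},
}

title = lookup_meta_string("title", meta)
missing = lookup_meta_string("subtitle", meta)
is_draft = lookup_meta_bool("draft", meta)
```

Every caller gets a plain string or boolean back with a defined fallback for missing keys, so individual writers do not need their own case analysis on the metadata value type.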

[bug fixes and under-the-hood improvements]

  • Text.Pandoc.Parsing

    • Make uri accept any stream with Char tokens (Alexander Krotov).
    • Rewrite uri without withRaw (Alexander Krotov).
    • Generalize parseFromString and parseFromString' to any stream with Char tokens (Alexander Krotov).
    • Rewrite nonspaceChar using noneOf (Alexander Krotov).
  • Text.Pandoc.Shared: Reimplement mapLeft using Bifunctor.first (Alexander Krotov).

  • Text.Pandoc.Pretty: Simplify Text.Pandoc.Pretty.offset (Alexander Krotov).

  • Text.Pandoc.App

    • Work around HXT limitation for --syntax-definition with Windows drive (#4836).
    • Always preserve tabs for man format. We need it for tables.
    • Split command line parsing code into a separate unexported module, Text.Pandoc.App.CommandLineOptions (Albert Krewinkel).
  • Text.Pandoc.Readers.Roff: new unexported module for tokenizing roff documents.

  • New unexported module Text.Pandoc.RoffChar, providing character escape tables for roff formats.

  • Text.Pandoc.Readers.HTML: Fix htmlTag and isInlineTag to accept processing instructions (#3123, regression since 2.0).

  • Text.Pandoc.Readers.JATS: Use foldl' instead of maximum to ac...


pandoc 2.3.1

29 Sep 03:03
@jgm jgm
  • RST reader:

    • Parse RST inlines containing newlines (#4912, Francesco Occhipinti). This eliminates a regression introduced after pandoc 2.1.1, which caused inline constructions containing newlines not to be recognized.
    • Fix bug with internal link targets (#4919). They were gobbling up indented content underneath.
  • Markdown reader: distinguish autolinks in the AST. With this change, autolinks are parsed as Links with the uri class. (The same is true for bare links, if the autolink_bare_uris extension is enabled.) Email autolinks are parsed as Links with the email class. This allows the distinction to be represented in the AST.

  • Org reader:

    • Force inline code blocks to honor export options (Brian Leung).
    • Parse empty argument array in inline src blocks (Brian Leung).
  • Muse reader (Alexander Krotov):

    • Added additional tests.
    • Do not allow code markup to be followed by a digit.
    • Remove heading level limit.
    • Simplify <literal> tag parsers.
    • Parse Text instead of String. Benchmark shows 7% improvement.
    • Get rid of HTML parser dependency.
    • Various code improvements.
  • ConTeXt writer: change \ to / in Windows image paths (#4918). We do this in the LaTeX writer, and it avoids problems. Note that / works as a LaTeX path separator on Windows.

  • LaTeX writer:

    • Add support for multiprenote and multipostnote arguments with --biblatex (Brian Leung, #4930). The multiprenotes occur before the first prefix of a multicite, and the multipostnotes follow the last suffix.
    • Fix a use of last that might take an empty list. If you ran with --biblatex and had an empty document (metadata but no blocks), pandoc would previously raise an error because of the use of last on an empty list.
  • RTF writer: Fix build failure with ghc-8.6.1 caused by missing MonadFail instance (Jonas Scholl).

  • ODT writer: Improve table header row style handling (Nils Carlson). This changes the way styles for cells in the header row and normal rows are handled in ODT tables. Previously a new (but identical) style was generated for every table, specifying the style of the cells within the table. After this change there are two style definitions for table cells, one for the cells in the header row, one for all other cells. This doesn’t change the actual styles, but it makes post-processing changes to the table styles much simpler: it is no longer necessary to introduce new styles for header rows, and there are now only two styles where there was previously one per table.

  • HTML writer:

    • Don’t add uri class to presumed autolinks. Formerly the uri class was added to autolinks by the HTML writer, but it had to guess what was an autolink and could not distinguish [http://example.com](http://example.com) from <http://example.com>. It also incorrectly recognized [pandoc](pandoc) as an autolink. Now the HTML writer simply passes through the uri attribute if it is present, but does not add anything.
    • Avoid adding extra section nestings for revealjs. Previously revealjs title slides at level (slidelevel - 1) were nested under an extra section element, even when the section contained no additional (vertical) content. That caused problems for some transition effects.
    • Omit unknown attributes in EPUB2 output. For example, epub:type attributes should not be passed through, or the epub produced will not validate.
  • JATS writer: remove ‘role’ attribute on ‘bold’ and ‘sc’ elements (#4937). The JATS spec does not allow these.

  • Textile writer: don’t represent uri class explicitly for autolinks (#4913).

  • Lua filters (Albert Krewinkel):

    • Cleanup filter execution code.
    • Better error on test failure.
  • HTML, Muse reader tests: reduce time taken by round-trip test.

  • Added cabal.project.

  • MANUAL: epub:type is only useful for epub3 (Mauro Bieg).

  • Use hslua v1.0.0 (Albert Krewinkel).

  • Fix translations/ru to use modern Russian orthography (Ivan Trubach).

  • Build Windows binary using ghc 8.6.1 and cabal new-build. This fixes issues with segfaults in the 32-bit Windows binaries (#4283).
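
The autolink change in the Markdown reader and HTML writer entries above can be illustrated with a short Python sketch that inspects Link nodes in pandoc's JSON AST. The helper names make_link and link_kind are hypothetical, and the nodes are built by hand here rather than produced by running pandoc.

```python
def make_link(classes, text, url):
    """Build a JSON Link node: [attr, inlines, target], attr = [id, classes, kvs]."""
    return {"t": "Link",
            "c": [["", classes, []],
                  [{"t": "Str", "c": text}],
                  [url, ""]]}


def link_kind(node):
    """Classify a Link node by the classes the reader attached to it."""
    classes = node["c"][0][1]
    if "email" in classes:
        return "email autolink"
    if "uri" in classes:
        return "uri autolink"
    return "ordinary link"


# <http://example.com> is parsed as a Link with the uri class.
auto = make_link(["uri"], "http://example.com", "http://example.com")

# [http://example.com](http://example.com) has the same text and target,
# but carries no class, so it stays distinguishable from the autolink.
explicit = make_link([], "http://example.com", "http://example.com")

# An email autolink carries the email class.
mail = make_link(["email"], "me@example.com", "mailto:me@example.com")

kinds = [link_kind(auto), link_kind(explicit), link_kind(mail)]
```

Because the distinction now lives in the AST, a writer or filter can check the class list instead of guessing from whether the link text matches the target.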