Releases · jgm/pandoc

01 Jun 13:31

jgm

2.14.0.1

7225d4a

pandoc 2.14.0.1

Commonmark reader: Fix regression in 2.14 with YAML metadata block parsing, which could cause the document body to be omitted after metadata (#7339).
HTML reader: fix column width regression in 2.14 (#7334). Column widths specified with a style attribute were off by a factor of 100.
Markdown reader: in rebasePaths, check for both Windows and Posix absolute paths. Previously Windows pandoc was treating /foo/bar.jpg as non-absolute.
Text.Pandoc.Logging: In rendering LoadedResource, use relative paths.
Docx writer: fix regression on captions (#7328). The “Table Caption” style was no longer getting applied. (It was overwritten by “Compact.”)
Use commonmark-extensions 0.2.1.2

29 May 05:17

jgm

2.14

c98fe1f

pandoc 2.14

Change reader types, allowing better tracking of source positions [API change]. Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn’t report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn’t resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752).
Add rebase_relative_paths extension (#3752). When enabled, this extension rewrites relative image and link paths by prepending the (relative) directory of the containing file. This behavior is useful when your input sources are split into multiple files, across several directories, with files referring to images stored in the same directory. The extension can be enabled for all markdown and commonmark-based formats.
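
The rewriting this extension performs can be sketched in Python. This is an illustration only, not pandoc's implementation (pandoc is written in Haskell): the names `is_absolute` and `rebase_path` and the URL-scheme test are assumptions made for the example. The sketch leaves absolute paths (Windows or Posix style), URLs, and fragment-only links unchanged, and otherwise prepends the directory of the containing file.

```python
import ntpath
import posixpath
import re

# Assumption: a target starting with a scheme of two or more characters
# ("https:", "mailto:") is a URL; a single letter is taken as a drive letter.
_URL_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+:")


def is_absolute(path: str) -> bool:
    """True for Posix-style (/foo) and Windows-style (C:\\foo) absolute paths."""
    return posixpath.isabs(path) or ntpath.isabs(path)


def rebase_path(source_file: str, target: str) -> str:
    """Prepend the directory of source_file to a relative link or image path."""
    if not target or target.startswith("#"):
        return target  # empty or fragment-only link: nothing to rebase
    if is_absolute(target) or _URL_SCHEME.match(target):
        return target  # absolute paths and URLs are left alone
    return posixpath.join(posixpath.dirname(source_file), target)
```

For example, an image `images/fig.png` referenced from `chap1/text.md` would become `chap1/images/fig.png` under this sketch, while `/foo/bar.jpg` stays as it is.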
Add Text.Pandoc.Sources (exported module), with a Sources type and a ToSources class. A Sources wraps a list of (SourcePos, Text) pairs [API change]. A parsec Stream instance is provided for Sources. The module also exports versions of parsec’s satisfy and other Char parsers that track source positions accurately from a Sources stream (or any instance of the new UpdateSourcePos class).
Text.Pandoc.Parsing
- Export the modified Char parsers defined in Text.Pandoc.Sources instead of the ones parsec provides. Modified parsers to use a Sources as stream [API change].
- Improve include file functions [API change]. Remove old insertIncludedFileF. Give insertIncludedFile a more general type, allowing it to be used where insertIncludedFileF was.
- Add parameter to the citeKey parser from Text.Pandoc.Parsing, which controls whether the @{..} syntax is allowed [API change].
Text.Pandoc.Error: Modified the constructor PandocParsecError to take a Sources rather than a Text as first argument, so parse error locations can be accurately reported.
Fix source position reporting for YAML bibliographies (#7273).
Issue error message when reader or writer format is malformed (#7231). Previously we exited with an error status but (due to a bug) no message.
Smarter smart quotes (#7216, #2103). Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks.
Markdown reader:
- Use MetaInlines not MetaBlocks for multimarkdown metadata fields. This gives better results in converting to e.g. pandoc markdown.
- Implement curly-brace syntax for Markdown citation keys (#6026). The change provides a way to use citation keys that contain special characters not usable with the standard citation key syntax. Example: @{foo_bar{x}'} for the key foo_bar{x}'. It also allows separating citation keys from immediately following text, e.g. @{foo}A.
RST reader:
- Seek include files in the directory of the file containing the include directive, as RST requires (#6632).
- Use insertIncludedFile from Text.Pandoc.Parsing instead of reproducing much of its code.
Org reader: Resolve org includes relative to the directory containing the file containing the INCLUDE directive (#5501).
ODT reader: Treat tabs as spaces (#7185, niszet).
Docx reader:
- Add handling of vml image objects (#7257, mbrackeantidot).
- Support new table features (Emily Bourke, #6316): column spans, row spans, multiple header rows, table description (parsed as a simple caption), captions, column widths.
LaTeX reader:
- Improved siunitx support (#6658, #6620).
- Better support for \xspace (#7299).
- Improve parsing of \def macros. We previously set “verbatim mode” even for parsing the initial \def; this caused problems for \def nested inside another \def.
- Implement \newif.
ConTeXt writer: improve ordered lists (#5016, Denis Maier). Change ordered list from itemize to enumerate. Add new itemgroup for ordered lists. Remove manual insertion of width attributes. Use tabular figures in ordered list enumerators.
HTML reader:
- Don’t fail on unmatched closing “script” tag (Albert Krewinkel, #7282).
- Keep h1 tags as normal headers (#2293, Albert Krewinkel). The tags <title> and <h1 class="title"> often contain the same information, so the latter was dropped from the document. However, as this can lead to loss of information, the heading is now always retained. Use --shift-heading-level-by=-1 to turn the <h1> into the document title, or a filter to restore the previous behavior.
- Handle relative lengths (e.g. 2*) in HTML column widths (#4063). See https://www.w3.org/TR/html4/types.html#h-6.6.
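
The HTML 4 rule for relative lengths referenced above (percentage lengths are allotted first, then the remaining space is divided in proportion to the integer before the `*`, with a bare `*` meaning `1*`) can be sketched as follows. This is a hedged illustration, not pandoc's code: the function name `column_fractions` is hypothetical, and pixel widths are simply ignored here.

```python
def column_fractions(specs):
    """Turn width specs such as "2*", "*" or "25%" into fractions of the table width.

    Entries that are neither percentages nor relative lengths yield None.
    """
    percents, weights = {}, {}
    for i, spec in enumerate(specs):
        s = spec.strip()
        if s.endswith("%"):
            percents[i] = float(s[:-1]) / 100.0
        elif s.endswith("*"):
            weights[i] = int(s[:-1] or "1")  # "*" is equivalent to "1*"

    # Space left over once the percentage columns have been allotted.
    remaining = max(0.0, 1.0 - sum(percents.values()))
    total_weight = sum(weights.values())

    result = []
    for i in range(len(specs)):
        if i in percents:
            result.append(percents[i])
        elif i in weights and total_weight > 0:
            result.append(remaining * weights[i] / total_weight)
        else:
            result.append(None)
    return result
```

With the specs `2*`, `1*`, and `25%`, the percentage column takes a quarter of the width and the other two share the remaining three quarters in a 2:1 ratio.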
DocBook/JATS readers:
- Fix mathml regression caused by the switch in XML libraries (#7173).
- Fix “phrase” in DocBook: take classes from “role” not “class” (#7195).
DocBook reader: ensure that first and last names are separated (#6541).
Jira reader (Albert Krewinkel, #7218):
- Support “smart” links: [alias|https://example.com|smart-card] syntax.
- Allow spaces and most unicode characters in attachment links.
- No longer require a newline character after {noformat}.
- Only allow URI path segment characters in bare links.
- The file: schema is no longer allowed in bare links; these rarely make sense.
Plain writer: handle superscript unicode minus (#7276).
LaTeX writer:
- Better handling of line breaks in simple tables (#7272). Now we also handle the case where they’re embedded in other elements, e.g. spans.
- For beamer output, support exampleblock and alertblock (#7278). A block will be rendered as an exampleblock if the heading has class example and an alertblock if it has class alert.
- Separate successive quote chars with thin space (#6958, Albert Krewinkel). Successive quote characters are separated with a thin space to improve readability and to prevent unwanted ligatures. Detection of these quotes sometimes had failed if the second quote was nested in a span element.
EPUB Writer: Fix belongs-to-collection XML id choice (#7267, nuew). The epub writer previously used the same XML id for both the book identifier and the epub collection. This causes an error on epubcheck.
BibTeX/BibLaTeX writer: Handle annote field (#7266).
ZimWiki writer: allow links and emphasis in headers (#6605, Albert Krewinkel).
ConTeXt writer:
- Support blank lines in line blocks (#6564, Albert Krewinkel, thanks to @denismaier).
- Use span identifiers as reference anchors (#7246, Albert Krewinkel).
HTML writer:
- Keep attributes from code nested below pre tag (#7221, Albert Krewinkel). If a code block is defined with <pre><code class="language-x">…</code></pre>, where the <pre> element has no attributes, then the attributes from the <code> element are used instead. Any leading language- prefix in the code’s class attribute is dropped to improve syntax highlighting.
- Ensure headings only have valid attribs in HTML4 (#5944, Albert Krewinkel).
- Parse <header> as a Div (Albert Krewinkel).
Org writer:
- Inline latex envs need newlines (#7252, tecosaur). As specified in https://orgmode.org/manual/LaTeX-fragments.html, an inline LaTeX block must start on a new line.
- Use LaTeX style maths delimiters (#7196, tecosaur).
JATS writer (Albert Krewinkel):
- Use either styled-content or named-content for spans (#7211). If the element has a content-type attribute, or at least one class, then that value is used as content-type and the span is put inside a <named-content> element. Otherwise a <styled-content> element is used instead.
- Reduce unnecessary use of <p> elements for wrapping (#7227). The <p> element is used for wrapping in cases where the contents would otherwise not be allowed in a certain context. Unnecessary wrapping is avoided, especially around quotes (<disp-quote> elements).
- Convert spans to <named-content> elements (#7211). Spans with attributes are converted to <named-content> elements instead of being wrapped with <milestone-start/> and <milestone-end> elements. Milestone elements are not allowed in documents using the articleauthoring tag set, so this change ensures the creation of valid documents.
- Add footnote number as label in backmatter (#7210). Footnotes in the backmatter are given the footnote’s number as a label. The articleauthoring output is unaffected by this change, as footnotes are placed inline there.
- Escape disallowed chars in identifiers. XML identifiers must start with an underscore or letter, and can contain only a limited set of punctuation characters. Any IDs not adhering to these rules are rewritten by writing the offending characters as Uxxxx, where xxxx is the character’s hex code.
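
The identifier rewriting described in the last item can be sketched in Python. This is an illustration under stated assumptions, not the JATS writer's code: the function name `escape_xml_id` is hypothetical, the allowed punctuation is assumed to be `_`, `-`, and `.`, and the hex code is assumed to be upper case and padded to four digits.

```python
def escape_xml_id(ident: str) -> str:
    """Rewrite characters that are not allowed in an XML identifier as Uxxxx."""
    out = []
    for i, ch in enumerate(ident):
        if i == 0:
            # The first character must be an underscore or a letter.
            allowed = ch == "_" or ch.isalpha()
        else:
            # Assumption: letters, digits, and "_", "-", "." are allowed later on.
            allowed = ch.isalnum() or ch in "_-."
        out.append(ch if allowed else "U{:04X}".format(ord(ch)))
    return "".join(out)
```

Under these assumptions, `1st section` becomes `U0031stU0020section`, while an identifier such as `_ok-id.1` is left unchanged.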
Jira writer: use {color} when span has a color attribute (Albert Krewinkel, tarleb/jira-wiki-markup#10).
Docx writer:
- Autoset table width if no column has an explicit width (Albert Krewinkel).
- Extract Table handling into separate module (Albert Krewinkel).
- Support colspans and rowspans in tables (Albert Krewinkel, #6315).
- Support multirow table headers (Albert Krewinkel).
- Improve integration ...

21 Mar 04:54

jgm

2.13

1302131

pandoc 2.13

Support yaml_metadata_block extension for commonmark, gfm (#6537). This support is a bit more limited than with pandoc’s markdown. The YAML block must be the first thing in the input, and the leaf nodes are parsed in isolation from the rest of the document. So, for example, you can’t use reference links if the references are defined later in the document.
Fix fallback to default partials when custom templates are used. If the directory containing a template does not contain the partial, it should be sought in the default templates, but this was not working properly (#7164).
Handle nocite better with --biblatex and --natbib (#4585). Previously the nocite metadata field was ignored with these formats. Now it populates a nocite-ids template variable and causes a \nocite command to be issued.
Text.Pandoc.Citeproc: apply fixLinks correctly (#7130). This is code that incorporates a prefix like https://doi.org/ into a following link when appropriate.
Text.Pandoc.Shared:
- Remove backslashEscapes, escapeStringUsing [API change]. Replace these inefficient association list lookups with more efficient escaping functions in the writers that used them (for a 10-25% performance boost in org, haddock, rtf, texinfo writers).
- Remove ToString, ToText typeclasses [API change]. These were needed for the transition from String to Text, but they are no longer used and may clash with other things.
- Simplify compactDL.
Text.Pandoc.Parsing:
- Change type of readWithM so that it is no longer polymorphic [API change]. The ToText class has been removed, and now that we’ve completed the transition to Text we no longer need this to operate on Strings.
- Remove F type synonym [API change]. Muse and Org were defining their own F anyway.
Text.Pandoc.Readers.Metadata:
- Export yamlMetaBlock [API change].
- Make yamlBsToMeta, yamlBsToRefs polymorphic on the parser state [API change].
Markdown reader: Fix regression with tex_math_backslash (#7155).
MediaWiki reader: Allow block-level content in notes (ref) (#7145).
Jira reader (Albert Krewinkel):
- Fixed parsing of autolinks (i.e., of bare URLs in the text). Previously an autolink would take up the rest of a line, as spaces were allowed characters in these items.
- Emoji character sequences no longer cause parsing failures. This was due to missing backtracking when emoji parsing fails.
- Mark divs created from panels with class “panel”.
RST reader: fix logic for ending comments (#7134). Previously comments sometimes got extended too far.
DocBook writer: include Header attributes as XML attributes on section (Erik Rask). Attributes with key names that are not allowed as XML attributes are dropped, as are attributes with invalid values and xml:id (DocBook 5) and id (DocBook 4).
Docx writer:
- Make nsid in abstractNum deterministic. Previously we assigned a random number, but we don’t need random values, so now we just assign a value based on the list marker.
- Use integral values for w:tblW (#7141).
Jira writer (Albert Krewinkel):
- Block quotes are only rendered as bq. if they do not contain a linebreak.
- Improve div/panel handling. Include div attributes in panels, always render divs with class panel as panels, and avoid nesting of panels.
HTML writer: Add warnings on duplicate attribute values. This prevents emitting invalid HTML. Ultimately it would be good to prevent this in the types themselves, but this is better for now.
Org writer: Prevent unintended creation of ordered list items (#7132, Albert Krewinkel). Adjust line wrapping if default wrapping would cause a line to be read as an ordered list item.
JATS templates: support ‘equal-contrib’ attrib for authors (Albert Krewinkel). Authors who contributed equally to a paper may be marked with equal-contrib.
reveal.js template: replace JS comment with HTML (#7154, Florian Kohrt).
Text.Pandoc.Logging: Add DuplicateAttribute constructor to LogMessage. [API change]
Use -j4 for linux release build. This speeds up the build dramatically on arm.
cabal.project: remove ghcoptions. Move flags to top level, so they can be set differently on the command line.
Require latest texmath, skylighting, citeproc, jira-wiki-markup. (The latest skylighting fixes a bad bug with Haskell syntax highlighting.) Narrow version bounds for texmath, skylighting, and citeproc, since the test output depends on them.
Use doclayout 0.3.0.2. This significantly reduces the time and memory needed to compile pandoc.
Use foldl' instead of foldl everywhere.
Update bounds for random (#7156, Alexey Kuleshevich).
Remove uses of some partial functions.
Don’t bake in a larger stack size for the executable.
Test improvements:
- Use getExecutablePath from base, avoiding the dependency on executable-path.
- Factor out setupEnvironment in Helpers, to avoid code duplication.
- Fix finding of data files by setting the pandoc_datadir environment variable when we shell out to pandoc. This avoids the need to use --data-dir for the tests, which caused problems finding pandoc.lua when compiling without the embed_data_files flag (#7163).
Benchmark improvements:
- Build +RTS -A8m -RTS into default ghc-options for benchmark. This is necessary to get accurate benchmark results; otherwise we are largely measuring garbage collecting, some not related to the current benchmark.
- Allow specifying BASELINE file in ‘make bench’ for comparison (otherwise the latest benchmark is chosen by default).
- Force readFile in benchmarks early (Bodigrim).
CONTRIBUTING: suggest using a cabal.project.local file (#7153, Albert Krewinkel).
Add ghcid-test to Makefile. This loads the test suite in ghcid.

08 Mar 20:07

jgm

2.12

31ca011

pandoc 2.12

--resource-path now accumulates if specified multiple times (#6152). Resource paths specified later on the command line are prepended to those specified earlier. Thus, --resource-path foo --resource-path bar:baz is equivalent to --resource-path bar:baz:foo. (The previous behavior was for the last --resource-path to replace all the rest.) resource-path in defaults files behaves the same way: it will be prepended to the resource path set by earlier command line options or defaults files. This change facilitates the use of multiple defaults files: each can specify a directory containing resources it refers to without clobbering the resource paths set by the others.
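
The accumulation rule can be illustrated with a few lines of Python. This is a sketch, not pandoc's option handling: the function name `accumulate_resource_paths` is hypothetical, and the `:` separator is the Posix convention assumed for the example.

```python
def accumulate_resource_paths(options, sep=":"):
    """Combine successive --resource-path values; later values are prepended."""
    paths = []
    for option in options:
        # Each new option goes in front of everything collected so far.
        paths = option.split(sep) + paths
    return paths
```

Passing the values `foo` and then `bar:baz` yields the search order `bar`, `baz`, `foo`: directories named later are searched first.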
Allow defaults files to refer to the home directory, the user data directory, and the directory containing the defaults file itself (#5871, #5982, #5977). In fields that expect file paths (and only in these fields),
- ${VARIABLE} will expand to the value of the environment variable VARIABLE (and in particular ${HOME} will expand to the path of the home directory). A warning will be raised for undefined variables.
- ${USERDATA} will expand to the path of the user data directory in force when the defaults file is being processed.
- ${.} will expand to the directory containing the defaults file. (This allows default files to be placed in a directory containing resources they make use of.)
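
The three kinds of expansion listed above can be sketched in Python. This is an illustration, not pandoc's implementation: the function name `expand_path` and its parameters are hypothetical, and replacing an undefined variable with an empty string (after warning) is an assumption, since the text only says a warning is raised.

```python
import os
import posixpath
import re
import warnings

_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def expand_path(value, defaults_file, user_data_dir, env=None):
    """Expand ${.}, ${USERDATA} and ${VARIABLE} in a file-path field."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name == ".":
            # Directory containing the defaults file itself.
            return posixpath.dirname(defaults_file) or "."
        if name == "USERDATA":
            return user_data_dir
        if name in env:
            return env[name]
        warnings.warn("undefined variable: " + name)
        return ""  # assumption: undefined variables expand to nothing

    return _VARIABLE.sub(replace, value)
```

For a defaults file at `conf/defaults.yaml`, the field value `${.}/filters/x.lua` would expand to `conf/filters/x.lua` in this sketch.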
When downloading content from URL arguments, be sensitive to the character encoding (#5600). We can properly handle UTF-8 and latin1 (ISO-8859-1); for others we raise an error. Fall back to latin1 if no charset is given in the mime type and UTF-8 decoding fails.
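
The decoding policy described here can be sketched in Python. This is a hedged illustration rather than pandoc's code: the function name `decode_body`, the accepted spellings of each charset name, and the use of `ValueError` for unsupported charsets are assumptions.

```python
def decode_body(data: bytes, charset=None) -> str:
    """Decode downloaded bytes using the charset from the mime type, if any."""
    if charset is None:
        # No charset given: try UTF-8 first, then fall back to latin1.
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            return data.decode("latin-1")

    name = charset.strip().lower()
    if name in ("utf-8", "utf8"):
        return data.decode("utf-8")
    if name in ("iso-8859-1", "latin1", "latin-1"):
        return data.decode("latin-1")
    # Any other declared charset is reported as an error.
    raise ValueError("unsupported charset: " + charset)
```

The latin1 fallback cannot fail, because every byte value maps to a character in ISO-8859-1.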
Allow abbreviations that don’t end in a period to be specified using --abbreviations (#7124).
Add new unexported module Text.Pandoc.XML.Light, as well as Text.Pandoc.XML.Light.Types, Text.Pandoc.XML.Light.Proc, Text.Pandoc.XML.Light.Output. (Closes #6001, #6565, #7091).

This module exports definitions of Element and Content that are isomorphic to xml-light’s, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation.

We also add versions of the functions from xml-light’s Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light).

We export functions that use xml-conduit’s parser to produce an Element or [Content]. This allows existing pandoc code to use a better parser without much modification.

The new parser is used in all places where xml-light’s parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (with docbook, opml, jats, and docx almost twice as fast, odt and fb2 more than twice as fast).

In addition, the new parser gives us better error reporting than xml-light. We report XML errors, when possible, using the new PandocXMLError constructor in PandocError.

These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes.
DocBook reader:
- Avoid expensive tree normalization step, as it is not necessary with the new XML parser.
- Support informalfigure (#7079) (Nils Carlson).
Docx reader:
- Use Map instead of list for Namespaces. This gives a speedup of about 5-10%. With this and the XML parsing changes, the docx reader is now about twice as fast as in the previous release.
HTML reader:
- Small performance tweaks.
- Also, remove exported class NamedTag(..) [API change]. This was just intended to smooth over the transition from String to Text and is no longer needed.
- As a result, the functions isInlineTag and isBlockTag are no longer polymorphic; they apply to a Tag Text [API change].
- Do a lookahead to find the right parser to use. This takes benchmarks from 34ms to 23ms, with less allocation.
- Fix bad handling of empty src attribute in iframe (#7099). If src is empty, we simply skip the iframe. If src is invalid or cannot be fetched, we issue a warning and skip instead of failing with an error.
JATS reader:
- Avoid tree normalization, which is no longer necessary given the new XML parser.
LaTeX reader:
- Don’t export tokenize, untokenize [API change]. These are internal implementation details, which were only exported for testing. They don’t belong in the public API.
- Improved efficiency of the parser. With these changes the reader is almost twice as fast as in the last release in our benchmarks.
- Code cleanup, removing some unnecessary things.
- Rewrite withRaw so it doesn’t rely on fragile assumptions about token positions (which break when macros are expanded) (#7092). This requires the addition of sEnableWithRaw and sRawTokens in LaTeXState, and a new combinator disablingWithRaw to disable collecting of raw tokens in certain contexts. Add parseFromToks to Text.Pandoc.Readers.LaTeX.Parsing. Fix parsing of single character tokens so it doesn’t mess up the new raw token collecting. These changes slightly increase allocations and have a small performance impact.
- Handle some bibtex/biblatex-specific commands that used to be dealt with in pandoc-citeproc (#7049).
- Optimize satisfyTok, avoiding unnecessary macro expansion steps. Benchmarks after this change show 2/3 of the run time and 2/3 of the allocation of the Feb. 10 benchmarks.
- Removed sExpanded in state. This isn’t actually needed and checking it doesn’t change anything.
- Improve braced'. Remove the parameter, have it parse the opening brace, and make it more efficient.
- Factor out pieces of the LaTeX reader to make the module smaller. This reduces memory demands when compiling. Created Text.Pandoc.Readers.LaTeX.{Math,Citation,Table,Macro,Inline}. Changed Text.Pandoc.Readers.LaTeX.SIunitx to export a command map instead of individual commands.
- Handle table cells containing & in \verb (#7129).
Make Text.Pandoc.Readers.LaTeX.Types an unexported module [API change].
Markdown reader:
- Improved handling of mmd link attributes in references (#7080). Previously they only worked for links that had titles.
- Improved efficiency of the parser (benchmarks show a 15% speedup).
OPML reader:
- Avoid tree normalization, which is no longer necessary with the new XML parser.
ODT reader:
- Finer-grained errors on parse failure (#7091).
- Give more information if the zip container can’t be unpacked.
Org reader:
- Support task_lists extension (Albert Krewinkel, #6336).
- Fix bug in org-ref citation parsing (Albert Krewinkel, #7101). The org-ref syntax allows listing multiple citations separated by commas. Previously commas were accepted as part of the citation id, so all citation lists were parsed as one single citation.
RST reader:
- Use getTimestamp instead of getCurrentTime to fetch timestamp. Setting SOURCE_DATE_EPOCH will allow reproducible builds.
- Fix handling of header in CSV tables (#7064). The interpretation of this line is not affected by the delim option.
Jira reader:
- Modified the Doc parser to skip leading blank lines. This fixes parsing of documents which start with multiple blank lines (Albert Krewinkel, #7095).
- Prevent URLs within link aliases from being treated as autolinks (Albert Krewinkel, #6944).
Text.Pandoc.Shared
- Remove formerly exported functions that are no longer used in the code base: splitByIndices, splitStringByIndicies, substitute, and underlineSpan (which had been deprecated in April 2020) [API change].
- Export handleTaskListItem (Albert Krewinkel) [API change].
- Change defaultUserDataDirs to defaultUserDataDir [API change]. We determine what is the default user data directory by seeing whether the XDG directory and/or legacy directory exist.
BibTeX writer:
- Use doclayout and doctemplate. This change allows bibtex/biblatex output to wrap as other formats do, depending on the settings of --wrap and --columns (#7068).
CSL JSON writer:
- Output [] if no references in input, instead of raising a PandocAppError as before.
Docx writer:
- Use getTimestamp instead of getCurrentTime for timestamp. Setting SOURCE_DATE_EPOCH will allow reproducible builds.
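
The idea behind honoring SOURCE_DATE_EPOCH can be sketched in Python. This is not pandoc's getTimestamp: the function `get_timestamp` below is a hypothetical stand-in, and ignoring a malformed value is an assumption of the sketch.

```python
import os
from datetime import datetime, timezone


def get_timestamp(env=None) -> datetime:
    """Return the SOURCE_DATE_EPOCH time if set, otherwise the current time."""
    env = os.environ if env is None else env
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        try:
            # The variable holds seconds since the Unix epoch, in UTC.
            return datetime.fromtimestamp(int(raw), tz=timezone.utc)
        except ValueError:
            pass  # assumption: a malformed value is ignored
    return datetime.now(timezone.utc)
```

Because the timestamp then depends only on the environment variable, two builds run with the same SOURCE_DATE_EPOCH embed the same time in their output.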
EPUB writer:
- Use getTimestamp instead of getCurrentTime for timestamp. Setting SOURCE_DATE_EPOCH will allow reproducible builds (#7093). This does not suffice to fully enable reproducible builds in EPUB, since a unique id is still being generated for each build.
- Support belongs-to-collection metadata (#7063) (Nick Berendsen).
JATS writer:
- Escape special chars in reference elements (Albert Krewinkel). Prevents the generation of invalid markup if a citation element contains an ampersand or another character with a special meaning in XML.
Jira writer:
- Use Span identifiers as anchors (Albert Krewinkel).
- Use {noformat} instead of {code} for unknown languages (Albert Krewinkel). Code blocks which are not marked as a language supported by Jira are rendered as preformatted text via {noformat} blocks.
LaTeX writer:
- Adjust hypertargets to beginnings of paragraphs (#7078). Use \vadjust pre so that the h...

23 Jan 23:18

jgm

2.11.4

54d8c69

pandoc 2.11.4

Add biblatex, bibtex as output formats (closes #7040).
Recognize more extensions as markdown by default (#7034): mkdn, mkd, mdwn, mdown, Rmd.
Implement defaults file inheritance (#6924, David Martschenko). Allow defaults files to inherit options from other defaults files by specifying them with the following syntax: defaults: [list of defaults files or single defaults file].
Fix infinite HTTP requests when writing epubs from URL source (#7013). Due to a bug in code added to avoid overwriting the cover image if it had the form fileX.YYY, pandoc made an endless sequence of HTTP requests when writing epub with input from a URL.
Org reader:
- Allow multiple pipe chars in todo sequences (Albert Krewinkel, #7014). Additional pipe chars, used to separate “action” state from “no further action” states, are ignored. E.g., for the following sequence, both DONE and FINISHED are states with no further action required: #+TODO: UNFINISHED | DONE | FINISHED.
- Restructure output of captioned code blocks (Albert Krewinkel, #6977). The Div wrapper of code blocks with captions now has the class “captioned-content”. The caption itself is added as a Plain block inside a Div of class “caption”. This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated.
- Mark verbatim code with class verbatim (Dimitri Sabadie, #6998).
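
The handling of multiple pipe characters in todo sequences, described in the first Org reader item above, can be sketched in Python. This is an illustration, not the Org reader's parser: the function name `parse_todo_sequence` is hypothetical, and the no-pipe case follows the Org manual's convention that the last keyword is the done state, which the changelog entry itself does not mention.

```python
def parse_todo_sequence(line: str):
    """Split a #+TODO line into (action states, no-further-action states)."""
    prefix = "#+TODO:"
    if not line.startswith(prefix):
        raise ValueError("not a #+TODO line")
    tokens = line[len(prefix):].split()
    if "|" in tokens:
        first_bar = tokens.index("|")
        active = tokens[:first_bar]
        # Pipes after the first one are ignored.
        done = [t for t in tokens[first_bar + 1:] if t != "|"]
    else:
        # Org convention (assumed here): the last keyword is the done state.
        active, done = tokens[:-1], tokens[-1:]
    return active, done
```

For the sequence `#+TODO: UNFINISHED | DONE | FINISHED`, the sketch returns UNFINISHED as the only action state and both DONE and FINISHED as states needing no further action.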
LaTeX reader:
- Handle filecontents environment (#7003).
- Put contents of unknown environments in a Div when raw_tex is not enabled (#6997). (When raw_tex is enabled, the whole environment is parsed as a raw block.) The class name is the name of the environment. Previously, we just included the contents without the surrounding Div, but having a record of the environment’s boundaries and name can be useful.
Mediawiki reader:
- Allow space around strong/emph delimiters (#6993).
New module Text.Pandoc.Writers.BibTeX, exporting writeBibTeX and writeBibLaTeX. [API change]
LaTeX writer:
- Revert table line height increase in 2.11.3 (#6996). In 2.11.3 we started adding \addlinespace, which produced less dense tables. This wasn’t an intentional change; I misunderstood a comment in the discussion leading up to the change. This commit restores the earlier default table appearance. Note that if you want a less dense table, you can use something like \def\arraystretch{1.5} in your header.
EPUB writer:
- Adjust internal links to identifiers defined in raw HTML sections after splitting into chapters (#7000).
- Recognize Format "html4", Format "html5" as raw HTML.
- Adjust internal links to images, links, and tables after splitting into chapters. Previously we only did this for Div and Span and Header elements (see #7000).
Ms writer:
- Don’t justify text inside table cells.
JATS writer:
- Use <element-citation> if element_citations extension is enabled (Albert Krewinkel).
- Fix citations (Albert Krewinkel, #7018). By default we use formatted citations.
- Ensure that <disp-quote> is always wrapped in <p> (#7041).
Markdown writer:
- Cleaned up raw formats. We now react appropriately to gfm, commonmark, and commonmark_x as raw formats.
RST writer:
- Fix bug with dropped content from inside spans with a class in some cases (#7039).
Docx writer:
- Handle table header using styles (#7008). Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word’s “conditional formatting” for the table’s first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting.
Commonmark writer:
- Implement start number on ordered lists (#7009). Previously they always started at 1, but according to the spec the start number is respected.
HTML writer:
- Fix implicit_figure at end of footnotes (#7006).
ConTeXt template: Remove \setupthinrules from default template. The width parameter this used is not actually supported, and the command didn’t do anything.
Text.Pandoc.Extensions:
- Add Ext_element_citations constructor (Albert Krewinkel).
Text.Pandoc.Citeproc.BibTeX: New unexported function writeBibtexString.
Text.Pandoc.Citeproc:
- Use finer grained imports (Albert Krewinkel).
- Factor out and export getStyle [API change].
- Export getReferences [API change, #7106].
- Factor out getLang.
Text.Pandoc.Parsing: modify gridTableWith' for headerless tables. If the table lacks a header, the header row should be an empty list. Previously we got a list of empty cells, which caused an empty header to be emitted instead of no header. In LaTeX/PDF output that meant we got a double top line with space between.
ImageSize: use viewBox for SVG if no length, width attributes (#7045). This change allows pandoc to extract size information from more SVGs.
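
The viewBox fallback can be sketched in Python using the standard library's XML parser. This is an illustration only, not pandoc's ImageSize module: the function name `svg_size` is hypothetical, units on width and height are ignored, and the sketch returns plain numbers rather than pandoc's size types.

```python
import re
import xml.etree.ElementTree as ET

_NUMBER = re.compile(r"^\s*([0-9]*\.?[0-9]+)")


def _leading_number(value):
    """Return the leading number of a value such as '120px', or None."""
    if value is None:
        return None
    match = _NUMBER.match(value)
    return float(match.group(1)) if match else None


def svg_size(svg_text: str):
    """Return (width, height), falling back to the viewBox if either is missing."""
    root = ET.fromstring(svg_text)
    width = _leading_number(root.get("width"))
    height = _leading_number(root.get("height"))
    if width is not None and height is not None:
        return (width, height)
    # viewBox is "min-x min-y width height", separated by spaces or commas.
    parts = (root.get("viewBox") or "").replace(",", " ").split()
    if len(parts) == 4:
        try:
            return (float(parts[2]), float(parts[3]))
        except ValueError:
            return None
    return None
```

An SVG that declares only `viewBox="0 0 200 100"` would report a size of 200 by 100 in this sketch.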
Add simple default.nix.
Use commonmark 0.1.1.3.
Use citeproc 0.3.0.5.
Update default CSL to use latest chicago-author-date.csl.
CONTRIBUTING.md: add note on GNU xargs.
MANUAL.txt:
- Update description of -L/--lua-filter.
- Document use of citations in note styles (#6828).

30 Dec 00:43

jgm

2.11.3.2

dcbd8d3

pandoc 2.11.3.2

HTML reader: use renderTags' from Text.Pandoc.Shared (Albert Krewinkel). A side effect of this change is that empty <col> elements are written as self-closing tags in raw HTML blocks.
Asciidoc writer: Add support for writing nested tables (#6972, timo-a). Asciidoc supports one level of nesting. If deeper tables are to be written, they are omitted and a warning is issued.
Docx writer: fix nested tables with captions (#6983). Previously we got unreadable content, because docx seems to want a <w:p> element (even an empty one) at the end of every table cell.
Powerpoint writer: allow arbitrary OOXML in raw inline elements (Albert Krewinkel). The raw text is now included verbatim in the output. Previously it was parsed into XML elements, which prevented the inclusion of partial XML snippets.
LaTeX writer: support colspans and rowspans in tables (#6950, Albert Krewinkel). Note that the multirow package is needed for rowspans. It is included in the latex template under a variable, so that it won’t be used unless needed for a table.
HTML writer: don’t include p tags in CSL bibliography entries (#6966). Fixes a regression in 2.11.3.
Add meta-description variable to HTML templates (#6982). This is populated by the writer by stringifying the description field of metadata (Jerry Sky). The description meta tag will make the generated HTML documents more complete and SEO-friendly.
Citeproc: fix handling of empty URL variables (DOI, etc.). The linkifyVariables function was changing these to links which then got treated as non-empty by citeproc, leading to wrong results (e.g. ignoring nonempty URL when empty DOI is present). See jgm/citeproc#41.
Use citeproc 0.3.0.3. Fixes an issue in author-only citations when both an author and translator are present, and an issue with citation group delimiters.
Require texmath 0.12.1. This improves siunitx support in math, fixes bugs with \*mod family operators and arrays, and avoids italicizing symbols and operator names in docx output.
Ensure that the perl interpreter is used for filters with .pl extension (wuffi).
MANUAL: note that textarea content is never parsed as Markdown (Albert Krewinkel).


19 Dec 01:48

jgm

2.11.3.1

37ba5d5

pandoc 2.11.3.1


Added some missing files to extra-source-files and data files, so they are included in the sdist tarball. Closes #6961. Cleaned up some extraneous data and test files, and added a CI check to ensure that the test and data files included in the sdist match what is in the git repository.
Use citeproc 0.3.0.1, which avoids removing nonbreaking space at the end of the initialize-with attribute. (Some journals require nonbreaking space after initials, and this makes that possible.)


18 Dec 08:01

jgm

2.11.3

ec0ec4a

pandoc 2.11.3


With --bibliography (or bibliography in metadata), a URL may now be provided, and pandoc will fetch the resource. In addition, if a file path is provided and it is not found relative to the working directory, the resource path will be searched (#6940).
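The lookup order described above can be sketched roughly as follows. This is only an illustration in Python, not pandoc's implementation: `locate_bibliography` and its parameters are hypothetical names, and the sketch classifies a URL rather than fetching it.

```python
from pathlib import Path
from urllib.parse import urlparse


def locate_bibliography(spec, resource_path, cwd="."):
    """Classify a bibliography argument and find where it would be read from.

    Hypothetical helper, not part of pandoc. Returns ("url", spec) for
    http(s) URLs, ("file", Path) when the file is found, and
    ("missing", None) otherwise.
    """
    if urlparse(spec).scheme in ("http", "https"):
        return ("url", spec)
    # First try the path relative to the working directory.
    candidate = Path(cwd) / spec
    if candidate.is_file():
        return ("file", candidate)
    # Otherwise fall back to each directory on the resource path, in order.
    for directory in resource_path:
        candidate = Path(directory) / spec
        if candidate.is_file():
            return ("file", candidate)
    return ("missing", None)
```

The working directory is checked before the resource path, matching the order given in the changelog entry.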
Add sourcepos extension for commonmark, gfm, commonmark_x (#4565). With the sourcepos extension set, data-pos attributes are added to the AST by the commonmark reader. No other readers are affected. The data-pos attributes are put on elements that accept attributes; for other elements, an enclosing Div or Span is added to hold the attributes.
Change extensions for commonmark_x: replace auto_identifiers with gfm_auto_identifiers (#6863). commonmark_x never actually supported auto_identifiers (it didn’t do anything), because the underlying library implements gfm-style identifiers only. Attempts to add the auto_identifiers extension to commonmark will now fail with an error.
HTML reader:
- Split module into several submodules (Albert Krewinkel). Reducing module size should reduce memory use during compilation.
- Support advanced table features (Albert Krewinkel): block level content in captions, row and colspans, body headers, row head columns, footers, attributes.
- Disable round-trip testing for tables. Information for cell alignment in a column is not preserved during round-trips (Albert Krewinkel).
- Allow finer grained options for tag omission (Albert Krewinkel).
- Simplify list attribute handling (Albert Krewinkel).
- Pay attention to lang attributes on body element (#6938). These (as well as lang attributes on the html element) should update lang in metadata.
- Retain attribute prefixes and avoid duplicates (#6938). Previously we stripped attribute prefixes, reading xml:lang as lang for example. This resulted in two duplicate lang attributes when xml:lang and lang were both used. This commit causes the prefixes to be retained, and also avoids invalid duplicate attributes.
Commonmark reader:
- Refactor specFor.
- Set input name to "" to avoid clutter in sourcepos output.
Org reader:
- Parse #+LANGUAGE into lang metadata field (#6845, Albert Krewinkel).
- Preserve targets of spurious links (#6916, Albert Krewinkel). Links with (internal) targets that the reader doesn’t know about are converted into emphasized text. Information on the link target is now preserved by wrapping the text in a Span of class spurious-link, with an attribute target set to the link’s original target. This makes it possible to recover and fix broken or unknown links with filters.
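As an illustration of how a filter might recover such links, here is a minimal Python sketch that walks a pandoc JSON AST and turns Span elements of class spurious-link back into Link elements. It assumes the usual JSON shapes for Span (`[attr, inlines]`) and Link (`[attr, inlines, [url, title]]`); `fix_spurious_links` and the `resolve` callback are hypothetical names, not part of pandoc.

```python
def fix_spurious_links(node, resolve):
    """Recursively replace Span.spurious-link nodes with Link nodes.

    `resolve` is a hypothetical callback that maps the original target
    to a URL, or returns None to leave the span untouched.
    """
    if isinstance(node, list):
        return [fix_spurious_links(child, resolve) for child in node]
    if not isinstance(node, dict):
        return node
    if node.get("t") == "Span":
        (_ident, classes, keyvals), inlines = node["c"]
        if "spurious-link" in classes:
            url = resolve(dict(keyvals).get("target", ""))
            if url is not None:
                # Keep the span's contents as the link text.
                text = fix_spurious_links(inlines, resolve)
                return {"t": "Link", "c": [["", [], []], text, [url, ""]]}
    # Any other element: rebuild it with its children processed.
    return {key: fix_spurious_links(value, resolve) for key, value in node.items()}
```

A real filter would read the JSON document from standard input, apply this function to its blocks, and write the result to standard output.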
DocBook reader:
- Table text width support (#6791, Nils Carlson). Table width in relation to text width is not natively supported by docbook but is by the docbook fo stylesheets through an XML processing instruction, <?dbfo table-width="50%"?>.
LaTeX reader:
- Improve parsing of command options (#6869, #6873). In cases where we run into trouble parsing inlines til the closing ], e.g. quotes, we return a plain string with the option contents. Previously we mistakenly included the brackets in this string.
- Preserve center environment (#6852, Igor Pashev). The contents of the center environment are put in a Div with class center.
- Don’t parse \rule with width 0 as horizontal rule. These are sometimes used as spacers in LaTeX.
- Don’t apply theorem default styling to a figure inside (#6925). If we put an image in italics, then when rendering to Markdown we no longer get an implicit figure.
Dokuwiki reader:
- Handle unknown interwiki links better (#6932). DokuWiki lets users define their own interwiki links. Previously pandoc reacted to these by emitting a Google search link, which is not helpful. Instead, we now just emit the full URL including the wikilink prefix, e.g. faquk>FAQ-mathml. This at least gives users the ability to modify the links using filters.
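A filter could then expand these prefixed targets itself. The following Python sketch shows one possible approach; `expand_interwiki`, the prefix table, and its `{page}` placeholder are all hypothetical and only stand in for whatever interwiki definitions a given wiki uses.

```python
def expand_interwiki(url, table):
    """Expand 'prefix>page' using a user-supplied table of URL templates.

    `table` maps a prefix to a template containing the hypothetical
    placeholder '{page}'. Unknown prefixes are returned unchanged.
    """
    prefix, separator, page = url.partition(">")
    if separator and prefix in table:
        return table[prefix].replace("{page}", page)
    return url
```

For example, with a table entry `{"faquk": "https://example.org/faq/{page}"}`, the target `faquk>FAQ-mathml` would become `https://example.org/faq/FAQ-mathml`.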
Markdown writer:
- Properly handle boolean values in writing YAML metadata (#6388).
- Ensure that a new csl-block begins on a new line (#6921). This just looks better and doesn’t affect the semantics.
RST writer:
- Better image handling (#6948). An image alone in its paragraph (but not a figure) is now rendered as an independent image, with an alt attribute if a description is supplied. An inline image that is not alone in its paragraph will be rendered, as before, using a substitution. Such an image cannot have a “center”, “left”, or “right” alignment, so the classes align-center, align-left, or align-right are ignored. However, align-top, align-middle, align-bottom will generate a corresponding align attribute.
Docx writer:
- Keep raw openxml strings verbatim (#6933, Albert Krewinkel).
- Use Content instead of Element. This allows us to inject raw OpenXML into the document without reparsing it into an Element, which is necessary if you want to inject an open tag or close tag.
- Fix bullets/lists indentation, so that the first level is slightly indented to the right instead of right on the margin (cholonam).
- Support bold and italic in “complex script” (#6911). Previously bold and italics didn’t work properly in LTR text. This commit causes the w:bCs and w:iCs attributes to be used, in addition to w:b and w:i, for bold and italics respectively.
ICML writer:
- Fix image bounding box for custom widths/heights (Mauro Bieg, #6936).
LaTeX writer:
- Improve table spacing (#6842, #6860). Remove the \strut that was added at the end of minipage environments in cells. Replace \tabularnewline with \\ \addlinespace.
- Improve calculation of column spacing (#6883).
- Extract table handling into separate module (Albert Krewinkel).
- Fix bug with nested csl- display Spans (#6921).
- Improve longtable output (#6883). Don’t create minipages for regular paragraphs. Put width and alignment information in the longtable column descriptors.
OpenDocument writer:
- Support for table width as a percentage of text width (#6792, Nils Carlson).
- Implement Div and Span ident support (#6755, Nils Carlson). Spans and Divs containing an ident in the Attr will become bookmarks or sections with idents in OpenDocument format.
- Add two extensions, xrefs_name and xrefs_number (#6774, Nils Carlson). Links to headings, figures and tables inside the document are substituted with cross-references that will use the name or caption of the referenced item for xrefs_name or the number for xrefs_number. For the xrefs_number to be useful heading numbers must be enabled in the generated document and table and figure captions must be enabled using for example the native_numbering extension. In order for numbers and reference text to be updated the generated document must be refreshed.
JATS writer:
- Support advanced table features (Albert Krewinkel).
- Support author affiliations (#6687, Albert Krewinkel).
Docbook writer:
- Use correct id attribute consistently (Jan Tojnar). DocBook5 should always use xml:id instead of id.
- Handle admonition titles better (Jan Tojnar). Docbook reader produces a Div with title class for <title> element within an “admonition” element. Markdown writer then turns this into a fenced div with title class attribute. Since fenced divs are block elements, their content is recognized as a paragraph by the Markdown reader. This is an issue for Docbook writer because it would produce an invalid DocBook document from such AST – the <title> element can only contain “inline” elements. Handle this special case separately by unwrapping the paragraph before creating the <title> element.
- Add XML namespaces to top-level elements (#6923, Jan Tojnar). Previously, we only added xmlns attributes to chapter elements, even when running with --top-level-division=section. These namespaces are now added to part and section elements too, when they are the selected top-level divisions. We do not need to add namespaces to documents produced with --standalone flag, since those will already have xmlns attribute on the root element in the template.
HTML writer:
- Fix handling of nested csl- display spans (#6921). Previously inner Spans used to represent CSL display attributes were not rendered as div tags as intended.
EPUB writer:
- Include title page in landmarks (#6919). Note that the toc is also included if --toc is specified.
- Add frontmatter type on body element for nav.xhtml (#6918).
EPUB templates: use preserveAspectRatio=“xMidYMid” for cover image (#6895, Shin Sang-jae). This change affects both the epub2 and the epub3 templates. It avoids distortion of the cover image by requiring that the aspect ratio be preserved.
LaTeX template:
- Include csquotes package if csquotes variable set.
- Put back amssymb. We need it for checkboxes in todo lists, and maybe for other things. In this location it seems compatible with the cases that prompted #6469 and PR #6762.
- Disable language-specific shorthands in babel (#6817, #6887). Babel defines “shorthands” for some languages, and these can produce unexpected results. For example, in Spanish, 1.22 gets rendered as 122, and et~al. as etal. One would think that babel’s shorthands=off option (which we were using) would disable these, but it doesn’t. So we remove shorthands=off and add some code that redefines the shorthands macro. Eventually this will be fixed in babel, I hope, and we can revert to something simpler.
JATS template: allow array of persistent institute ...


19 Nov 23:01

jgm

2.11.2

0c8ab8a

pandoc 2.11.2


Default to using ATX (##-style) headings for Markdown output (#6662, Aner Lucero). Previously we used Setext (underlined) headings by default for levels 1–2.
Add option --markdown-headings=atx|setext, and deprecate --atx-headers (#6662, Aner Lucero).
Support markdown-headings in defaults files.
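To make the difference between the two heading styles concrete, here is a small Python sketch of how a writer might render a heading under either setting. `render_heading` is a hypothetical function, and the underline length is an assumption of this sketch rather than a statement about pandoc's exact output.

```python
def render_heading(level, text, style="atx"):
    """Render a Markdown heading in ATX or Setext style.

    Setext syntax only exists for levels 1 and 2, so deeper levels
    use ATX regardless of the requested style.
    """
    if style not in ("atx", "setext"):
        raise ValueError("style must be 'atx' or 'setext'")
    if style == "setext" and level in (1, 2):
        underline = "=" if level == 1 else "-"
        return f"{text}\n{underline * len(text)}"
    return f"{'#' * level} {text}"
```

With the new default, a level-2 heading comes out as `## Usage`; passing `style="setext"` gives the underlined form for levels 1 and 2 only.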
Fix corner case in YAML metadata parsing (#6823). Previously YAML metadata would sometimes not get recognized if a field ended with a newline followed by spaces.
--self-contained: increase coverage (#6854). Previously we only self-contained attributes for certain tag names (img, embed, video, input, audio, source, track, section). Now we self-contain any occurrence of src, data-src, poster, or data-background-image, on any tag; and also href on link tags.
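The new rule can be read as a simple predicate on tag and attribute names. The Python sketch below restates it; `should_embed` and `URL_ATTRIBUTES` are hypothetical names, and the code only mirrors the wording of this entry, not pandoc's source.

```python
# Attributes that are embedded on any tag, per the entry above.
URL_ATTRIBUTES = {"src", "data-src", "poster", "data-background-image"}


def should_embed(tag, attribute):
    """Return True if this attribute's resource would be self-contained."""
    if attribute in URL_ATTRIBUTES:
        return True
    # href is only embedded on link tags (e.g. stylesheets), not on anchors.
    return tag == "link" and attribute == "href"
```

Under this rule `should_embed("div", "data-background-image")` is true, which the earlier tag-name list would not have covered, while `should_embed("a", "href")` stays false.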
Markdown reader:
- Fix detection of locators following in-text citations. Previously, if we had @foo [p. 33; @bar], the p. 33 would be incorrectly parsed as a prefix of @bar rather than a suffix of @foo.
- Improve period suppression algorithm for citations in notes in note citation styles (#6835).
- Don’t increment stateNoteNumber for example list references. This helps with #6836 (a bug in which example list references disturb calculation of citation note number and affect when ibid is triggered).
LaTeX reader:
- Move getNextNumber from Readers.LaTeX to Readers.LaTeX.Parsing.
- Fix negative numbers in siunitx commands. A change in pandoc 2.11 broke negative numbers, e.g. \SI{-33}{\celsius} or \num{-3}. This fixes the regression.
DocBook reader: drop period in formalpara title and put it in a div with class formalpara-title, so that people can reformat with filters (#6562).
Man reader: improve handling of .IP (#6858). We now better handle .IP when it is used with non-bullet, non-numbered lists, creating a definition list. We also skip blank lines like groff itself.
Bibtex reader: fall back on en-US if locale for LANG not found. This reproduces earlier pandoc-citeproc behavior (jgm/citeproc#26).
JATS writer:
- Wrap all tables (Albert Krewinkel). All <table> elements are put inside <table-wrap> elements, as the former are not valid as immediate child elements of <body>.
- Move Table handling to separate module (Albert Krewinkel). Adds two new unexported modules: Text.Pandoc.Writers.JATS.Types, Text.Pandoc.Writers.JATS.Table.
Org writer:
- Replace org #+KEYWORDS with #+keywords (TEC). As of ~2 years ago, lower case keywords became the standard (though they are handled case-insensitively, as always).
- Update org supported languages and identifiers according to the current list contained in https://orgmode.org/worg/org-contrib/babel/languages/index.html (TEC).
Only use filterIpynbOutput if input format is ipynb (#6841). Before this change content could go missing from divs with class output, even when non-ipynb was being converted.
When checking reader/writer name, check base name now that we permit extensions on formats other than markdown.
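The idea of checking the base name can be sketched as follows, assuming the usual FORMAT+EXT-EXT syntax for format specifications. `split_format_spec` and `is_known_format` are hypothetical helpers, and the sketch assumes format names themselves contain no `+` or `-`.

```python
import re


def split_format_spec(spec):
    """Split e.g. 'commonmark+sourcepos-smart' into its parts.

    Returns (base name, enabled extensions, disabled extensions).
    """
    base = re.match(r"[^+-]*", spec).group(0)
    changes = re.findall(r"([+-])([^+-]+)", spec[len(base):])
    enabled = [name for sign, name in changes if sign == "+"]
    disabled = [name for sign, name in changes if sign == "-"]
    return base, enabled, disabled


def is_known_format(spec, known_formats):
    """Check only the base name, ignoring any extension modifiers."""
    base, _enabled, _disabled = split_format_spec(spec)
    return base in known_formats
```

Comparing the whole string against the list of formats would reject `commonmark+sourcepos`; comparing the base name accepts it.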
Text.Pandoc.PDF: Fix changePathSeparators for Windows (#6173). Previously a path beginning with a drive, like C:\foo\bar, was translated to C:\/foo/bar, which caused problems. With this fix, the backslashes are removed.
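The bug and the fix can be reproduced in a few lines of Python. This is a sketch of the behavior described above, not a port of the Haskell code; it relies on `pathlib.PureWindowsPath`, whose `parts` keep the trailing backslash on the drive component.

```python
from pathlib import PureWindowsPath


def naive_separators(path):
    """Join path components with '/', keeping the drive's backslash (the bug)."""
    return "/".join(PureWindowsPath(path).parts)


def change_path_separators(path):
    """Join path components with '/', dropping any leftover backslashes."""
    parts = PureWindowsPath(path).parts
    return "/".join(part.replace("\\", "") for part in parts)
```

For `C:\foo\bar` the components are `C:\`, `foo`, and `bar`, so the naive join yields `C:\/foo/bar`, while the corrected version yields `C:/foo/bar`.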
Text.Pandoc.Logging: Add ATXHeadingInLHS constructor to LogMessage [API change].
Fix error that is given when people specify doc output (#6834, gison93).
LaTeX template: add a \break after parbox in CSLRightInline. This should fix spacing problems between entries with numeric styles. Also fix number of params on CSLReferences.
reveal.js template: Put quotes around controlsLayout, controlsBackArrows, and display, since these require strings. Add showSlideNumber, hashOneBasedIndex, pause.
Use citeproc 0.2. This fixes a bug with title case around parentheses.
pandoc.cabal: remove ‘static’ flag. This isn’t really necessary and can be misleading (e.g. on macOS, where a fully static build isn’t possible). cabal’s new option --enable-executable-static does the same. On stack you can add something like this to the options for your executable in package.yaml:
```
ld-options: -static -pthread
```
Remove obsolete bibutils flag setting in linux/make_artifacts.sh.
Manual:
- Correct link-citation -> link-citations.
- Add a sentence about pagetitle for HTML (#6843, Alex Toldaiev).
INSTALL.md: Remove references to pandoc-citeproc (#6857).
CONTRIBUTING: describe hlint and how it’s used (#6840, Albert Krewinkel).


08 Nov 04:44

jgm

2.11.1.1

cfb017c

pandoc 2.11.1.1


Citeproc: improve punctuation in in-text note citations (#6813). Previously in-text note citations inside a footnote would sometimes have the final period stripped, even if it was needed (e.g. on the end of ‘ibid’).
Use citeproc 0.1.1.1. This improves the decision about when to use ibid in cases where citations are used inside a footnote (#6813).
Support nocase spans for csljson output.
Require latest commonmark, commonmark-extensions. This fixes a bug with autolink_bare_uris and commonmark.
LaTeX reader: better handling of \\ inside math in table cells (#6811).
DokuWiki writer: translate language names for code elements and improve whitespace (#6807).
MediaWiki writer: use syntaxhighlight tag instead of deprecated source for highlighted code (#6810). Also support startFrom attribute and numberLines.
Lint code in PRs and when committing to master (#6790, Albert Krewinkel).
doc/filters.md: describe technical details of filter invocations (#6815, Albert Krewinkel).

