md2pdf - Typeset Markdown to PDF for publishing

What is this?

A Markdown-to-PDF converter, intended for exporting documents for publication.

Its style is designed with printing on paper in mind:

  • A3, A4, A5, letter or legal, portrait or landscape
  • Header with the document title (Automatic extraction from the H1 heading is available)
  • Footer with the page number (Can be omitted)
  • For Japanese, with universal-design fonts (Morisawa BIZ UD)

Some aspects of the style are customizable:

  • Color scheme (color, grayscale, monochrome)
  • Language scheme (Latin, Japanese, Korean, Chinese)
  • Paragraph style (indented, gapped)

It accepts the following Markdown formats:

  • Syntax recognized by marked
  • Mermaid
  • Code highlight

How to Use

The output PDF is written to stdout, so remember to redirect the output to get a result file.

The input file can be omitted, or specified as -, to read the input from stdin.
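
The input rule can be sketched as follows. This is only an illustration, not the actual md2pdf.js code: the helper names `isStdin` and `readMarkdown` are hypothetical.

```javascript
const fs = require('node:fs');

// Hypothetical helper: an omitted argument or "-" means "read from stdin".
function isStdin(inputArg) {
  return inputArg === undefined || inputArg === '-';
}

// Read the whole Markdown source as text.
// File descriptor 0 is stdin; anything else is treated as a file path.
function readMarkdown(inputArg) {
  return fs.readFileSync(isStdin(inputArg) ? 0 : inputArg, 'utf8');
}
```

With such a helper, `readMarkdown('doc.md')` would read the file, while `readMarkdown('-')` or `readMarkdown()` would read whatever is piped in.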

Because the environment needs some preparation, such as installing font files, we recommend running it as a container.

Run as a container

A helper script, md2pdf.sh, is available.

$ ./md2pdf.sh [options] [input] > output

The following options are available:

  • -p «paper»
    • Specify the paper size
    • Available values:
      • a3 (ISO A3 portrait), a3r (ISO A3 landscape)
      • a4 (ISO A4 portrait, default), a4r (ISO A4 landscape)
      • a5 (ISO A5 portrait), a5r (ISO A5 landscape)
      • letter (US letter portrait), letterr (US letter landscape)
      • legal (US legal portrait), legalr (US legal landscape)
  • -t «title»
    • Specify the document title to be printed on the page header
    • If omitted, the title is extracted from the H1 heading when possible
  • -n
    • Omit page numbers
  • -r «ratio»
    • Specify the image magnification ratio as a percentage
    • A smaller value (< 100) is recommended for screenshots taken on a zoomed screen, to get clear images in the PDF
  • -l «lang»
    • Specify the language scheme, which decides font priorities, line break rules and text indentation rules
    • Available values:
      • latin (default) - For most European languages
      • ja - For Japanese
      • ko - For Korean
      • cn - For Simplified Chinese
      • tw - For Traditional Chinese
  • -i
    • Omit paragraph indentation and make gaps between paragraphs
  • -c «color»
    • Specify the color scheme
    • Available values:
      • color (default)
      • grayscale
      • monochrome
  • -a
    • Show anchor ids and texts of headings
    • Useful for making internal links to the headings
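
As an illustration of the -p values listed above, here is a minimal sketch that splits a paper name into a page format and an orientation. The function `parsePaper` is hypothetical. The format names match those accepted by puppeteer's `page.pdf()`, but whether md2pdf.js passes them on this way is an assumption.

```javascript
// Hypothetical mapping from the -p value to a page format name.
const FORMATS = { a3: 'A3', a4: 'A4', a5: 'A5', letter: 'Letter', legal: 'Legal' };

function parsePaper(paper = 'a4') {
  // Exact names are portrait. Check them first, because "letter" itself ends in "r".
  if (Object.hasOwn(FORMATS, paper)) {
    return { format: FORMATS[paper], landscape: false };
  }
  // A trailing "r" selects landscape, e.g. "a4r" or "letterr".
  const base = paper.endsWith('r') ? paper.slice(0, -1) : '';
  if (Object.hasOwn(FORMATS, base)) {
    return { format: FORMATS[base], landscape: true };
  }
  throw new Error(`Unknown paper size: ${paper}`);
}
```

For example, `parsePaper('a4r')` yields the A4 format in landscape, and calling it without an argument falls back to the documented default, A4 portrait.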

This script calls docker run as follows:

$ docker run --rm -i -v «dir»:/opt/app/mnt md2pdf node md2pdf.js -b /opt/app/mnt «options» «input»
  • The base directory is extracted from the input file path (or $PWD is used) and mounted as a volume
  • The base directory option is set to that directory as mounted, so any resources are referenced relative to the base directory
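
The two steps can be sketched in JavaScript as follows. This is not the helper script itself (md2pdf.sh is a shell script). The function `buildDockerArgs` is hypothetical, and the way the input file is addressed inside the mount is an assumption that the description above does not confirm.

```javascript
const path = require('node:path');

// Hypothetical re-creation of the steps described above:
// derive the base directory, mount it, and pass it with -b.
function buildDockerArgs(input, options = [], cwd = process.cwd()) {
  const fromStdin = input === undefined || input === '-';
  // Base directory: the input file's directory, or the current directory.
  const baseDir = fromStdin ? cwd : path.dirname(path.resolve(cwd, input));
  // Assumption: the input is addressed by its file name inside the mount.
  const inputArgs = fromStdin
    ? []
    : [path.posix.join('/opt/app/mnt', path.basename(input))];
  return [
    'run', '--rm', '-i',
    '-v', `${baseDir}:/opt/app/mnt`,
    'md2pdf', 'node', 'md2pdf.js',
    '-b', '/opt/app/mnt',
    ...options, ...inputArgs,
  ];
}
```

The resulting array could be handed to `child_process.spawn('docker', args)`, with stdout redirected to the output file.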

Direct-run NodeJS

If a web browser that supports "headless mode" and the appropriate fonts are installed, you can run it in a local NodeJS environment.

$ node md2pdf.js [options] [input] > output

You can add the -b option to specify the base path for resources.

Preparation

Run as a container

Running puppeteer, a headless browser driver, in a container has fairly strict requirements. However, we have confirmed that it works on Intel Linux and ARM macOS.

$ docker build -t md2pdf .

The container image is built with Chromium, an open-source web browser, installed from the official Debian package.

Direct-run NodeJS

If you want to run it directly with NodeJS, install the required libraries first.

$ npm install

You also need an installed web browser that supports "headless mode", e.g. Google Chrome or Microsoft Edge.

Markdown format

Markdown documents are rendered by marked.

In addition, the following extensions are provided.

Anchor of heading

Headings are given anchor ids, which are generated by "slugifying" the heading text in the same way as GitHub.

These ids are useful for making internal links. Consider specifying the -a option to make the slugified ids visible when writing links.
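
To give an idea of what such ids look like, here is a simplified sketch of GitHub-style slugification. It is an approximation written for this illustration, not the code md2pdf uses: `createSlugger` is a hypothetical name, and edge cases such as emoji are not handled.

```javascript
// Simplified GitHub-style slugger (hypothetical helper).
// Each slugger remembers the ids it has produced, so repeated headings get suffixes.
function createSlugger() {
  const seen = new Map();
  return function slug(text) {
    const base = text
      .trim()
      .toLowerCase()
      // Drop everything except letters, numbers, marks, whitespace, "_" and "-".
      .replace(/[^\p{L}\p{N}\p{M}\s_\-]/gu, '')
      // Each whitespace character becomes a hyphen.
      .replace(/\s/g, '-');
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    // The first occurrence keeps the plain slug; later ones get -1, -2, ...
    return count === 0 ? base : `${base}-${count}`;
  };
}
```

Under these rules a heading such as "How to Use" would be linked as `[usage](#how-to-use)`, and a second heading with the same text would get the id `how-to-use-1`.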

Figure with caption

Images with titles are rendered as figure elements, with figcaption elements holding those titles.

For example, the titled image below:

![fig](image.png "Fig1. Sample image")

will be rendered as a figure with the caption "Fig1. Sample image".

Also consider specifying the -r option.
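
The mapping from a titled image to a captioned figure can be sketched as a plain function. This is an illustration only: `renderImage` and `escapeHtml` are hypothetical helpers, and the real markup produced by md2pdf (class names, handling of the -r ratio) may differ.

```javascript
// Escape text for safe inclusion in HTML attributes and element content.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical image renderer: a titled image becomes a figure with a caption,
// an untitled image stays a plain img element.
function renderImage(href, title, alt) {
  const img = `<img src="${escapeHtml(href)}" alt="${escapeHtml(alt)}">`;
  if (!title) return img;
  return `<figure>${img}<figcaption>${escapeHtml(title)}</figcaption></figure>`;
}
```

Calling `renderImage('image.png', 'Fig1. Sample image', 'fig')` returns the figure markup for the sample above.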

Mermaid

Code blocks with the language specifier "mermaid" are rendered by mermaidjs.

For example, the code block below:

```mermaid
flowchart LR
  A[Start]-->B{Check}
  C[Okay]
  D[NG]
  B-->|Yes| C
  B-->|No| D
```

will be rendered as a flowchart diagram.

Code highlight

Language specifiers following code block openers are passed to highlight.js. A filename can also be appended, separated by a colon.

For example, the code block below:

```javascript:sample.js
function highlight(code, lang) {
  try {
    code = hljs.highlight(code, {language: lang}).value;
  } catch (e) {
    console.error('Error: ', e);
  }
  return code;
}
```

will be rendered as a syntax-highlighted listing.

Caption for code block (and mermaid)

Language specifiers can be followed by captions enclosed in double quotes.

Note that spaces are required after code block openers and language specifiers to avoid confusing common markdown parsers.

Captions are also available on mermaid blocks.

For example, the code block below:

```javascript:sample.js "List 1. highlight sample"
function highlight(code, lang) {
  try {
    code = hljs.highlight(code, {language: lang}).value;
  } catch (e) {
    console.error('Error: ', e);
  }
  return code;
}
```

will be rendered as a highlighted listing with the caption "List 1. highlight sample".

Paging control

Language specifiers can be followed by paging control specifications enclosed in brackets.

Note that spaces are required after code block openers and language specifiers to avoid confusing common markdown parsers.

The following controls are available:

  • flow

    Allows page breaks inside this code block. By default, page breaks are avoided inside a code block, so a long code block is preceded by a page break. A flowed block does not force a page break before it.

  • newpage

    Always breaks the page just before this code block.

  • isolated

    In addition to the behavior of newpage, also breaks the page just after this code block.
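
One way to picture these controls is as CSS fragmentation properties applied to the block. The sketch below is an assumption about how this could be done, not a description of md2pdf's stylesheet. The function `pagingStyle` is hypothetical; the property names are standard CSS.

```javascript
// Hypothetical mapping from a paging control to CSS fragmentation properties.
function pagingStyle(control) {
  switch (control) {
    case undefined: // default: keep the block on one page
      return { 'break-inside': 'avoid' };
    case 'flow': // allow page breaks inside the block
      return { 'break-inside': 'auto' };
    case 'newpage': // always start on a new page (keeping the default is an assumption)
      return { 'break-inside': 'avoid', 'break-before': 'page' };
    case 'isolated': // new page before and after the block
      return { 'break-inside': 'avoid', 'break-before': 'page', 'break-after': 'page' };
    default:
      throw new Error(`Unknown paging control: ${control}`);
  }
}
```
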

For example, the listing below will be rendered on a separate page.

```javascript:long.js [isolated]
function long_proc(list) {
  var a = 1;
  var b = 2;

  return list.forEach(function(elem) {
    elem.someProcs(a);
    elem.someProcs(b);
    // other long procs...
    // :
    // :
    // :
  });
}
```
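
The pieces that may follow a code block opener (language, filename, caption, paging control) can be pulled apart with a small parser. The sketch below is hypothetical and only illustrates the syntax described in this document. The name `parseInfo` and the choice to accept a caption and a paging control in any order are assumptions.

```javascript
// Hypothetical parser for the text after a code block opener, e.g.
//   javascript:sample.js "List 1. highlight sample"
//   javascript:long.js [isolated]
function parseInfo(info) {
  const result = { lang: '', filename: null, caption: null, paging: null };
  // First token: "lang" or "lang:filename". The rest holds caption and paging.
  const m = /^(\S*)\s*(.*)$/.exec(info.trim());
  const [lang, ...rest] = m[1].split(':');
  result.lang = lang;
  if (rest.length > 0) result.filename = rest.join(':');
  // Caption: text enclosed in double quotes.
  const caption = /"([^"]*)"/.exec(m[2]);
  if (caption) result.caption = caption[1];
  // Paging control: looked up outside the caption, so quoted brackets are ignored.
  const tail = m[2].replace(/"[^"]*"/, '');
  const paging = /\[(flow|newpage|isolated)\]/.exec(tail);
  if (paging) result.paging = paging[1];
  return result;
}
```

For the block above, `parseInfo('javascript:long.js [isolated]')` gives the language `javascript`, the filename `long.js` and the paging control `isolated`.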

Copyright and License

Copyright (c) 2023-2024 Shun-ichi TAHARA

Provided under the MIT license, with the exception of the third-party/getoptions directory, which is taken from ko1nksm/getoptions and is under the CC0 license.