Pandoc

A universal document converter and its ecosystem

Yasin Raies

09.01.2019

Suppose you want to connect your old game console to your TV …

What is Pandoc?

Pandoc is a free and open-source software document converter created by John MacFarlane.

Supported Formats:

In

commonmark, creole, docbook, docx, epub, fb2, gfm (GitHub-Flavored Markdown), haddock, html, jats, json, latex, markdown (Pandoc’s Markdown), markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, man, muse, native, odt, opml, org, rst, t2t, textile, tikiwiki, twiki, vimwiki

Out

asciidoc, beamer, commonmark, context, docbook or docbook4, docbook5, docx, dokuwiki, epub or epub3, epub2, fb2, gfm (GitHub-Flavored Markdown), haddock, html or html5 (HTML, i.e. HTML5/XHTML polyglot markup), html4, icml, jats, json, latex, man, markdown (Pandoc’s Markdown), markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opml, opendocument, org, plain, pptx, rst, rtf, texinfo, textile, slideous, slidy, dzslides, revealjs, s5, tei, zimwiki

Pandoc-flavoured Markdown

A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. - John Gruber

PFM (Pandoc-flavoured Markdown) extends this idea by keeping multiple output formats in mind.

Metadata

---
title: Pandoc
subtitle: A universal document converter and its ecosystem
author: Yasin Raies
theme: white
center: true
width: 1280
height: 720
date: 09.01.2019
---

Inline

In	Out
`text`	text
`*emphasis*`	emphasis
`**strong**`	strong
`~~strike~~`	~~strike~~
`S~ub~ S^uper^`	S_ub S^uper
`$$e^{\pi i} + 1 = 0$$`	e^πi + 1 = 0

In	Out
`"Quote"`	“Quote”
`verb/code`	`verb/code`
`[FMI](fmi-wuerzburg.de)`	FMI
`![Logo]([...].png)`
`word^[Some Note]`	word¹
`[This is a span]{.smallcaps}`	This is a span

Blocks

LineBlock

Ingredients:
    0.5 Lime
    5 tbsp Sugar
    350 ml Ginger Ale
    Crushed Ice

| **Ingredients**:
|     0.5 Lime
|     5 *tbsp* Sugar
|     350 *ml* Ginger Ale
|     Crushed Ice

Verbatim/Code

while(true){
    doTalk();
}

```java
while(true){
    doTalk();
}
```

Quotes

This is a blockquote

> This is a blockquote

Ordered/Bullet List

one
1. two
2. three
five

some
- things
- are
weird

I) one
   3. two 
   7. three
I) five

- some
  + things
  + are
- weird

Definition List

Term 1: Definition 1
Term 2: Definition 2a; Definition 2b

Term 1
  ~ Definition 1

Term 2
  ~ Definition 2a
  ~ Definition 2b

Example List

(1) This is a numbered example.
(2) This example can be referenced.

See (2).

(@) This is a numbered example.
(@ex) This example can be referenced.

See (@ex).
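Example numbering continues across the whole document, and a labelled item can be referenced elsewhere in the text. The Python sketch below illustrates that two-pass idea: it first numbers the `(@)` and `(@label)` item markers, then resolves `(@label)` references. `number_examples` is a hypothetical name. Real pandoc works on the parsed AST, not on raw lines as this simplification does.

```python
import re

# Simplification (assumption): an example item is a line that *starts* with
# "(@)" or "(@label)"; pandoc itself recognises list items on the parsed AST.
ITEM_RE = re.compile(r"^\(@([\w-]*)\)", re.MULTILINE)
# A reference is "(@label)" anywhere in the text.
REF_RE = re.compile(r"\(@([\w-]+)\)")


def number_examples(text: str) -> str:
    labels = {}
    counter = 0

    def number_item(m):
        nonlocal counter
        counter += 1                      # numbering continues across the text
        if m.group(1):
            labels[m.group(1)] = counter  # remember labelled items
        return f"({counter})"

    text = ITEM_RE.sub(number_item, text)
    # Second pass: resolve references; unknown labels are left untouched.
    return REF_RE.sub(
        lambda m: f"({labels[m.group(1)]})" if m.group(1) in labels else m.group(0),
        text,
    )
```

Applied to the source of this slide, it produces the rendered version: `(1) …`, `(2) …` and `See (2).`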

Tables

Right	Left	Center	Default
12	12	12	12

Fruit	Price	Advantages
Bananas	$1.34	built-in wrapper bright color

Right	Left	Default	Center
12	12	12	12

  Right Left     Center   Default
------- ------ ---------- -------
     12 12        12          12

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | - built-in wrapper |
|               |               | - bright color     |
+---------------+---------------+--------------------+

| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|   12  |  12  |    12   |    12  |
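In a pipe table, only the colons in the separator row decide each column's alignment. The Python sketch below shows that rule. Its return values mirror pandoc's `Alignment` constructors (`AlignLeft`, `AlignRight`, `AlignCenter`, `AlignDefault`). `pipe_alignments` is a hypothetical helper, not pandoc's parser.

```python
from typing import List


def pipe_alignments(separator_row: str) -> List[str]:
    """Map a pipe-table separator row such as '|---:|:--|' to alignments."""
    cells = separator_row.strip().strip("|").split("|")
    result = []
    for cell in cells:
        cell = cell.strip()
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            result.append("AlignCenter")   # :----:
        elif right:
            result.append("AlignRight")    # -----:
        elif left:
            result.append("AlignLeft")     # :-----
        else:
            result.append("AlignDefault")  # ------
    return result
```

For the separator row on this slide, it yields right, left, default and center, in the same order as the header.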

Headers, Rules and Divs

This is a div.

### Headers, Rules and Divs {#headers-and-stuff}

::: {style="color:red"}

This is a div.
:::

Working with Pandoc

The Pipeline

Reader ⇒ Filters ⇒ Writer ⇒ Template

Reader/Writer: Convert between a given format and an AST ²
Filter: Performs operations on and modifies the AST
Template: Supplies the surrounding document into which the converted AST is embedded
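To make the division of labour concrete, here is a toy model of the pipeline in Python. It is not how pandoc is implemented. The AST is an assumed list of dicts with only `Header` and `Para` blocks. `reader`, `shout_headers`, `writer`, `template` and `convert` are illustrative names.

```python
from typing import Any, Callable, Dict, List

# Toy AST (an assumption for illustration): a list of dicts, each with a "t" tag.
Block = Dict[str, Any]
Filter = Callable[[List[Block]], List[Block]]


def reader(src: str) -> List[Block]:
    """Toy Markdown reader: blank-line separated chunks; leading '#'s make a header."""
    blocks: List[Block] = []
    for chunk in src.strip().split("\n\n"):
        level = len(chunk) - len(chunk.lstrip("#"))
        if level:
            blocks.append({"t": "Header", "level": level, "text": chunk[level:].strip()})
        else:
            blocks.append({"t": "Para", "text": " ".join(chunk.split())})
    return blocks


def shout_headers(blocks: List[Block]) -> List[Block]:
    """Example filter: upper-case header text, pass every other block through."""
    return [dict(b, text=b["text"].upper()) if b["t"] == "Header" else b for b in blocks]


def writer(blocks: List[Block]) -> str:
    """Toy HTML writer (no escaping): one element per block."""
    out = []
    for b in blocks:
        if b["t"] == "Header":
            out.append(f"<h{b['level']}>{b['text']}</h{b['level']}>")
        else:
            out.append(f"<p>{b['text']}</p>")
    return "\n".join(out)


def template(body: str, title: str) -> str:
    """Toy standalone template: embeds the writer output in a page skeleton."""
    return f"<html><head><title>{title}</title></head><body>\n{body}\n</body></html>"


def convert(src: str, filters: List[Filter], title: str = "Untitled") -> str:
    """Reader => Filters => Writer => Template."""
    ast = reader(src)
    for f in filters:
        ast = f(ast)
    return template(writer(ast), title)
```

Every filter maps an AST to an AST. Any number of them can therefore be chained between reader and writer, and neither the reader nor the writer needs to know about them.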

The CLI

Pandoc has some parameters:

pandoc [OPTIONS] [FILES]
  -f FORMAT, -r FORMAT  --from=FORMAT, --read=FORMAT
  -t FORMAT, -w FORMAT  --to=FORMAT, --write=FORMAT
  -o FILE               --output=FILE
                        --data-dir=DIRECTORY
                        --base-header-level=NUMBER
                        --strip-empty-paragraphs
                        --indented-code-classes=STRING
  -F PROGRAM            --filter=PROGRAM
                        --lua-filter=SCRIPTPATH
  -p                    --preserve-tabs
                        --tab-stop=NUMBER
                        --track-changes=accept|reject|all
                        --file-scope
                        --extract-media=PATH
  -s                    --standalone
                        --template=FILE
  -M KEY[:VALUE]        --metadata=KEY[:VALUE]
  -V KEY[:VALUE]        --variable=KEY[:VALUE]
  -D FORMAT             --print-default-template=FORMAT
                        --print-default-data-file=FILE
                        --print-highlight-style=STYLE|FILE
                        --dpi=NUMBER
                        --eol=crlf|lf|native
                        --wrap=auto|none|preserve
                        --columns=NUMBER
                        --strip-comments
                        --toc, --table-of-contents
                        --toc-depth=NUMBER
                        --no-highlight
                        --highlight-style=STYLE|FILE
                        --syntax-definition=FILE
  -H FILE               --include-in-header=FILE
  -B FILE               --include-before-body=FILE
  -A FILE               --include-after-body=FILE
                        --resource-path=SEARCHPATH
                        --request-header=NAME:VALUE
                        --self-contained
                        --html-q-tags
                        --ascii
                        --reference-links
                        --reference-location=block|section|document
                        --atx-headers
                        --top-level-division=section|chapter|part
  -N                    --number-sections
                        --number-offset=NUMBERS
                        --listings
  -i                    --incremental
                        --slide-level=NUMBER
                        --section-divs
                        --default-image-extension=extension
                        --email-obfuscation=none|javascript|references
                        --id-prefix=STRING
  -T STRING             --title-prefix=STRING
  -c URL                --css=URL
                        --reference-doc=FILE
                        --epub-subdirectory=DIRNAME
                        --epub-cover-image=FILE
                        --epub-metadata=FILE
                        --epub-embed-font=FILE
                        --epub-chapter-level=NUMBER
                        --pdf-engine=PROGRAM
                        --pdf-engine-opt=STRING
                        --bibliography=FILE
                        --csl=FILE
                        --citation-abbreviations=FILE
                        --natbib
                        --biblatex
                        --mathml
                        --webtex[=URL]
                        --mathjax[=URL]
                        --katex[=URL]
                        --gladtex
                        --abbreviations=FILE
                        --trace
                        --dump-args
                        --ignore-args
                        --verbose
                        --quiet
                        --fail-if-warnings
                        --log=FILE
                        --bash-completion
                        --list-input-formats
                        --list-output-formats
                        --list-extensions[=FORMAT]
                        --list-highlight-languages
                        --list-highlight-styles
  -v                    --version
  -h                    --help

Compiling Markdown to HTML:

pandoc --standalone --from markdown --to html -o Out.html In.txt

Compiling this talk:

echo recompiling HTML
pandoc -f markdown+emoji -s --toc --toc-depth=2 --css gitlab.css -o Vortrag.html Vortrag.md
echo recompiling Reveal
pandoc -f markdown+emoji -s -t revealjs -o Vortrag_reveal.html -V revealjs-url=reveal.js-3.7.0 --slide-level 2 Vortrag.md
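The same two invocations can also live in a small Python build script, which makes the shared options explicit. This is a hypothetical sketch: `pandoc_args`, `COMMON` and `TARGETS` are illustrative names, and the script assumes `pandoc` is on the `PATH`. Defining the module only builds argument lists; nothing runs until `main()` is called.

```python
import shutil
import subprocess
from typing import List, Sequence

# Options shared by both builds of the talk.
COMMON = ["-f", "markdown+emoji", "-s"]


def pandoc_args(source: str, output: str, extra: Sequence[str] = ()) -> List[str]:
    """Build the argv for one pandoc run; nothing is executed here."""
    return ["pandoc", *COMMON, *extra, "-o", output, source]


TARGETS = {
    "HTML": pandoc_args("Vortrag.md", "Vortrag.html",
                        ["--toc", "--toc-depth=2", "--css", "gitlab.css"]),
    "Reveal": pandoc_args("Vortrag.md", "Vortrag_reveal.html",
                          ["-t", "revealjs", "-V", "revealjs-url=reveal.js-3.7.0",
                           "--slide-level", "2"]),
}


def main() -> None:
    """Run every target in turn, stopping at the first failing pandoc call."""
    if shutil.which("pandoc") is None:
        raise SystemExit("pandoc not found on PATH")
    for name, argv in TARGETS.items():
        print(f"recompiling {name}")
        subprocess.run(argv, check=True)
```

To rebuild both outputs, call `main()` from the directory that contains `Vortrag.md`. The script places all options before the input file.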

Templates

Variables filled into templates are taken from metadata or command-line arguments.
They are referenced by $varname$.
To check whether a variable is set, use $if(var)$ together with $endif$.
Iteration is also possible:

$for(var)$$var$$sep$, $endfor$
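Below is a rough sketch of how these three constructs behave, written as a tiny Python renderer. It is a simplification under stated assumptions: no nesting of `$if$`/`$for$`, no `$else$` and no dotted field access. `render` is a hypothetical name, not part of pandoc, and here an empty or missing variable counts as unset.

```python
import re

# Simplified template syntax (assumption): no nesting, no $else$, no dotted keys.
IF_RE = re.compile(r"\$if\(([\w-]+)\)\$(.*?)\$endif\$", re.S)
FOR_RE = re.compile(r"\$for\(([\w-]+)\)\$(.*?)\$endfor\$", re.S)
VAR_RE = re.compile(r"\$([\w-]+)\$")


def render(template: str, ctx: dict) -> str:
    def expand_for(m):
        name, body = m.group(1), m.group(2)
        value = ctx.get(name)
        if not value:
            return ""
        items = value if isinstance(value, list) else [value]
        # Text after $sep$ is emitted only *between* iterations.
        item_tpl, _, sep = body.partition("$sep$")
        # Inside the loop, $name$ refers to the current item.
        return sep.join(render(item_tpl, {**ctx, name: item}) for item in items)

    def expand_if(m):
        name, body = m.group(1), m.group(2)
        return body if ctx.get(name) else ""

    out = FOR_RE.sub(expand_for, template)
    out = IF_RE.sub(expand_if, out)
    # Finally substitute plain $var$ references; unset variables become "".
    return VAR_RE.sub(lambda m: str(ctx.get(m.group(1), "")), out)
```

For example, `render("$for(author)$$author$$sep$, $endfor$", {"author": ["Ada", "Grace"]})` returns `"Ada, Grace"`. The `<title>` line of the HTML template collapses to just `$pagetitle$` when `title-prefix` is unset.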

HTML Template:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="$lang$" xml:lang="$lang$"$if(dir)$ dir="$dir$"$endif$>
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
$for(author-meta)$
  <meta name="author" content="$author-meta$" />
$endfor$
$if(date-meta)$
  <meta name="dcterms.date" content="$date-meta$" />
$endif$
$if(keywords)$
  <meta name="keywords" content="$for(keywords)$$keywords$$sep$, $endfor$" />
$endif$
  <title>$if(title-prefix)$$title-prefix$ – $endif$$pagetitle$</title>
  <style type="text/css">
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
$if(quotes)$
      q { quotes: "“" "”" "‘" "’"; }
$endif$
  </style>
$if(highlighting-css)$
  <style type="text/css">
$highlighting-css$
  </style>
$endif$
$for(css)$
  <link rel="stylesheet" href="$css$" />
$endfor$
$if(math)$
  $math$
$endif$
$for(header-includes)$
  $header-includes$
$endfor$
</head>
<body>
$for(include-before)$
$include-before$
$endfor$
$if(title)$
<header>
<h1 class="title">$title$</h1>
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
$for(author)$
<p class="author">$author$</p>
$endfor$
$if(date)$
<p class="date">$date$</p>
$endif$
</header>
$endif$
$if(toc)$
<nav id="$idprefix$TOC">
$table-of-contents$
</nav>
$endif$
$body$
$for(include-after)$
$include-after$
$endfor$
</body>
</html>

Extensions

Readers and writers allow for fine-grained customization by use of extensions.

Enabling emoji (👍) in markdown:

pandoc -f markdown+emoji in.md

Convert tables to multiline:

pandoc -f markdown -t markdown-grid_tables-simple_tables+multiline_tables-pipe_tables
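A format name with `+EXT`/`-EXT` suffixes switches extensions on or off relative to that format's defaults. The Python sketch below shows the bookkeeping. It assumes the changes are applied left to right. `parse_format`, `apply_extensions` and the four-element `TABLE_DEFAULTS` set are illustrative assumptions; pandoc's real Markdown default set is much larger.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed subset of the defaults: just the four table extensions.
TABLE_DEFAULTS = {"grid_tables", "simple_tables", "multiline_tables", "pipe_tables"}


def parse_format(spec: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'markdown+emoji-pipe_tables' into the base name and ordered changes."""
    base = re.split(r"[+-]", spec, maxsplit=1)[0]
    changes = re.findall(r"([+-])(\w+)", spec[len(base):])
    return base, changes


def apply_extensions(defaults: Iterable[str], changes: List[Tuple[str, str]]) -> Set[str]:
    """Apply +ext / -ext changes, left to right, to a copy of the defaults."""
    enabled = set(defaults)
    for sign, ext in changes:
        if sign == "+":
            enabled.add(ext)
        else:
            enabled.discard(ext)
    return enabled
```

For the table conversion above, only `multiline_tables` remains enabled on the output side.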

Filter

Filters allow for even more individualisation and actual extensibility: the AST, serialised as JSON, is piped through a program that outputs the modified AST as JSON.

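import Text.Pandoc.JSON

-- Replace the contents of any code block carrying an include="FILE"
-- attribute with the contents of FILE; leave all other blocks untouched.
doInclude :: Block -> IO Block
doInclude cb@(CodeBlock (ident, classes, namevals) _) =
  case lookup "include" namevals of
       Just f     -> CodeBlock (ident, classes, namevals) <$> readFile f
       Nothing    -> return cb
doInclude x = return x

-- toJSONFilter reads the JSON AST from stdin, applies doInclude to every
-- Block and writes the modified JSON AST to stdout.
main :: IO ()
main = toJSONFilter doInclude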

~~~~ {include="README"}
this will be replaced by contents of README
~~~~
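Filters are not tied to Haskell. Any program that reads the JSON AST on stdin and writes it back on stdout can be used with `--filter`. Below is a standard-library-only Python sketch of the same include filter. It assumes the pandoc 2.x JSON layout, where a code block looks like `{"t": "CodeBlock", "c": [[id, classes, [[key, value], ...]], text]}`. `walk` and `include_block` are hypothetical helper names; libraries such as pandocfilters or panflute ship ready-made versions of this traversal.

```python
import json
import sys


def include_block(node):
    """If node is a CodeBlock with an include=FILE attribute, load FILE."""
    if isinstance(node, dict) and node.get("t") == "CodeBlock":
        (ident, classes, keyvals), _text = node["c"]
        attrs = dict(keyvals)
        if "include" in attrs:
            with open(attrs["include"], encoding="utf-8") as fh:
                return {"t": "CodeBlock", "c": [[ident, classes, keyvals], fh.read()]}
    return node


def walk(node, action):
    """Apply action bottom-up to every object in the JSON AST."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        return action({k: walk(v, action) for k, v in node.items()})
    return node


def main() -> None:
    doc = json.load(sys.stdin)
    doc["blocks"] = walk(doc["blocks"], include_block)  # metadata left untouched
    json.dump(doc, sys.stdout)


# pandoc passes the output format as the first argument to a filter, so
# running this file by hand without arguments does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as `include.py`, it would be invoked with something like `pandoc --filter ./include.py -o out.html in.md`.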

data Pandoc = Pandoc Meta [Block]

data Block
    = Plain [Inline]        
    | Para [Inline]         
    | LineBlock [[Inline]]  
    | CodeBlock Attr String 
    | RawBlock Format String 
    | BlockQuote [Block]    
    | OrderedList ListAttributes [[Block]] 
    | BulletList [[Block]]  
    | DefinitionList [([Inline],[[Block]])]  
    | Header Int Attr [Inline] 
    | HorizontalRule        
    | Table [Inline] [Alignment] [Double] [TableCell] [[TableCell]]  
    | Div Attr [Block]      
    | Null

data Inline
    = Str String            
    | Emph [Inline]         
    | Strong [Inline]       
    | Strikeout [Inline]    
    | Superscript [Inline]  
    | Subscript [Inline]    
    | SmallCaps [Inline]    
    | Quoted QuoteType [Inline] 
    | Cite [Citation]  [Inline] 
    | Code Attr String      
    | Space                 
    | SoftBreak             
    | LineBreak             
    | Math MathType String  
    | RawInline Format String 
    | Link Attr [Inline] Target  
    | Image Attr [Inline] Target 
    | Note [Block]          
    | Span Attr [Inline]
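In the JSON that filters exchange, each constructor above becomes an object with a `t` tag. Its fields go in `c`: the single field directly, or an array when there are several. Nullary constructors such as `Space` carry only `t`. The Python sketch below flattens inline content to plain text and handles only a subset of the constructors listed above. `stringify` is an illustrative name, similar in spirit to the helper of the same name in pandocfilters.

```python
from typing import Any

# Constructors whose single field is [Inline]: "c" holds that list directly.
INLINE_CONTAINERS = {"Emph", "Strong", "Strikeout", "Superscript", "Subscript",
                     "SmallCaps", "Plain", "Para"}
# Multi-field constructors, mapped to the position of their [Inline] field.
INLINE_FIELD = {"Quoted": 1, "Cite": 1, "Span": 1, "Link": 1, "Image": 1, "Header": 2}
BREAKS = {"Space", "SoftBreak", "LineBreak"}


def stringify(node: Any) -> str:
    """Flatten JSON-encoded inline content to plain text (subset of constructors)."""
    if isinstance(node, list):
        return "".join(stringify(n) for n in node)
    if not isinstance(node, dict):
        return ""
    tag = node.get("t")
    if tag == "Str":
        return node["c"]
    if tag in BREAKS:
        return " "
    if tag in ("Code", "Math"):           # Code Attr String / Math MathType String
        return node["c"][1]
    if tag in INLINE_CONTAINERS:
        return stringify(node["c"])
    if tag in INLINE_FIELD:
        return stringify(node["c"][INLINE_FIELD[tag]])
    return ""                             # Note, RawInline, other blocks: ignored
```

A `Para` containing `Str "Hello"`, `Space` and `Emph [Str "world"]` flattens to `"Hello world"`.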

The Ecosystem

Slides

All available options work, but each one requires individual, manual fixing.

Reveal.JS: Works best out of the box, but uses lots of JavaScript.
Impress.JS: Prezi-like, seems straightforward.
Slideous/S5/Slidy/DZSlides/Power Point: … exist? 🙈

Static Site Generators ^🔗

Gitit: Git and Pandoc based wiki
Blogs/CVs: Easily DIY-able

Tooling ^🔗

pandoc-citeproc: Citation preprocessor for various bibliography formats and citation styles
Decker: A build and deployment tool for pandoc
Panrun: Simple wrapper script to insert multiple compilation configurations into a yaml block
Pandomatic/Panzer: Complex templated configurations for pandoc commands

Filters ^🔗

Mermaid-Filters: Flowcharts, sequence and gantt diagrams
CSV2Table: Inserts an external CSV file as a genuine table
R Pandoc: Automatically plots/compiles R code
ABC to Music ^🔗: Converts ABC input into actual music notation
TikZ ^🔗: Allows for raw TikZ input

Wrappers and Interfaces ^🔗

Python, Ruby, Scala, JavaScript, Perl, Pascal, C, R

Contributed Templates ^🔗

E-Book generation scripts, Journal, PhD theses, lecture notes, resume/CV, Bootstrap & HTML

Personal Verdict

Pandoc is neither meant to replace LaTeX (or Beamer), nor is it a painless solution to everything.

There are many annoyances:

Highlighting is often broken
Compilation does not always yield expected results
Tables are … terrible
Slideshow scaling has to be kept in mind

In my opinion, Pandoc should be used for:

Documentation
Larger text processing pipelines
Fun with CSS and highly customized templates

Slides are possible, and work well for some people if you settle on one format.

Demo!

Relevant websites:

Some Note↩
also known as the “native format”↩