Skip to content

Latest commit

 

History

History
350 lines (259 loc) · 13.5 KB

assorted-text.txt

File metadata and controls

350 lines (259 loc) · 13.5 KB

http://history-computer.com/Internet/Birth/Goldfarb.html How I'm going to split the world ================================

There are different sorts of markup, with different purposes, but I am going to concentrate on three main types, which I shall categorise as:

  1. Presentational markup

    This uses markup to direct the presentation of the text, for instance as a man page, on a screen, or on a typeset page.

    Traditional examples are the *roff family (roff, nroff, troff, groff and the various sorts of RUNOFF) and the |TeX| family (|TeX|, |LaTeX|, etc.)

  2. Semantic markup

    This uses markup to add meaning to the text, typically by annotating textual elements to say what they are (e.g., part number, book title, address).

    Traditional examples are the markups described by SGML (or, lighter weight, XML), including Docbook.

  3. Readable plaintext

    This attempts to get some of the benefits of one of the other types (typically concentrating on presentation) whilst preserving the readability of the original text. Interestingly, these have been around about as long as the other forms, but are less talked about.

    The ur-language for this form is probably setext, but obvious examples include StructuredText, reStructuredText and markdown.

    The art of design of these markups is judging what capabilities are wanted - the more things the markup can represent, the more complex the restrictions on what may be typed as free text, and what has accidental meaning.

    Note

    The classic aim is to be "close to what one would write in an email" - a stricture that meant more when emails could only contain ASCII text.

    Note

    I think this is why some people object to reStructuredText but are happy with markdown. reStructuredText aims to provide a lot of capability (such that one rarely needs to stray into something like |TeX| or Docbook), whereas markdown puts that balance a lot closer to plain text.

Note

I think that means it is also sensible to separate out the three timelines. There will obviously have been cross-fertilisation, but it's probably much simpler not to mix them up, because that can lead to an artificial expectation of causation across the timelines, which I am suspicious of.

(I'm not convinced the roffs and the SGMLs were well related in the early days, for instance. And that's irrespective of whether the different developers knew of each others work.)

Presentational markup

The early days

This started with output to teletype (underlining, bolding, and otherwise overstruct characters) and to line printers. Eventually, output to monotype/linotype/etc. got added in.

For instance: nroff/troff, DSR (Digital Standard Runnoff)

The need was to type basic alphanumerics and symbols (i.e., ASCII or EBCDIC) but output to something with the ability to represent more. For teletypes, this might just mean the use of the backspace character to allow overwriting text - but doing that in the original text file would not necessarily be portable or readable.

Needs:

  • Use portable character sets (not necessarily only ASCII and EBCDIC!)
  • Don't want to type in the "magic codes" to do unerlining, etc., especially as they're not necessarily going to be the same codes for different output devices.

Programmable markup

Note

Wikipedia calls this "Procedural markup"

There is an important subset of presentation markup, which is actually a progamming language that privides markup. The obvious examples are |TeX| and Postscript (and to a lesser extent, PDF).

|TeX| is essentially a macro expansion language, and this means that the (perhaps more familiar) |LaTeX| is written in |TeX| itself.

Postscript is perhaps not normally thought of as a markup lanugage, but is essentially a Forth derivative that works on text to produce a printable output.

As such, both of these languages can be used to do non-text processing as well, although perhaps not in a manner that feels natural (to their intent).

PDF incorporates a subset of Postscript, but is much more page oriented (pages are independent) and less general in its applicability, so is arguably not quite in our area of interest.

http://wiki.c2.com/?ForthPostscriptRelationship discusses whether Postscript is a Forth, or is just similar to Forth (basically, the latter seems more sensible).

Semantic markup

Note

Wikipedia calls this Descriptive markup

  • SGML (and DTDS)

    leading to:

    • HTML
    • XML
    • XHTML
    • Docbook

    and so on.

(SGML originally derived from GML)

Readable plaintext

Note

Wikipedia seems to put these together with such things as wiki markup as Lightweight markup. I'd argue there's a difference between lightweight markup and the subset therein which is readable, and it's that latter subset I'm most interested in.

Note

It would be nice to get an actual timeline from setext to structured text to reStructuredText and any other intermediaries.

setext -> structured text

The big ideas of reStructuredText:

  1. prioritise readability of the source text
  2. not to specify the form of the output (i.e., don't just assume HTML)
  3. be well specified

Other examples:

  • markdown (I'd argue less readable, because it's meant to be slightly easier to write, and it originally was designed for output to HTML, and it's definitely not well specified)
  • asciidoctor (how does this differ on those three axes?)

Talking points for the slideshow

"Why markup languages are older than you think, and some of the major examples"

All four have different reasons for existing, but clearly influence each other.

So, for each pick a major example - perhaps:

  • nroff/troff (different programs, but same input format - does it have a name?)
  • SGML/HTML/XML and maybe a brief nod to Docbook
  • |TeX|/|LaTeX| (more people use |LaTeX| than bare |TeX|)
  • setext -> structured text -> reStructuredText

Want dates for each

Driving forces:

  • I want portable documentation
  • I want good (but controllable) typesetting
  • I want to mark up the meaning of the elements of my text, for analysis
  • I want readable source

Older fragments of a Timeline

need to put in setext, markdown, nroff/troff/groff, RUNOFF

http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04623260 is the OpenVMS Digital Standard Runoff Reference Manual from May 1993.

and

  • 1967 - GenCode, William W. Tunnicliffe ("generic coding") - publishing industry.
  • 1969 - GML, Charles Goldfarb
  • 1974 (is that date right?) - SGML
  • 1978 ?? - TeX
  • 1980 - Scribe, Brian Reid, doctoral thesis
  • 1984 ?? - LaTeX
  • 1986 - SGML ISO Standard ISO 8879
  • 1989 ?? - HTML
  • setext - introduced 1991
  • 1996 - XML
  • StructuredText - introduced through Zope
  • reStructuredText -
  • MathML -

Mumblings

These may or may not get used.

Readability versus writeability

There is an obvious tradeoff to be made between how readable a format is, and how simple it is to write. For instance, delimiting headers by leading characters:

# Header 1
## Header 2

is much simpler than having to type underlines:

Header 1
========

Header 2
--------

Also, having a pre-defined set of underlines (e.g., === always means title, --- means subtitle, etc.) is easier to learn and easier to use than allowing any underlining choice (provided it is consistent within a document), but may be considered to remove the author's choice on which form of underlining reads best in this particular document.

As in so many things, the Zen of Python still has applicability - it is then left up to the reader how well it has been followed.

The advantages of having a competent specification

It is normally regarded as a good thing to have multiple implementations of a specification - not least because it helps to iron out misunderstandings of what that specification means. Standardisation can help to mediate that understanding (although not always as much as one might hope), and Python gets by quite well by just having people communicate a lot and having a reasonable test suite for the language.

It's not always an obviously good thing, though.

There are many forms of markdown, but the original implementation of markdown is essentially frozen, as is the original documentation, and that "definition" of markdown is both ambiguous, and does not address various tasks that people want to do. Nor is the original author willing to help with this situation [3]. This means that different markdown implementations provide their own decisions on the ambiguous parts, and provide their own extensions. Unfortunately this means that markdown text is not necessarily portable between implementations. In practice, however, this probably doesn't matter much, as the use of markdown is often for documentation that belongs to a particular site or user environment, and interoperability within that site/environment is enough.

In contrast, reStructuredText is quite well specified (David Goodger having a background in SGML systems, after all). This means that the various implementations of reStructuredText can be specific about what they do or don't support, and in general interoperability should be better, or at least more predictable.

Incidentally, it probably also makes it possible to produce a general linter for reStructuredText - i.e., a program to inspect the text for errors before running it through docutils to produce output - which is harder to do in a portable manner for markdown, because there are so many markdowns.

[1]both not escaped in this text, of course
[2]the answer is, of course, "whichever is suitable" / "whichever you choose", although I would suggest that for a large public project (gov.uk, documentation for the RaspberryPi) markdown should be adopted, as it is simpler, whilst for more challenging uses (or people who prefer more challenging markup), reStructuredText is good. And as reStructuredText suggests that "if you need to do things it doesn't support, use something else", then I think the same can apply to markdown and (perhaps) moving on to reStructuredText.
[3]whilst I personally find that hard to understand, it's not as if we're paying anything for the privilege of using markdown, we're using something given freely as it is/was.

Comparisons

Comparing markdown, reStructuredText and AsciiDoc (to pick three).

Is this section worth anything? I'm not actually convinced.

NB: check whether AsciiDoctor also always goes through docbook

Concept markdown reStructuredText AsciiDoc
readability a main aim the main aim a main aim
closely specified no yes yes
output to various various docbook and then various
inline HTML yes delimited [4] delimited [5]
nested inline markup ? no yes
non-trivial list items no yes yes
extensible no directives macros
conditional text no no no
executable text no no [6] yes
tables not standard yes yes
[4]reStructuredText allows the writer to add HTML via a directive, but it will only be used if the output is to HTML.
[5]AsciiDoc produces HTML via Docbook, and Docbook provides a way of including a file of raw HTML into the HTML output.
[6]this is a very conscious decision by reStructuredText

On lists

Or, actually, treating text blocks as composable first class entities.

Why do so many markup creators believe that lists are just made up of list items with no internal structure? Do they really never want to put code into list items, or have more than one paragraph per item? Given the experience of the lengths people will go to in those wiki formats that are similarly crippled to make it look as if one can do these things, this would appear always to be a mistake. Back in the origianl wikiwikiweb, I think that it was quite deliberate - if one looks at the types of page being written in that wiki, there was no intent to have anything beyond a sort of "notation" page - it wasn't intended for writing whole documents. For other wikis, I suspect many have copied that limitation without thinking about the implications. For actual markup formats, though (expecially those targetting HTML, which is many of them), it's rather harder to understand the limitation.