http://history-computer.com/Internet/Birth/Goldfarb.html

How I'm going to split the world
================================
There are different sorts of markup, with different purposes, but I am going to concentrate on three main types, which I shall categorise as:
Presentational markup
This uses markup to direct the presentation of the text, for instance as a man page, on a screen, or on a typeset page.
Traditional examples are the *roff family (roff, nroff, troff, groff and the various sorts of RUNOFF) and the |TeX| family (|TeX|, |LaTeX|, etc.).

Semantic markup
This uses markup to add meaning to the text, typically by annotating textual elements to say what they are (e.g., part number, book title, address).
Traditional examples are the markups described by SGML (or, lighter weight, XML), including Docbook.
Readable plaintext
This attempts to get some of the benefits of one of the other types (typically concentrating on presentation) whilst preserving the readability of the original text. Interestingly, these have been around about as long as the other forms, but are less talked about.
The ur-language for this form is probably setext, but obvious examples include StructuredText, reStructuredText and markdown.
The art of designing these markups is judging what capabilities are wanted - the more things the markup can represent, the more complex the restrictions on what may be typed as free text, and the more ordinary text ends up with accidental meaning.
Note
The classic aim is to be "close to what one would write in an email" - a stricture that meant more when emails could only contain ASCII text.
Note
I think this is why some people object to reStructuredText but are happy with markdown. reStructuredText aims to provide a lot of capability (such that one rarely needs to stray into something like |TeX| or Docbook), whereas markdown puts that balance a lot closer to plain text.
Note
I think that means it is also sensible to separate out the three timelines. There will obviously have been cross-fertilisation, but it's probably much simpler not to mix them up, because that can lead to an artificial expectation of causation across the timelines, which I am suspicious of.
(I'm not convinced the roffs and the SGMLs were closely related in the early days, for instance. And that's irrespective of whether the different developers knew of each other's work.)
This started with output to teletypes (underlining, bolding, and otherwise overstruck characters) and to line printers. Eventually, output to monotype/linotype/etc. got added in.
For instance: nroff/troff, DSR (Digital Standard Runoff)
The need was to type basic alphanumerics and symbols (i.e., ASCII or EBCDIC) but output to something with the ability to represent more. For teletypes, this might just mean the use of the backspace character to allow overwriting text - but doing that in the original text file would not necessarily be portable or readable.
Needs:
- Use portable character sets (not necessarily only ASCII and EBCDIC!)
- Don't want to type in the "magic codes" to do underlining, etc., especially as they're not necessarily going to be the same codes for different output devices (see the sketch below).
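As an illustration of why those codes do not belong in the source file, here is a minimal Python sketch (a toy of my own, not taken from any *roff implementation) of the sort of overstriking a formatter would emit for a teletype-style device, underlining text by backspacing over each character::

    def overstrike_underline(text):
        """Underline text the way a teletype or line printer would:
        emit an underscore, a backspace, then the character, so the
        two are struck in the same position on the paper."""
        return "".join("_\b" + ch if not ch.isspace() else ch for ch in text)

    # The source file only needs to say *which* words matter; the
    # device-specific control characters live in the output stream.
    print("This word is " + overstrike_underline("important") + ".")

On a device (or pager) that understands backspace the word comes out underlined; typing the underscore/backspace pairs by hand into the source would make it unreadable and tie it to one kind of output device.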
Note
Wikipedia calls this "Procedural markup"
There is an important subset of presentational markup in which the markup is actually provided by a programming language. The obvious examples are |TeX| and Postscript (and, to a lesser extent, PDF).
|TeX| is essentially a macro expansion language, and this means that the (perhaps more familiar) |LaTeX| is written in |TeX| itself.
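To give a flavour of what "a macro expansion language" means here (this is a toy sketch of my own, not |TeX|'s actual expansion machinery, and the macro names are invented), the core idea is repeated substitution of definitions into the text until nothing is left to expand::

    # Invented macro definitions, loosely in the spirit of \def\name{body}.
    MACROS = {
        r"\product": r"the \company word processor",
        r"\company": "Example Corp",
    }

    def expand(text, macros):
        """Keep substituting macro names until the text stops changing."""
        previous = None
        while text != previous:
            previous = text
            for name, body in macros.items():
                text = text.replace(name, body)
        return text

    print(expand(r"Welcome to \product.", MACROS))
    # -> Welcome to the Example Corp word processor.

Everything |LaTeX| adds (sectioning, cross-references, and so on) is, in the end, built out of definitions along these lines, expanded by the |TeX| engine.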
Postscript is perhaps not normally thought of as a markup language, but is essentially a Forth derivative that works on text to produce printable output.
As such, both of these languages can be used to do non-text processing as well, although perhaps not in a manner that feels natural (to their intent).
PDF incorporates a subset of Postscript, but is much more page oriented (pages are independent) and less general in its applicability, so is arguably not quite in our area of interest.
http://wiki.c2.com/?ForthPostscriptRelationship discusses whether Postscript is a Forth, or is just similar to Forth (basically, the latter seems more sensible).
Note
Wikipedia calls this Descriptive markup
SGML (and DTDs)
leading to:
- HTML
- XML
- XHTML
- Docbook
and so on.
(SGML originally derived from GML)
Note
Wikipedia seems to put these together with such things as wiki markup as Lightweight markup. I'd argue there's a difference between lightweight markup and the subset therein which is readable, and it's that latter subset I'm most interested in.
Note
It would be nice to get an actual timeline from setext to structured text to reStructuredText and any other intermediaries.
setext -> structured text
The big ideas of reStructuredText:
- prioritise readability of the source text
- not to specify the form of the output (i.e., don't just assume HTML)
- be well specified
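As a small illustration of the second point, the docutils reference implementation will render the same source with whichever writer is asked for; a minimal sketch (using docutils' publish_string, trimming the output purely for display)::

    from docutils.core import publish_string

    SOURCE = "A title\n=======\n\nSome *emphasised* text.\n"

    # The same source rendered by two different writers - nothing in the
    # source text commits it to either output format.
    html = publish_string(SOURCE, writer_name="html")
    latex = publish_string(SOURCE, writer_name="latex")

    print(html.decode("utf-8")[:200])
    print(latex.decode("utf-8")[:200])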
Other examples:
- markdown (I'd argue less readable, because it's meant to be slightly easier to write, it was originally designed for output to HTML, and it's definitely not well specified)
- asciidoctor (how does this differ on those three axes?)
"Why markup languages are older than you think, and some of the major examples"
All four have different reasons for existing, but clearly influence each other.
So, for each pick a major example - perhaps:
- nroff/troff (different programs, but same input format - does it have a name?)
- SGML/HTML/XML and maybe a brief nod to Docbook
- |TeX|/|LaTeX| (more people use |LaTeX| than bare |TeX|)
- setext -> structured text -> reStructuredText
Want dates for each
Driving forces:
- I want portable documentation
- I want good (but controllable) typesetting
- I want to mark up the meaning of the elements of my text, for analysis
- I want readable source
need to put in setext, markdown, nroff/troff/groff, RUNOFF
- 1964 RUNOFF https://en.wikipedia.org/wiki/TYPSET_and_RUNOFF
- also, the RUNOFF program (wikipedia - Runoff)
- 1969 roff
- nroff (newer roff) https://en.wikipedia.org/wiki/Nroff
- troff (typesetter roff) https://en.wikipedia.org/wiki/Troff
- in various versions, and with increasing capabilities. nroff basically ignores whatever it doesn't understand when given the same input as troff.
- 1990s groff (GNU roff)
http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04623260 is the OpenVMS Digital Standard Runoff Reference Manual from May 1993.
and
- 1967 - GenCode, William W. Tunnicliffe ("generic coding") - publishing industry.
- 1969 - GML, Charles Goldfarb
- 1974 (is that date right?) - SGML
- 1978 ?? - TeX
- 1980 - Scribe, Brian Reid, doctoral thesis
- 1984 ?? - LaTeX
- 1986 - SGML ISO Standard ISO 8879
- 1989 ?? - HTML
- setext - introduced 1991
- 1996 - XML
- StructuredText - introduced through Zope
- reStructuredText -
- MathML -
These may or may not get used.
There is an obvious tradeoff to be made between how readable a format is, and how simple it is to write. For instance, delimiting headers by leading characters::

    # Header 1

    ## Header 2

is much simpler than having to type underlines::

    Header 1
    ========

    Header 2
    --------
Also, having a pre-defined set of underlines (e.g., === always means title, --- means subtitle, etc.) is easier to learn and easier to use than allowing any underlining choice (provided it is consistent within a document), but may be considered to remove the author's choice on which form of underlining reads best in this particular document.
As in so many things, the Zen of Python still has applicability - it is then left up to the reader how well it has been followed.
It is normally regarded as a good thing to have multiple implementations of a specification - not least because it helps to iron out misunderstandings of what that specification means. Standardisation can help to mediate that understanding (although not always as much as one might hope), and Python gets by quite well by just having people communicate a lot and having a reasonable test suite for the language.
It's not always an obviously good thing, though.
There are many forms of markdown, but the original implementation of markdown is essentially frozen, as is the original documentation, and that "definition" of markdown is both ambiguous and does not address various tasks that people want to do. Nor is the original author willing to help with this situation [3]. As a result, different markdown implementations make their own decisions on the ambiguous parts, and provide their own extensions. Unfortunately this means that markdown text is not necessarily portable between implementations. In practice, however, this probably doesn't matter much, as markdown is often used for documentation that belongs to a particular site or user environment, and interoperability within that site/environment is enough.
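For instance (a sketch against the Python-Markdown implementation; the details are specific to it, and other implementations arrange this differently), something as common as a table is not part of the original definition and has to be switched on as an implementation-specific extension::

    import markdown

    SOURCE = (
        "| Name  | Value |\n"
        "|-------|-------|\n"
        "| alpha | 1     |\n"
    )

    # Plain markdown: the table syntax is just ordinary paragraph text.
    print(markdown.markdown(SOURCE))

    # With Python-Markdown's "tables" extension it becomes an HTML table -
    # but another implementation may not accept the same syntax, or may
    # enable something similar by default.
    print(markdown.markdown(SOURCE, extensions=["tables"]))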
In contrast, reStructuredText is quite well specified (David Goodger having a background in SGML systems, after all). This means that the various implementations of reStructuredText can be specific about what they do or don't support, and in general interoperability should be better, or at least more predictable.
Incidentally, it probably also makes it possible to produce a general linter for reStructuredText - i.e., a program to inspect the text for errors before running it through docutils to produce output - which is harder to do in a portable manner for markdown, because there are so many markdowns.
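A minimal sketch of that idea, using nothing more than docutils' own parser and asking it to report problems instead of producing output (the report levels chosen here are just illustrative)::

    import io

    from docutils.core import publish_doctree

    def lint_rst(text):
        """Parse reStructuredText with docutils and return any warnings
        or errors it reports, without rendering any output."""
        messages = io.StringIO()
        publish_doctree(
            text,
            settings_overrides={
                "warning_stream": messages,  # collect reports, not stderr
                "report_level": 2,           # warnings and above
                "halt_level": 5,             # never abort, just report
            },
        )
        return messages.getvalue()

    print(lint_rst("Title\n====\n\nSome `unterminated reference.\n"))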
[1] both not escaped in this text, of course

[2] the answer is, of course, "whichever is suitable" / "whichever you choose", although I would suggest that for a large public project (gov.uk, documentation for the RaspberryPi) markdown should be adopted, as it is simpler, whilst for more challenging uses (or people who prefer more challenging markup), reStructuredText is good. And as reStructuredText suggests that "if you need to do things it doesn't support, use something else", I think the same can apply to markdown and (perhaps) moving on to reStructuredText.

[3] whilst I personally find that hard to understand, it's not as if we're paying anything for the privilege of using markdown; we're using something given freely, as it is/was.
Comparing markdown, reStructuredText and AsciiDoc (to pick three).
Is this section worth anything? I'm not actually convinced.
NB: check whether AsciiDoctor also always goes through docbook
======================  ============  ================  ========================
Concept                 markdown      reStructuredText  AsciiDoc
======================  ============  ================  ========================
readability             a main aim    the main aim      a main aim
closely specified       no            yes               yes
output to               various       various           docbook and then various
inline HTML             yes           delimited [4]     delimited [5]
nested inline markup    ?             no                yes
non-trivial list items  no            yes               yes
extensible              no            directives        macros
conditional text        no            no                no
executable text         no            no [6]            yes
tables                  not standard  yes               yes
======================  ============  ================  ========================
[4] reStructuredText allows the writer to add HTML via a directive, but it will only be used if the output is to HTML.

[5] AsciiDoc produces HTML via Docbook, and Docbook provides a way of including a file of raw HTML into the HTML output.

[6] this is a very conscious decision by reStructuredText
Or, actually, treating text blocks as composable first-class entities.
Why do so many markup creators believe that lists are just made up of list items with no internal structure? Do they really never want to put code into list items, or have more than one paragraph per item? Given the lengths people will go to, in those wiki formats that are similarly crippled, to make it look as if one can do these things, this would appear always to be a mistake.

Back in the original WikiWikiWeb, I think that limitation was quite deliberate - if one looks at the types of page being written in that wiki, there was no intent to have anything beyond a sort of "notation" page; it wasn't intended for writing whole documents. For other wikis, I suspect many have copied that limitation without thinking about the implications. For actual markup formats, though (especially those targeting HTML, which is many of them), the limitation is rather harder to understand.