Searching for a better markdown
27 July 2025
This document, and all pages on this site, are built out of Markdown, using the Haskell library Pandoc. Pandoc is a useful, very widely-used tool that can convert between a huge number of input and output formats. It does this mostly by distilling each of those formats into a least common denominator – which is essentially the format of markdown – and providing functions to read and write between each of those formats and this intermediate structure. If you ever want to add something beyond Markdown, then you have to write a custom reader and writer to support the feature you want.
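To make the reader/writer design concrete, here is a toy version of the hub-and-spoke idea in Ruby. Everything in it is hypothetical: the Para and Emph node types and the read_markdown, write_html and write_latex functions are invented for illustration and have nothing to do with Pandoc's real document model or API. It only shows the shape of the idea: each format talks to one shared intermediate structure, never directly to another format.

```ruby
# Toy hub-and-spoke converter. Node types and function names are invented
# for illustration; this is NOT Pandoc's API. Escaping is omitted for brevity.
Para = Struct.new(:inlines) # a paragraph: list of Strings and Emph nodes
Emph = Struct.new(:text)    # emphasised text

# Reader: a tiny Markdown subset (blank-line paragraphs, *emphasis*)
# converted into the shared intermediate structure.
def read_markdown(src)
  src.split(/\n{2,}/).map do |block|
    # split with a capture group keeps the captured text, so odd indices
    # are the parts that were wrapped in *...*
    parts = block.strip.split(/\*([^*]+)\*/)
    inlines = parts.each_with_index.map { |s, i| i.odd? ? Emph.new(s) : s }
    Para.new(inlines.reject { |x| x == "" })
  end
end

# Writers: shared structure converted into one output format each.
def write_html(doc)
  doc.map { |para|
    "<p>" + para.inlines.map { |x| x.is_a?(Emph) ? "<em>#{x.text}</em>" : x }.join + "</p>"
  }.join("\n")
end

def write_latex(doc)
  doc.map { |para|
    para.inlines.map { |x| x.is_a?(Emph) ? "\\emph{#{x.text}}" : x }.join
  }.join("\n\n")
end

doc = read_markdown("Hello *world*.\n\nSecond paragraph.")
puts write_html(doc)
```

Adding an output format means writing one more writer. Adding new syntax means changing the shared structure and every reader and writer that should understand it, which is the cost of going beyond Markdown.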
I really like writing Markdown, because it’s pretty much just plain text. There is something deep inside me that refuses to add little bits of code in order to separate paragraphs. And since most things you write are plain text, that’s pretty good. But as I write more, I find myself wanting more and more extensions that would require an intimidating amount of code to support:
- I’d like to be able to check my articles for grammar and spelling mistakes, but there are a million different markdown “variants” and they all need a slightly different parser.
- Sometimes I deliberately use non-words like GNU or DRM which should never turn up as spelling mistakes. I would still like to be prompted to define those terms, and for those definitions to link up to the word automatically. Markdown does not provide rich enough syntax for this (you can hack it, but the hacks are fragile).
- It would be nice to process all my articles in order to find all the technical terms, to put them in a glossary. Or to find all the domains I link to and check they’re still alive.
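As a sketch of the kind of processing program I mean, the following Ruby collects the domains linked from a Markdown article. The linked_domains function and its regular expression are hypothetical, not part of any Markdown tool, and the regex only handles the simplest inline link forms: it misses reference-style links and autolinks, which is exactly the trouble with scanning an under-specified format. Checking that each domain is still alive would be a further HTTP request per host, not shown here.

```ruby
require "uri"

# Hypothetical sketch: collect the hostnames linked from a Markdown article.
# The regex is an assumption that only understands inline [text](url) and
# [text](url "title") links; reference-style links, autolinks and URLs that
# contain ")" are all missed.
INLINE_LINK = /\[[^\]]*\]\(([^)\s]+)[^)]*\)/

def linked_domains(markdown)
  hosts = markdown.scan(INLINE_LINK).flatten.filter_map do |url|
    URI.parse(url).host # nil for relative links such as "/about"
  rescue URI::InvalidURIError
    nil # skip targets that are not valid URIs
  end
  hosts.uniq.sort
end

article = 'See [Pandoc](https://pandoc.org/MANUAL.html) and [the FSF](https://www.fsf.org "FSF").'
p linked_domains(article)
# ["pandoc.org", "www.fsf.org"]
```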
There are other extensions I want, but these are enough to convince me that I want a system that is extensible, meaning that whenever I want new behaviour I can add some syntax to the format so it can be specified unambiguously, and machine-readable, meaning that whatever I end up writing, it’s easy to write simple programs to process it. Strictly speaking, formats like JSON and S-expressions meet both criteria. But, like I said before, I refuse to write paragraph tags by hand. The world is simply too advanced for that. So my extra requirement is that the markup must be embedded in plain text, or close to it.
Note that I’m not against writing code at all. If I want some functionality nobody wrote before, I have to write code. But I want the process of writing an article to be as painless as possible – just declare the data, and let the engine take care of the rest. In fact, it would be even better if data that can be inferred (like last-revision dates or authorship info) did not need to be specified manually. But that’s a secondary requirement, and hopefully easy to meet if the language can be parsed easily.
Extensibility
The example of annotating deliberate spelling mistakes (using words that are not in the dictionary) evoked very strongly in my mind the idea of HTML. But HTML is not directly extensible (there are ways to extend it, by baking semantic information into certain attributes, but I consider those hacks). The extensible cousin of HTML is XML, which I never really considered. It has a very bad reputation, but let’s see how it fits my criteria:
- ✓ Extensible
- ✓ Machine readable
- ✓ Embedded in plaintext
- ❌ Requires stupid annotation of paragraphs
Unfortunately, I really am a stickler for that last bullet point. “Embedded in plaintext” is also not strictly true, since some important special characters, like < and &, need escaping. Still, this exercise forced me to think a lot about possible uses of XML, and I still think XML might be a very good output format, since it can represent arbitrary semantic data. Document schemas would also allow some kind of integrity check, basically the analogue of spellchecking for syntax. But I just cannot bring myself to write XML by hand.
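A minimal Ruby sketch of what XML output for the deliberate non-words might look like, assuming a made-up nonword element with a definition attribute (no existing schema defines these). It also shows the escaping chore that makes XML unpleasant to write by hand: & and < must always be escaped in XML text, and escaping > and " as well is the usual convention (" is required inside double-quoted attribute values).

```ruby
# Hypothetical sketch: XML output in which each deliberate non-word carries
# its own definition. The <nonword> element and its definition attribute are
# invented for illustration; no existing schema is implied.
# XML requires & and < to be escaped in text; escaping > is conventional,
# and " must be escaped inside a double-quoted attribute value.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.gsub(/[&<>"]/, XML_ESCAPES) # each match is looked up in the hash
end

def nonword(word, definition)
  %(<nonword definition="#{xml_escape(definition)}">#{xml_escape(word)}</nonword>)
end

puts nonword("DRM", "Digital Rights Management")
# <nonword definition="Digital Rights Management">DRM</nonword>
puts xml_escape("if a < b && b > c")
# if a &lt; b &amp;&amp; b &gt; c
```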
There are also some problems that XML does not try to solve, or perhaps cannot solve. For instance, one thing I’d like to check is that every article I write is linked to from somewhere. This is the general problem of constructing a site map, but in my case I would just like to pull a summary of every article onto the homepage. I think this is impossible with just XML. Ideally I’d like to pull out the document title and a brief summary as well, but that would require XSLT, which I really don’t want to get into.
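In a general-purpose language, by contrast, the homepage summary is a short script. This Ruby sketch assumes, purely for illustration, that an article’s first line is its title and the first paragraph after that is its summary; Entry, index_entries and index_html are hypothetical names, and titles and summaries are inserted without HTML escaping to keep it short.

```ruby
# Hypothetical sketch: one homepage entry per article, assuming (for
# illustration only) that the first line is the title and the first
# paragraph after it is the summary. No HTML escaping, for brevity.
Entry = Struct.new(:path, :title, :summary)

def index_entries(articles)
  entries = articles.map do |path, text|
    title, *rest = text.split("\n")
    # first blank-line-separated paragraph after the title line
    summary = rest.join("\n").strip.split(/\n\s*\n/).first.to_s
    Entry.new(path, title.to_s.strip, summary)
  end
  entries.sort_by(&:path)
end

def index_html(entries)
  items = entries.map { |e| %(  <li><a href="#{e.path}">#{e.title}</a>: #{e.summary}</li>) }
  "<ul>\n#{items.join("\n")}\n</ul>"
end

# Usage, assuming articles live in an articles/ directory:
#   articles = Dir.glob("articles/*.html").to_h { |f| [f, File.read(f)] }
#   puts index_html(index_entries(articles))
```

Because every article found on disk gets an entry, the same script doubles as the “is everything linked from somewhere” check.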
Programmability
So, I want to generate something like XML. This is actually pretty easy to do, and in Ruby (the language I’m most familiar with) there is a templating system that everyone uses for frameworks like Rails (“Embedded Ruby”, or ERB). An ERB template looks something like this:
<%= h1 "My blog article" %>
<%= p do %>
Some plain text content.
<% end %>
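The template above assumes framework-provided helpers. A runnable approximation using plain ERB from Ruby’s standard library might look like this; h1 and para are hypothetical helpers defined on the object whose binding the template runs in (para rather than p, because p is already Kernel#p).

```ruby
require "erb"

# Runnable variant of the template above, using plain ERB from Ruby's
# standard library. h1 and para are hypothetical helpers: plain ERB has no
# HTML helpers, and Rails-style block capture (<%= p do %> ... <% end %>)
# needs framework support, so the paragraph text is passed as a string.
class Page
  include ERB::Util # provides html_escape

  def h1(text)
    "<h1>#{html_escape(text)}</h1>"
  end

  def para(text)
    "<p>#{html_escape(text)}</p>"
  end

  # binding exposes this object's methods to the template
  def render(template)
    ERB.new(template).result(binding)
  end
end

PAGE_TEMPLATE = <<~ERB
  <%= h1 "My blog article" %>
  <%= para "Some plain text content." %>
ERB

puts Page.new.render(PAGE_TEMPLATE)
```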
That’s great, lovely even, and it would be easy to write Ruby functions to generate more sophisticated things. But this has a limitation, one I noticed with Markdown but did not mention. ERB only switches between text and Ruby at the boundaries of its tags, so you have a choice when you provide an argument: provide it as a Ruby object (a string, or some more complicated object that converts to a string) or as a multiline plain-text block. This is quite similar to Markdown’s escape hatch into HTML, in that it is one-way: once you exit to Ruby code, you cannot easily go back. For example, something like an inline link which happens to contain a <code> element is very ugly to write.
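To illustrate the inline-link case, here is a Ruby sketch with hypothetical code and link helpers (not from Rails or any other library). Inside <%= ... %> we are writing Ruby, so the <code> element has to be built by nesting Ruby calls inside string interpolation, and the link helper has to trust its already-built HTML argument rather than escape it. The URL is a placeholder.

```ruby
require "erb"

# Hypothetical helpers (not from any library) illustrating the "shallow"
# embedding: inside <%= ... %> we are writing Ruby, so a link whose text
# contains a <code> element must be assembled from nested Ruby calls.
class InlinePage
  def code(text)
    "<code>#{ERB::Util.html_escape(text)}</code>"
  end

  # inner_html is trusted and NOT escaped; otherwise the <code> element
  # built by the caller would itself be turned into &lt;code&gt;.
  def link(inner_html, href)
    %(<a href="#{ERB::Util.html_escape(href)}">#{inner_html}</a>)
  end

  def render(template)
    ERB.new(template).result(binding)
  end
end

# Single-quoted heredoc: the #{...} is evaluated later, by ERB.
LINK_TEMPLATE = <<~'ERB'
  Read <%= link("the #{code("ERB")} documentation", "https://example.com/erb") %> first.
ERB

puts InlinePage.new.render(LINK_TEMPLATE)
```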
The trouble with this is that the embedding is “shallow”: you can’t hop back and forth between plain text and Ruby as you see fit. Luckily, someone in Lisp-land came up with a document language, variously called Scribble, Skribe or Scribe, which extends S-expression syntax, turning what would naively look like this:
(document (title "A new document")
  (p "A tediously annotated thing with " (italic "manually") " escaped text"))
into something much easier to read:
(document (title "A new document")
  (p [A less tedious language with ,(italic "auto-escaped") markup]))
Perhaps this does not look too different to you. The way to read it is that the [ and ] brackets escape from Lisp code into plain text, and a comma (,) escapes back into Lisp. These can be nested arbitrarily, giving a so-called “deep” embedding of code in the text. The extension (called a “reader macro”) converts the nice form into the more verbose form, which, if you ever look at the output of a parser, is actually very similar to XML deep down.
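To make the reader-macro idea concrete, here is a small Ruby sketch of a reader for a subset of this syntax: lists, symbols, double-quoted strings without escapes, and [...] text with ,(...) escapes. MiniScribe is a hypothetical name, and this is not how Scribe, Skribe or Skribilo actually implement their readers; the point is only that the concise and verbose forms read to the same tree.

```ruby
require "strscan"

# Hypothetical mini-reader for a Scribe-like syntax (not the real Scribe or
# Skribilo reader). (...) reads as a Ruby array headed by a symbol, "..."
# reads as a string, and [...] switches to plain text until the closing
# bracket, where ,( ... ) escapes back into code for one list. Bracket
# contents are spliced into the enclosing list, so the concise form yields
# the same tree as the verbose one.
class MiniScribe
  def self.read(src)
    new(src).read_expr
  end

  def initialize(src)
    @s = StringScanner.new(src)
  end

  def read_expr
    @s.skip(/\s+/)
    if @s.scan(/\(/) then read_list
    elsif @s.scan(/"/) then read_string
    elsif @s.scan(/\[/) then read_text
    elsif (atom = @s.scan(/[^\s()\[\]",]+/)) then atom.to_sym
    else raise ArgumentError, "unexpected input at offset #{@s.pos}"
    end
  end

  private

  def read_list
    items = []
    loop do
      @s.skip(/\s+/)
      break if @s.scan(/\)/)
      raise ArgumentError, "unclosed (" if @s.eos?
      if @s.scan(/\[/)
        items.concat(read_text) # text-mode contents are spliced in
      else
        items << read_expr
      end
    end
    items
  end

  def read_string
    str = @s.scan(/[^"]*/)
    raise ArgumentError, "unclosed string" unless @s.scan(/"/)
    str
  end

  # Text mode: everything up to "]" is literal, except ",(" which reads
  # exactly one list of code.
  def read_text
    parts = []
    loop do
      chunk = @s.scan(/[^\],]+/)
      add_text(parts, chunk) if chunk
      if @s.scan(/\]/) then break
      elsif @s.scan(/,(?=\()/) then parts << read_expr
      elsif @s.scan(/,/) then add_text(parts, ",") # a lone comma is text
      else raise ArgumentError, "unclosed ["
      end
    end
    parts
  end

  # Merge adjacent text so "a, b" stays a single string.
  def add_text(parts, text)
    if parts.last.is_a?(String)
      parts[-1] = parts.last + text
    else
      parts << text
    end
  end
end

verbose = MiniScribe.read('(p "A tediously annotated thing with " (italic "manually") " escaped text")')
concise = MiniScribe.read('(p [A tediously annotated thing with ,(italic "manually") escaped text])')
p verbose == concise
# true
```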
This difference is exciting because you can now hook into the processing step! You could write a hook which transforms every “leaf” (the plain text) to turn Markdown syntax into explicit markup. It would be trivial to write a parser for the limited subset of Markdown I use most of the time (like blank lines indicating paragraph breaks). And since this is the source format, all the automatic processing can happen downstream on an easier-to-parse format, like HTML. And since it’s embedded in a general-purpose programming language, you can easily write code to, for example, scan the filesystem and generate an index page as I described earlier.
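Here is a Ruby sketch of such a hook, assuming the document has already been read into nested arrays headed by a symbol, with plain strings as leaves. The apply_hook, paragraphs and strip_ends functions and the :body tag are hypothetical; the hook turns blank lines in the text of a :body node into [:p, ...] nodes, leaving inline nodes inside whichever paragraph they fall in.

```ruby
# Hypothetical post-processing hook over a document tree in which nodes are
# arrays headed by a symbol ([:italic, "text"]) and leaves are plain strings.

# Rebuild the tree bottom-up, letting the hook rewrite every node.
def apply_hook(tree, &hook)
  return tree unless tree.is_a?(Array)
  tag, *children = tree
  hook.call([tag, *children.map { |c| apply_hook(c, &hook) }])
end

# Group children into [:p, ...] nodes, splitting text leaves on blank lines;
# inline nodes stay in the paragraph they appear in.
def paragraphs(children)
  paras = [[]]
  children.each do |child|
    if child.is_a?(String)
      first, *rest = child.split(/\n[ \t]*\n/, -1)
      paras.last << first if first
      rest.each { |piece| paras << [piece] } # each later piece starts a paragraph
    else
      paras.last << child
    end
  end
  paras.map { |nodes| strip_ends(nodes) }.reject(&:empty?).map { |nodes| [:p, *nodes] }
end

# Trim whitespace at the edges of a paragraph and drop empty strings.
def strip_ends(nodes)
  nodes = nodes.dup
  nodes[0] = nodes[0].lstrip if nodes.first.is_a?(String)
  nodes[-1] = nodes[-1].rstrip if nodes.last.is_a?(String)
  nodes.reject { |n| n == "" }
end

doc = [:document, [:title, "Mice"],
       [:body, "If you give a mouse a ", [:italic, "cookie"], ",\nhe wants milk.\n\nThen a straw.\n"]]
expanded = apply_hook(doc) { |tag, *kids| tag == :body ? [:body, *paragraphs(kids)] : [tag, *kids] }
p expanded
```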
The various scribe flavours
Lisp, as a language family, tends to reinvent the wheel. There are many variants of Lisp, and the scribe pattern seems to have been reinvented several times, with some variations.
- There are scribe and skribe, originally written in Scheme, and exscribe for Common Lisp, which all seem minimally maintained and a little fragmented.
- There is scribble, the Racket documentation tool.
- There is skribilo, which is written in Guile.
I was originally heavily drawn to scribble, mainly because it has a really elegant way of hiding Lisp away from you with its @-expression syntax:
#lang scribble/base
@title{On the Cookie-Eating Habits of Mice}
If you give a mouse a cookie, he's going to ask for a glass of milk.
This is a complete document. That’s perfect! But I tried a few times to read the manual, and I couldn’t really see how to make it fit my use case. In particular, I don’t see how templating works: I don’t understand how I could write a custom template for my blog, which is pretty much the first thing I would do.
Skribilo has the advantage that I know a little bit of Guile, and I know people in the community too. Writing my code in Guile might mean I can easily integrate with Guix, like Arun Isaac’s G-expression build tool. But I’m not yet sure how to get something quite as nice as Scribble’s @-expressions.