Writing a custom markup format
16 May 2025
Introduction
When writing the backend for this very blog, I wanted to be able to write articles in a simple markup format similar to other blogging sites.
Why no Markdown?
There are a few reasons why I opted for my own custom solution instead of Markdown:
- No existing suitable Markdown library
- Does not encompass every feature I want
- Ambiguous syntax
- It seemed fun!
No existing suitable Markdown library
My HTTP Server is written in Jai, so my backend is too. Currently there is no existing Markdown library for Jai.
I wanted this blog to be server-side rendered, and I didn't want to have the Markdown parser in a separate process.
Does not encompass every feature I want
Markdown supports almost every feature I want. The big one that I wanted on top of that was videos. I did not want to support inline HTML, and I also didn't really want to create yet another Markdown flavour.
Ambiguous syntax
I briefly looked into writing my own Markdown parser in Jai. While it definitely seems doable, some of the ambiguous syntax makes it very annoying to parse.
Some annoyances are:
- Multiple header syntax
- _ and * are both for bold and italic
- Whitespace dependencies (line breaks, lists, etc)
If I was going to write a parser anyway, I'd want it to be as simple as possible.
It seemed fun!
It just seemed more fun to homebrew a format that I would have control over instead of trying to fit something to a spec. Sometimes fun is enough reason in and of itself.
The .art format
The format I ended up making is just referred to as .art. Not because it is a piece of art; it's simply short for Article.
The format is similar to Markdown in many ways, but crucially, you should be able to tell what kind of node corresponds to a line of input by just the first 2 characters.
| Starting characters | Node type |
|---|---|
| : | Header |
| # | Comment (not rendered) |
| ``` | Code Block |
| ! | Image |
| @ | Video |
| * | Unordered List |
| . | Ordered List |
| - | Separator |
| \| | Table |
| > | Caption |
| Other | Paragraph |
The parser goes over the input line by line and dispatches with a simple switch-case on the first character. In some cases it also has to check the second character, like with ` (inline code) versus ``` (code block).
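As a rough sketch of that dispatch (not the actual parser code), something like the following would do. `Line_Kind` and `classify_line` are names made up for this example, and the sketch follows the table literally: it does not guess at rules the table leaves out, such as whether `*` must be followed by a space to count as a list item.

```jai
#import "Basic";

// Hypothetical tag for this sketch only; the real parser builds full nodes.
Line_Kind :: enum {
    PARAGRAPH;
    HEADER;
    COMMENT;
    CODE_BLOCK;
    IMAGE;
    VIDEO;
    UNORDERED_LIST;
    ORDERED_LIST;
    SEPARATOR;
    TABLE;
    CAPTION;
}

// Decide what kind of node a line starts by looking at its first characters.
classify_line :: (line: string) -> Line_Kind {
    if line.count == 0  return .PARAGRAPH;

    if line[0] == {
        case #char ":";  return .HEADER;
        case #char "#";  return .COMMENT;
        case #char "!";  return .IMAGE;
        case #char "@";  return .VIDEO;
        case #char "*";  return .UNORDERED_LIST;
        case #char ".";  return .ORDERED_LIST;
        case #char "-";  return .SEPARATOR;
        case #char "|";  return .TABLE;
        case #char ">";  return .CAPTION;
        case #char "`";
            // A single backtick is inline code inside a paragraph;
            // only three backticks in a row open a code block.
            if line.count >= 3 && line[1] == #char "`" && line[2] == #char "`"  return .CODE_BLOCK;
    }

    // Anything that did not match above is ordinary paragraph text.
    return .PARAGRAPH;
}

main :: () {
    print("%\n", classify_line(": Introduction"));
    print("%\n", classify_line("`x` is inline code"));
    print("%\n", classify_line("Just some text"));
}
```

Because every decision is made from the start of the line, there is no backtracking and no lookahead beyond a couple of bytes.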
Each of the above types has an associated node type, which looks something like the following:
Text_Style :: enum_flags {
    NONE          :: 0;
    BOLD          :: 1 << 0;
    ITALICS       :: 1 << 1;
    UNDERLINE     :: 1 << 2;
    STRIKETHROUGH :: 1 << 3;
    CODE          :: 1 << 4;
}

Node_Kind :: enum {
    UNKNOWN;
    HEADER;
    PARAGRAPH;
    TEXT;
    HYPERLINK;
    CODE_BLOCK;
    IMAGE;
    LIST_ITEM;
    VIDEO;
    UNORDERED_LIST;
    ORDERED_LIST;
    SEPARATOR;
    TABLE;
    CAPTION;
}

Text_Line :: #type [..] Text_Node;

Markup_Node :: struct {
    kind := Node_Kind.UNKNOWN;
}

Header_Node :: struct {
    using #as node: Markup_Node;
    kind = .HEADER;
    level: int;
    contents: Text_Line;
}

Paragraph_Node :: struct {
    using #as node: Markup_Node;
    kind = .PARAGRAPH;
    lines: [..] Text_Line;
}

Text_Node :: struct {
    using #as node: Markup_Node;
    kind = .TEXT;
    text: string;
    url: string; // Only used if kind == .HYPERLINK
    style: Text_Style;
}

// ...
Our article is then just an array of Markup_Nodes. It's a linear list; there is no need for a tree, which makes parsing quite simple.
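To illustrate how a flat list like this can be walked, here is a minimal sketch. It assumes the array stores pointers (`[..] *Markup_Node`), which is my guess for the example rather than something shown above, and it simplifies the nodes to carry a plain string. `render` is a hypothetical helper, not the real renderer.

```jai
#import "Basic";

Node_Kind :: enum { UNKNOWN; HEADER; PARAGRAPH; }

Markup_Node :: struct {
    kind := Node_Kind.UNKNOWN;
}

Header_Node :: struct {
    using #as node: Markup_Node;
    kind = .HEADER;
    level: int;
    title: string;  // Simplified: the real node stores a Text_Line.
}

Paragraph_Node :: struct {
    using #as node: Markup_Node;
    kind = .PARAGRAPH;
    text: string;   // Simplified: the real node stores several Text_Lines.
}

// Switch on the kind, then cast back to the concrete node type.
render :: (node: *Markup_Node) -> string {
    if node.kind == {
        case .HEADER;
            h := cast(*Header_Node) node;
            return tprint("<h%>%</h%>", h.level, h.title, h.level);
        case .PARAGRAPH;
            p := cast(*Paragraph_Node) node;
            return tprint("<p>%</p>", p.text);
    }
    return "";
}

main :: () {
    nodes: [..] *Markup_Node;

    header := New(Header_Node);
    header.level = 1;
    header.title = "Introduction";
    array_add(*nodes, header);     // #as lets *Header_Node pass as *Markup_Node.

    paragraph := New(Paragraph_Node);
    paragraph.text = "Hello!";
    array_add(*nodes, paragraph);

    // One pass over the flat list, no recursion needed.
    for nodes  print("%\n", render(it));
}
```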
Anything that is not parsed as one of the node types from the table is parsed as a Paragraph_Node. For the text node, there are different styles, which are defined as the following:
| Syntax | Style |
|---|---|
| *bold* | bold |
| _italics_ | italics |
| ^underline^ | underline |
| ~strikethrough~ | strikethrough |
| `code` | code |
| [hyperlink](url) | hyperlink |
Styling nodes
Turning this into HTML is quite straightforward, but just having plain HTML is not very useful. I don't want to hardcode what CSS classes it uses, so what is the solution?
When rendering the article to HTML, there is an Article_Render_Options struct, which is defined as the following:
Article_Render_Options :: struct {
    generate_header_anchors := true;
    indent := 2;
    indent_step := 2;
    styles: [] Node_Style;
}

Node_Style :: struct {
    kind := Style_Kind.UNKNOWN;
    classes: [..] string;
}
Each Style_Kind is a tag that we can apply styles to. There are quite a lot of entries, so I'll just highlight some interesting ones:
- HEADER_1..HEADER_6, of course we want to style each header level individually
- (UN)ORDERED_LIST_PARENT, since lists can be nested we want to be able to specify the top-level style as well
- TABLE_TR_EVEN/ODD, to be able to specify alternating colors for the rows
I am defining the styles in a .json file and then deserializing them to the Node_Style array.
{
    "kind": "HYPERLINK",
    "classes": [
        "text-orange-100",
        "underline",
        "bg-slate-700",
        "p-1",
        "rounded",
        "font-semibold",
        "hover:text-orange-50",
        "hover:bg-slate-600",
        "decoration-0",
        "decoration-dotted"
    ]
}
This makes it quite easy to adjust the rendering just for the article itself without accidentally making changes elsewhere on the site.
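To give an idea of how such a style entry could be applied while rendering, here is a small sketch. `find_style` and `print_open_tag` are hypothetical helpers written for this example, the `Style_Kind` enum is cut down to a few entries, and the style is filled in by hand instead of being deserialized from JSON.

```jai
#import "Basic";

// Cut-down version of the enum, just enough for the example.
Style_Kind :: enum {
    UNKNOWN;
    HYPERLINK;
    HEADER_1;
}

Node_Style :: struct {
    kind := Style_Kind.UNKNOWN;
    classes: [..] string;
}

// Linear search; returns null when no style is configured for the kind.
find_style :: (styles: [] Node_Style, kind: Style_Kind) -> *Node_Style {
    for * styles  if it.kind == kind  return it;
    return null;
}

// Print an opening tag, adding a class attribute only if a style exists.
print_open_tag :: (tag: string, styles: [] Node_Style, kind: Style_Kind) {
    print("<%", tag);

    style := find_style(styles, kind);
    if style && style.classes.count > 0 {
        print(" class=\"");
        for style.classes {
            if it_index > 0  print(" ");  // Classes are separated by spaces.
            print("%", it);
        }
        print("\"");
    }

    print(">\n");
}

main :: () {
    link: Node_Style;
    link.kind = .HYPERLINK;
    array_add(*link.classes, "underline");
    array_add(*link.classes, "font-semibold");

    styles: [..] Node_Style;
    array_add(*styles, link);

    print_open_tag("a",  styles, .HYPERLINK);  // <a class="underline font-semibold">
    print_open_tag("h1", styles, .HEADER_1);   // <h1>
}
```

A kind without an entry simply renders as a bare tag, so the style file only needs to list what should actually be styled.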
That's it!
And that's pretty much the whole system! It's quite pleasant to write in, and to be honest I prefer it over Markdown now, though in the end they're not that different.