Building epic-remark: a journey through the world of Markdown-to-HTML processing

14 minute read

A few weeks ago, I built some custom remark plugins for my blog. The plugins took care of some basic tasks like auto-generating IDs for headings and adding custom wrappers for certain HTML elements. While these plugins were useful, I found they still didn't cover all of what I wanted to achieve.

This led me down a rabbit hole of exploring remark and its ecosystem, building a few plugins, then a few more. Eventually epic-remark was born: a feature-rich, extensible Markdown-to-HTML conversion tool.

Writing for readers, not machines

I love blogging. It's a great way to unpack complex concepts, share something that might help someone else, and further cement the understanding in my own mind. However, the process of writing and publishing blog posts can be tedious, especially when it comes to formatting and styling content. This is where Markdown comes in.

But, despite Markdown's common presence in the world of web development, the process of converting it into HTML is often fraught with limitations, especially when ~~advanced~~ basic features (like tables or embeds) are required. I found myself rewriting my content to work within the limitations of my blogging engine. I was writing for a machine instead of a reader. Not good.

From Plugin to Package

If you read my last post, then you already know how this journey began. It quickly evolved from a simple task to a deep exploration of Markdown's (and remark's) inner workings. And, just as quickly, it became clear that there was an opportunity to simplify this process for others too. This gap in functionality was the catalyst for epic-remark.

Deep Diving into ASTs

Unraveling the Core of Markdown Transformation

Building custom plugins required an intimate understanding of how Markdown is parsed and transformed. It turns out that Markdown processing, like a lot of other code interpretation, leans on the concept of Abstract Syntax Trees (ASTs).

Abstract Syntax Trees are the unsung heroes of code interpretation. And while you might not be terribly familiar with ASTs, you've almost certainly used them, even if indirectly. They're in the background, working behind the scenes in many of today's most popular tools and frameworks.

In programming terms, an AST is a structural map, usually representing source code. In simpler terms, think of it as a family tree, where each branch and leaf represents a different component of your code. In the case of Markdown, these nodes represent various elements like headings, paragraphs, and links.
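To make the family-tree idea concrete, here's a rough, hand-written sketch of the kind of tree a Markdown parser might produce for a tiny document. The node names follow the mdast convention that remark uses, but the tree is simplified (real output also carries position data), so treat it as an illustration rather than exact parser output.

```javascript
// Simplified, hand-written tree for this Markdown:
//
//   # Hello
//
//   Read the [docs](https://example.com).
//
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1, // "#" means a level-1 heading
      children: [{ type: 'text', value: 'Hello' }],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Read the ' },
        {
          type: 'link',
          url: 'https://example.com',
          children: [{ type: 'text', value: 'docs' }],
        },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

// Each node's `type` says which Markdown element it represents.
const topLevelTypes = tree.children.map((node) => node.type);
console.log(topLevelTypes); // [ 'heading', 'paragraph' ]
```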

Consider Markdown as a human-readable format, and ASTs as a bridge to a machine-readable version of that same content. Many popular tools use ASTs because they provide a reliable way to read and manipulate the data in a familiar tree format. It is of course possible to modify the Markdown text directly (instead of converting it to an AST first), but a structure without nodes makes it harder to do so with reliable results. Furthermore, because an AST can be represented as plain JSON-like objects, it gives developers a more intuitive structure to work with, and one that is easier to process programmatically than raw text.

But since our end goal is to output some HTML, we can take our AST conversion a step further, and convert it into an HAST.

HAST is an acronym for Hypertext Abstract Syntax Tree, which, you guessed it, is the hypertext version of our original AST. That boils down to a fancy way of saying the HTML version of the AST.

Although we end up with an HAST that is similar to the AST, both of which can be read and modified, there is a key difference that influenced the decision to modify the content in its HAST format instead of its AST format. The HAST is more closely aligned with HTML's structure, making it easier to identify and modify specific elements. Again, it is possible to modify the AST directly instead (just like we could modify the Markdown directly), but the non-HTML structure makes it more challenging to do so with reliable results.
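As a rough illustration of that difference, here's a hand-written HAST for a small document containing a heading and a paragraph with a link. The node shape (`element`, `tagName`, `properties`) follows the hast convention, but the tree is simplified and not copied from real converter output.

```javascript
// Simplified HAST for: <h1>Hello</h1><p>Read the <a href="...">docs</a>.</p>
const hast = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h1',
      properties: {},
      children: [{ type: 'text', value: 'Hello' }],
    },
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        { type: 'text', value: 'Read the ' },
        {
          type: 'element',
          tagName: 'a',
          properties: { href: 'https://example.com' },
          children: [{ type: 'text', value: 'docs' }],
        },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

// Nodes map directly to HTML tags, so finding "the link inside the paragraph"
// is a matter of checking tagName, and attributes live in `properties`.
const paragraph = hast.children.find((node) => node.tagName === 'p');
const anchor = paragraph.children.find((node) => node.tagName === 'a');
console.log(anchor.properties.href); // https://example.com
```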

Navigating and Manipulating (H)ASTs

Traversing an AST is a systematic process, but thankfully, it's code you'll only need to write once. Your syntax tree will contain different content from run to run, but the structure of the tree will remain the same. This means that the code you write to traverse the tree will be reusable, regardless of the content.

The process of traversing an AST or HAST involves recursively iterating through each node in the tree, checking its type, and then performing the necessary action: either modify the node, or do nothing. In both cases, the traversal then moves on to the next node. This process is repeated until all nodes have been evaluated.

If you don't want to write your own function, then there are several lightweight libraries that can be used to traverse the AST or HAST. However, in an effort to add as few dependencies to epic-remark as possible, I decided to use vanilla JavaScript and write my own tree traversal function, shown below.

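// Recursively walk the tree and run `callback` on every node of the given type.
export default function visit(node, type, callback) {
  // Handle the current node first if it matches.
  if (node.type === type) {
    callback(node);
  }

  // Then descend into any children (leaf nodes such as text have none).
  if (node.children) {
    node.children.forEach(child => {
      visit(child, type, callback);
    });
  }
}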
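Here's a small usage sketch for the visit function above, repeated without the export so the snippet runs on its own. It adds an id to every heading in a hand-written HAST. The `textOf` helper, the slug rules, and the sample tree are my own assumptions for illustration, not the exact code epic-remark ships.

```javascript
// Same traversal as above, minus the export.
function visit(node, type, callback) {
  if (node.type === type) {
    callback(node);
  }
  if (node.children) {
    node.children.forEach((child) => visit(child, type, callback));
  }
}

// Hypothetical helper: collect all text inside a node.
function textOf(node) {
  if (node.type === 'text') return node.value;
  return (node.children || []).map(textOf).join('');
}

// Hand-written sample tree (simplified HAST).
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h2',
      properties: {},
      children: [{ type: 'text', value: 'Getting Started!' }],
    },
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [{ type: 'text', value: 'Some body text.' }],
    },
  ],
};

// Every HAST element has type "element", so the callback checks tagName.
visit(tree, 'element', (node) => {
  if (/^h[1-6]$/.test(node.tagName)) {
    node.properties.id = textOf(node)
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, '-') // collapse anything else into hyphens
      .replace(/^-|-$/g, ''); // trim a leading or trailing hyphen
  }
});

console.log(tree.children[0].properties.id); // getting-started
```

Because the traversal only cares about `type` and `children`, the same function works on both the Markdown AST and the HAST.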

The Broader Role of ASTs in Web Development

Earlier I briefly mentioned that ASTs are used in many of your favourite tools and frameworks. Let's revisit that.

In web development, particularly with frameworks like React, ASTs are often used to transform JSX (React's syntax extension) into standard JavaScript. This transformation is crucial for React components to be understood by browsers. Tools like Babel parse JSX into an AST, manipulate the AST to convert JSX syntax, and then generate standard JavaScript code that browsers can execute.
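To give a feel for the transform and generate steps, here's a toy sketch. The node shapes are made up for brevity and are much simpler than Babel's real AST, and the parse step is skipped by writing the tree by hand, so this shows the general idea rather than how Babel is implemented.

```javascript
// Hand-written stand-in for the parsed form of: <h1>Hello</h1>
const jsxNode = {
  type: 'JSXElement',
  name: 'h1',
  children: [{ type: 'JSXText', value: 'Hello' }],
};

// Transform: rewrite JSX nodes into plain function-call nodes.
function transform(node) {
  if (node.type === 'JSXText') {
    return { type: 'StringLiteral', value: node.value };
  }
  return {
    type: 'CallExpression',
    callee: 'React.createElement',
    args: [
      { type: 'StringLiteral', value: node.name },
      { type: 'NullLiteral' }, // no props in this toy example
      ...node.children.map(transform),
    ],
  };
}

// Generate: turn the transformed tree back into source text.
function generate(node) {
  if (node.type === 'StringLiteral') return JSON.stringify(node.value);
  if (node.type === 'NullLiteral') return 'null';
  return `${node.callee}(${node.args.map(generate).join(', ')})`;
}

const output = generate(transform(jsxNode));
console.log(output); // React.createElement("h1", null, "Hello")
```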

So the next time you're writing some code, and marvel at how such expressive and readable code can be compiled down to a minified, universally compatible bundle, you can thank ASTs.

But it isn't just React and Babel using ASTs. Tools like ESLint take advantage of them too. When ESLint runs, it creates its own AST of the code, then evaluates the code's structure and makes modifications or prepares warnings to enforce coding standards.
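A linter rule can be pictured as a tree walk that collects warnings. The sketch below checks a hand-written, ESTree-style tree for `var` declarations. The `walk` helper and the warning text are assumptions of mine, and this is not how ESLint rules are actually registered, just the underlying idea.

```javascript
// Hand-written, simplified tree for:  var count = 1; const label = "x";
const program = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'var',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'count' },
          init: { type: 'Literal', value: 1 },
        },
      ],
    },
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'label' },
          init: { type: 'Literal', value: 'x' },
        },
      ],
    },
  ],
};

// Hypothetical generic walker: child nodes can live under any key here,
// so every object or array value is inspected.
function walk(node, callback) {
  if (!node || typeof node.type !== 'string') return;
  callback(node);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) {
      value.forEach((child) => walk(child, callback));
    } else if (value && typeof value === 'object') {
      walk(value, callback);
    }
  }
}

// A toy "no var" check that collects warnings instead of fixing anything.
const warnings = [];
walk(program, (node) => {
  if (node.type === 'VariableDeclaration' && node.kind === 'var') {
    const names = node.declarations.map((d) => d.id.name).join(', ');
    warnings.push(`Unexpected var: '${names}' (use let or const instead)`);
  }
});

console.log(warnings);
```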

Understanding ASTs in these contexts reveals their significance in optimizing code performance, ensuring code quality, and improving software development processes across various JavaScript-based applications.

Exploring ASTs and HASTs in the development of epic-remark has been a fascinating journey. At the end of the day, it’s about more than just handling code; it’s about understanding the underlying structure of how we communicate instructions to machines. And with such prolific use of ASTs across web development, it’s clear that this concept is one worth understanding, even at a high level.

Fostering a Community-Driven Ecosystem

After deciding to commit to building epic-remark, it was all about meshing open-source ethos with practical utility.

From there, the vision was simple ('simple'): create a tool that serves not just my needs but also those of the broader dev community. It should be easy to use, and easy to build upon.

Although the core epic-remark package has always been framework-agnostic and easily extensible, the setup was still a hassle.

For example, you likely need to first write a function to read the Markdown files from your project directories. Then you need to pass that content into epic-remark during your site's build process, which then returns the HTML to be rendered.
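That first step might look something like the sketch below. The `readMarkdownFiles` name, the returned shape, and the `posts` directory are all hypothetical, and I've deliberately left out the call into epic-remark itself, since the point here is only the file-reading boilerplate you'd otherwise write by hand.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: collect every .md file in a directory
// (non-recursive) along with a slug derived from the file name.
function readMarkdownFiles(directory) {
  return fs
    .readdirSync(directory)
    .filter((fileName) => path.extname(fileName) === '.md')
    .sort()
    .map((fileName) => ({
      slug: path.basename(fileName, '.md'),
      markdown: fs.readFileSync(path.join(directory, fileName), 'utf8'),
    }));
}

// Usage (the "posts" directory name is an assumption):
//   const posts = readMarkdownFiles(path.join(process.cwd(), 'posts'));
// Each post.markdown would then be handed to the Markdown processor
// during the site's build.
```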

Even though epic-remark is doing all the heavy lifting when it comes to processing the actual content, it can still be a pain to get the content into the processor in the first place.

When users sit down to create a blog, my guess is that they've got some ideas and are ready to start writing. Ready to start building. Developers want to focus on what users will see, what will have a tangible impact for their client, not on the underlying monotony of the mechanics that got it there.

I tackled this problem with two solutions, which play nicely with one another:

framework-specific examples (Next.js, Nuxt.js, etc.)
a dedicated CLI (Command Line Interface)

Framework-specific examples

Although epic-remark is framework-agnostic, some framework-specific examples are included in the repo to get the user started. They're intended as a springboard that can be used as is (or nearly as is at the time of this writing), but remain flexible enough to be molded to fit individual needs. You might notice that some of the code and components in the examples are a little verbose, contain minimal styling, and aren't split into separate smaller components as frequently as you might expect. This is intentional. It makes the code a bit easier to understand, and a lot easier to remove if you don't need it.

Furthermore, this practical starting point makes it easier for the community to collaborate and contribute to the project, since there are already some common best practices in place.

Dedicated CLI

I built epic-remark because I got tired of setting the same thing up over and over (you know what they say about insanity). Setting up an accessible blog should be easy, and importantly, fast. We're here to write content, not reinvent the ~~wheel~~ Markdown engine each time.

I considered how else I could streamline the development process for users (and myself. Hey, I'm not selfish. I built this tool because I needed it!), and the CLI quickly became an integral part of the vision.

Building the CLI

Building the epic-remark CLI was a lot of fun, and a bit of a challenge. But maybe not a challenge in the way you think.

I wanted the CLI to be easy to use, even by those who are less familiar with the command line. To achieve this, I used commander and inquirer to create a series of prompts that guide the user through the process of creating a new project. If the user wants to initiate the prompts and create a new project, they can simply run the following command:

npx create-epic-remark

When you're new to the command line, flags are intimidating. You've lived your life in a GUI, and adapting to the Terminal is a lot to take in. And there lies the complexity. Not in the CLI's code itself, but in the interaction design.

But this is good complexity. Yes, there is such a thing as good and bad complexity.

Good complexity, like we have here, introduces complexity for one party (me, and any other contributors) so that another party (the user) can have a much simpler experience. Abstracting complexity is the reason software exists. So let's lean in.

In web design, we take the tools we have to interact with the user for granted. Buttons and links. Modals, animations. Back and forward buttons. The list goes on.

But the CLI is, pretty much, just a line of text.

It can receive input and conditionally output text or take action. But for the most part, the prompt questions need to remain simple. And while that sounds straightforward, how your software gets used can be unpredictable. Will the instructions get read? Will they know what to do next? Will they press enter through each screen too quickly and cause unintended behaviour?

You need to tailor and somewhat safeguard the experience, with as little input (and output!) as possible. It took some thoughtful consideration, and a little trial and error, but I eventually settled on a design that strikes a fair balance between usability and simplicity. Shout out to the OG chalk for making the output a little easier on the eyes.

All that said, it would be wrong to leave power users out of the equation. That's why the CLI also accepts flags, which allow users to skip the prompts and create a project from the chosen example with a single command.

npx create-epic-remark --example next-tailwind
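The decision between "prompt the user" and "skip straight to scaffolding" can be sketched as a small pure function. This is not the actual create-epic-remark source (which relies on commander for argument parsing); `resolveExample` and its return shape are hypothetical, and the list of available examples is a placeholder containing only the name used above.

```javascript
// Hypothetical helper: decide what the CLI should do based on its arguments.
function resolveExample(args, availableExamples) {
  const flagIndex = args.indexOf('--example');

  // No flag: fall back to the guided prompts.
  if (flagIndex === -1) {
    return { mode: 'prompt' };
  }

  const name = args[flagIndex + 1];

  // Flag without a value, or a name that doesn't exist: fail with a hint
  // rather than guessing what the user meant.
  if (name === undefined || !availableExamples.includes(name)) {
    return {
      mode: 'error',
      message: `Unknown or missing example. Available: ${availableExamples.join(', ')}`,
    };
  }

  return { mode: 'scaffold', example: name };
}

// Placeholder list; a real CLI would know about every bundled example.
const available = ['next-tailwind'];

console.log(resolveExample(['--example', 'next-tailwind'], available));
console.log(resolveExample([], available));
```

In a real CLI the `args` array would typically come from `process.argv.slice(2)`.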

At the time of this writing, there are two examples available to the CLI: epic-remark with Next.js and epic-remark with Nuxt.js. These examples are fully functional, and can be used as a starting point for your own blog today.

Conclusion

epic-remark was a fun project to build. It's a handy tool that I hope others find useful, and one that I'll continue to use for my own blogs moving forward. Most of all, I'm excited to see how it evolves with community input. I've already got a few ideas for new features, and am sure the community will have some great ones too.