Extending remark with custom plugins
A deep dive into Markdown-to-HTML conversion with remark
As developers, we often find ourselves in situations where we need to extend or customize the functionality of existing tools to fit our specific needs or use case.
If you're like me, and recently decided to add Markdown support to your (new) blog (again), then maybe you're in one of those situations right now too.
The Need for Customization
My blog is built with Next.js, and I wanted to be able to write posts directly in my repository, in Markdown. As with most times I've worked with Markdown in the past, I reached for remark as my Markdown processor. And as far as processing Markdown into HTML, it does the job well.
But I had two specific requirements, neither of which remark could handle out of the box:
- Add a unique id attribute to every heading
- Wrap tables in a div with a specific class
Adding unique ids to headings is what makes anchor links possible. This not only makes it easier for users to navigate the post but can also help with SEO, since each section is clearly identified and can be linked to directly. This has always been a subtle nicety that I've appreciated when reading (and particularly sharing) any sort of content on the web.
The table wrapper is necessary for styling. Okay, well it's not necessary, but it makes life a lot easier. In most cases you'll want your table to match the width of the content. And if there's a lot of content, you'll want the table to scroll horizontally. This is especially important for mobile devices, where the screen width is limited. Without that wrapper, making that happen is cumbersome. The wrapper makes it easy.
Building Custom Remark Plugins
So what's involved with building custom plugins for remark? It's not as complicated as it might sound. Think of the pipeline that the Markdown travels on during its journey from Markdown to HTML as a straight line from point A to point B. Plugins are like points along the line where you can add, remove, or modify the content.
As the Markdown travels from point A to point B, remark first converts it into what's called an Abstract Syntax Tree (referred to as AST from here on). This is a tree structure where each node represents a part of the document, or eventual HTML.
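To make that concrete, here is a hand-written sketch of roughly what the tree looks like for a two-line document. The node types (root, heading, paragraph, emphasis, text) follow the mdast format that remark uses. The sample content is made up, and real trees also carry position information that is left out here.

```javascript
// Markdown source (illustrative):
//
//   # Hello
//
//   Some *text*.
//
// A simplified, hand-written version of the corresponding tree.
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'Hello' }],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Some ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'text' }] },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

// Every node has a type; parent nodes hold children, text nodes hold a value.
console.log(tree.children.map(node => node.type)); // [ 'heading', 'paragraph' ]
```

Note that the emphasized word lives one level deeper than the plain text around it. That nesting is why plugins need to walk the tree rather than loop over a flat list.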
With that in mind, our plugin will need to do three things:
- Traverse the AST, and find the nodes we're looking for
- Modify the AST, adding, removing, or modifying nodes according to our needs
- Return the modified AST, passing it back into the pipeline for the next plugin
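Those three steps map onto a small amount of code. Here's a minimal sketch: remarkShoutText is a made-up plugin that upper-cases every text node, and the tree is built by hand so the snippet runs without remark installed. A plugin is a function that returns a transformer, and the transformer receives the root of the tree.

```javascript
// Hypothetical plugin, for illustration only:
// upper-cases the value of every text node.
function remarkShoutText() {
  // The returned function is the transformer; it receives the root node.
  return tree => {
    const visit = node => {
      // 1. Traverse: look at this node, then at its children
      if (node.type === 'text') {
        // 2. Modify: change the node in place
        node.value = node.value.toUpperCase();
      }
      (node.children || []).forEach(visit);
    };
    visit(tree);
    // 3. Return: hand the (mutated) tree back to the pipeline
    return tree;
  };
}

// Hand-built tree standing in for parsed Markdown
const tree = {
  type: 'root',
  children: [
    { type: 'paragraph', children: [{ type: 'text', value: 'hello' }] },
  ],
};

const transform = remarkShoutText();
const result = transform(tree);
console.log(result.children[0].children[0].value); // HELLO
```

Since the tree is mutated in place, a transformer can also return nothing at all, in which case the same tree simply continues down the pipeline.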
With this context, let's take a closer look at the plugins I built for my blog.
Plugin 1: Adding IDs to Headings
The first plugin I created was for adding unique IDs to each heading in the Markdown content. As we know, this can be a useful feature for both users and SEO.
First, I needed to traverse the AST and find all the heading nodes. Although the recursive function might look a bit intimidating, it was a simple matter of checking the type property of each node. If the type is heading, then the text content is extracted and converted into a slug. The slug is then stored under data.hProperties.id on that same node, which becomes the id attribute of the rendered heading element.
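export function remarkAddIdsToHeadings() {
  return tree => {
    const addIdToHeading = node => {
      if (node.type === 'heading') {
        // Join the values of the heading's direct children. Children without a
        // value (for example emphasis or links) contribute an empty string.
        const textContent = node.children.map(n => n.value).join('');
        // Lowercase, turn whitespace runs into hyphens, drop everything else
        const slug = textContent
          .toLowerCase()
          .replace(/\s+/g, '-')
          .replace(/[^\w\-]+/g, '');
        // data.hProperties becomes the attributes of the rendered HTML element
        node.data = node.data || {};
        node.data.hProperties = node.data.hProperties || {};
        node.data.hProperties.id = slug;
      }
    };
    const traverseTree = nodes => {
      nodes.forEach(node => {
        addIdToHeading(node);
        // Recurse into any node that has children of its own
        if (node.children) {
          traverseTree(node.children);
        }
      });
    };
    traverseTree(tree.children);
  };
}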
But let's dig a bit further into what's happening in this plugin, especially in the recursive traverseTree function.
- Near the last line, the traverseTree function is called with an argument of tree.children. Here tree is the root node of the AST, and tree.children is an array of all the top-level nodes in the document.
- The traverseTree function then iterates over each node in the array, using a forEach loop.
- In the loop, the function checks if the current node is a heading. If it is, it extracts the text content, generates a slug from it, and stores the slug as the heading's id. If it's not, it does nothing.
- If the current node has children of its own, then the traverseTree function is called again, this time with the current node's children as the argument. This is where the recursion happens. The function keeps calling itself until it reaches nodes that have no children, working all the way down each branch of the tree, and so visits every node in the AST.
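The slug step is easy to try on its own. The sketch below copies the replace chain from the plugin into a standalone slugify function (the name is introduced here for illustration). It also adds one thing the plugin above doesn't do: a hypothetical createUniqueSlugger helper that appends a counter when two headings produce the same slug, so that ids stay unique within a page.

```javascript
// Same transformation as in the plugin: lowercase, whitespace to hyphens,
// then drop anything that isn't a word character or a hyphen.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/\s+/g, '-')
    .replace(/[^\w\-]+/g, '');
}

// Hypothetical addition (not part of the plugin above): remember the slugs
// already handed out and append -1, -2, ... to repeats.
function createUniqueSlugger() {
  const seen = new Map();
  return text => {
    const base = slugify(text);
    const count = seen.get(base) || 0;
    seen.set(base, count + 1);
    return count === 0 ? base : `${base}-${count}`;
  };
}

console.log(slugify('Plugin 1: Adding IDs to Headings'));
// plugin-1-adding-ids-to-headings

const uniqueSlug = createUniqueSlugger();
console.log(uniqueSlug('Conclusion')); // conclusion
console.log(uniqueSlug('Conclusion')); // conclusion-1
```

This is a simplification: a heading whose own text already slugs to conclusion-1 could still collide with a generated suffix. For a personal blog where headings rarely repeat, that may be an acceptable trade-off.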
Plugin 2: Wrapping Tables in a Div
The second plugin wraps tables in a div with a specific class for styling purposes. This was a bit more challenging, as it involved manipulating the tree structure more significantly.
export function remarkWrapTables() {
  return tree => {
    const processNode = (node, index, parent) => {
      if (node.type === 'table') {
        // Custom node; data.hName tells the HTML conversion to render a <div>
        const wrapper = {
          type: 'div',
          data: {
            hName: 'div',
            hProperties: { className: 'overflow-x-auto' },
          },
          children: [node],
        };
        // Swap the table for the wrapper, which now contains the table
        parent.children.splice(index, 1, wrapper);
      }
    };
    const traverseTree = (nodes, parent) => {
      nodes.forEach((node, index) => {
        // Visit children first, so the new wrapper is never re-visited
        if (node.children) {
          traverseTree(node.children, node);
        }
        processNode(node, index, parent);
      });
    };
    traverseTree(tree.children, tree);
  };
}
Let's break down what's happening in the processNode portion of this plugin.
- The function is called with three arguments: the current node (which we get from the recursive traverseTree function), the index of the current node in the parent node's children array, and the parent node itself.
- The function checks if the current node is a table. If it is, it creates a new node, which will be the wrapper div element. This node has a type of div, a data.hName of div (which tells the HTML conversion which element to render), and a children array with the current node as the only item.
- The function then replaces the current node in the parent node's children array with the new wrapper node.
- Since the wrapper node's only child is the very node it replaced, the net effect is that the table ends up wrapped.
Say that five times fast.
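The splice is the part that's easiest to get wrong, so here's the same replacement applied to a hand-built parent node. The tree below is a stand-in for real parser output, and the table node is left empty for brevity.

```javascript
// Hand-built stand-ins for real nodes
const table = { type: 'table', children: [] };
const parent = {
  type: 'root',
  children: [{ type: 'paragraph', children: [] }, table],
};

// Same wrapper shape as in the plugin
const wrapper = {
  type: 'div',
  data: { hName: 'div', hProperties: { className: 'overflow-x-auto' } },
  children: [table],
};

// Replace exactly one item (the table) at its index with the wrapper
const index = parent.children.indexOf(table);
parent.children.splice(index, 1, wrapper);

console.log(parent.children.map(node => node.type)); // [ 'paragraph', 'div' ]
console.log(parent.children[1].children[0] === table); // true
```

Because one item is swapped for one item, the array keeps its length, which is why the replacement can safely happen inside the forEach loop.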
Implementing the Plugins
Okay. We built some cool shit, learned about Abstract Syntax Trees (ASTs), and are about to make our rendered Markdown content look oh-so-fresh. It's time to plug in the plugins.
Below is a simplified example of my Markdown-to-HTML pipeline, with some extra comments to better explain what's actually happening with each step.
import fs from 'fs';
import path from 'path';
import { remark } from 'remark';
import html from 'remark-html';
import gfm from 'remark-gfm';
import matter from 'gray-matter';
import { remarkWrapTables } from './customRemarkPlugins/remarkWrapTables';
import { remarkAddIdsToHeadings } from './customRemarkPlugins/remarkAddIdsToHeadings';

// postsDirectory and fileName are assumed to be defined elsewhere (omitted from this simplified example)
// Read the Markdown file's raw contents from the file system
const fullPath = path.join(postsDirectory, `${fileName}.md`);
const fileContents = fs.readFileSync(fullPath, 'utf8');
// Extract frontmatter (metadata) and Markdown content using gray-matter
const matterResult = matter(fileContents);
// Build the pipeline with remark. Nothing is parsed until .process() is called
const processedContent = await remark()
  .use(html) // Compile the final AST to an HTML string
  .use(gfm) // Add GitHub-Flavored Markdown syntax (tables, among others) to the parser
  .use(remarkAddIdsToHeadings) // Custom plugin to manipulate the AST by adding IDs to headings
  .use(remarkWrapTables) // Custom plugin to manipulate the AST by wrapping tables
  .process(matterResult.content); // Parse the Markdown into an AST, run the plugins on it, and convert it to HTML
const contentHtml = processedContent.toString(); // Final HTML content, ready for use
Keep in mind, if you try to replicate this example in a Next.js app that uses the Pages Router, you'll need to use getStaticPaths and getStaticProps to generate the HTML content at build time.
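If the order of operations in that chain of .use() calls is hard to picture, a toy model can help. The sketch below is not how remark is implemented: createPipeline, its parse and compile functions, and both plugins are invented here. It only mirrors the general shape, where .use() registers plugins and nothing runs until .process() parses the input, applies the transformers in registration order, and compiles the result.

```javascript
// Toy stand-in for a processor. All names here are made up for illustration.
function createPipeline({ parse, compile }) {
  const transformers = [];
  const pipeline = {
    use(plugin) {
      transformers.push(plugin()); // a plugin returns a transformer
      return pipeline; // allow chaining
    },
    process(input) {
      const tree = parse(input); // text -> tree
      transformers.forEach(transform => transform(tree)); // in registration order
      return compile(tree); // tree -> text
    },
  };
  return pipeline;
}

// Deliberately tiny "parser" and "compiler": one text node per line
const parse = input => ({
  type: 'root',
  children: input.split('\n').map(value => ({ type: 'text', value })),
});
const compile = tree => tree.children.map(node => node.value).join('\n');

// Two made-up plugins that each mutate the tree in place
const upperCase = () => tree => {
  tree.children.forEach(node => {
    node.value = node.value.toUpperCase();
  });
};
const exclaim = () => tree => {
  tree.children.forEach(node => {
    node.value += '!';
  });
};

const output = createPipeline({ parse, compile })
  .use(upperCase)
  .use(exclaim)
  .process('hello\nworld');

console.log(output);
// HELLO!
// WORLD!
```

Each transformer sees the tree as the previous one left it. That matters once plugins depend on each other's changes, for example when one plugin looks for nodes that another plugin creates.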
Conclusion
These custom plugins not only solved a problem for me, but gave me a very welcome chance to really dive into how remark and its plugins work behind the scenes. Although remark was what I reached for first, this extensive flexibility ensures it's what I'll reach for next time too.