Developing a command-line tool to generate a Table of Contents (TOC) from Markdown.
Table of Contents
Markdown and TOCs
One of the first things I want to do every time I create a README.md in GitHub, or write any longish Markdown document, is create a Table of Contents for it. It is a pain to navigate the document looking for all the headers and then a) copy the text, b) replace spaces with hyphens and c) lowercase all letters. I started to think: how can I automate this?
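To make the manual steps concrete, here is a minimal sketch that applies them to a single heading line. The function name `heading_to_entry` is hypothetical, and step b) is taken to mean replacing spaces with hyphens, which is the form GitHub anchors use. GitHub's real anchor generation also deals with punctuation, which this sketch ignores.

```shell
# Hypothetical helper (not part of the script developed later): build one
# TOC entry from a single Markdown heading line, mirroring the manual steps.
heading_to_entry() {
  # a) copy the heading text, dropping the leading '#' characters
  text=$(printf '%s\n' "$1" | sed -E 's/^#+ +//')
  # b) replace spaces with hyphens, c) lowercase all letters
  anchor=$(printf '%s\n' "$text" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "- [$text](#$anchor)"
}

heading_to_entry "## Bash to the rescue"
# prints: - [Bash to the rescue](#bash-to-the-rescue)
```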
After a Google search, I found:
- Markdown to create pages and table of contents?: a Stack Overflow question referencing tons of scripts that do the trick.
- Markdown in Python: this implementation contains a [TOC] extension.
Bash to the rescue
My feeling was that all this was probably overkill for generating a Table of Contents, so I tried to see if I could produce a bash one-liner to generate the TOC. Note that this works on the Mac OS X command line (which has a different grep and sed than Linux).
I came up with the following (admittedly long) one-liner:
grep -E "^#{1,5} " | sed -E 's/(#+) (.+)/\1:\2:\2/g' | awk -F ":" '{ gsub(/#/," ",$1); gsub(/[ ]/,"-",$3); print $1 "- [" $2 "](#" tolower($3) ")" }'
To test it I used the README.md for the mysql Node.js library (see below).
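To see what the first two stages of the one-liner hand over to awk, the sketch below runs only the grep and sed parts on a tiny made-up document. The function name `toc_records` and the sample headings are assumptions for illustration.

```shell
# Hypothetical helper: run only the grep and sed stages so the
# intermediate '###:Text:Text' records become visible.
toc_records() {
  grep -E "^#{1,5} " | sed -E 's/(#+) (.+)/\1:\2:\2/g'
}

# Three input lines: two headings and one line of body text.
printf '%s\n' '# Title' 'Some text' '## Sub Section' | toc_records
# prints:
# #:Title:Title
# ##:Sub Section:Sub Section
```

Each record carries the heading markers once and the heading text twice: awk turns the first copy into the link label and the second into the anchor.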
Scriptify it
One of the downsides of the one-liner is that it includes every heading, all the way down to the fifth level. Not good. You can directly modify the first part to use grep -E "^#{1,2} ", for example, if you only want to go down to the second level. Or we can create a bash script that receives the first and last heading levels as its first and second arguments (with suitable defaults):
#!/bin/bash
# It will read the input and:
#
# 1. Extract only the headers via grep (using $1 and $2 as first and last heading level)
# 2. Extract the header text via sed and create ':' separated records of the form '###:Full Text:Full Text'
# 3. Compose each TOC line via awk by replacing '#' with ' ' and stripping spaces and caps of reference
grep -E "^#{${1:-1},${2:-2}} " | \
sed -E 's/(#+) (.+)/\1:\2:\2/g' | \
awk -F ":" '{ gsub(/#/," ",$1); gsub(/[ ]/,"-",$3); print $1 "- [" $2 "](#" tolower($3) ")" }'
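To try the level arguments without saving a file first, the same pipeline can be wrapped in a shell function. This is only a sketch: the name `toc` is borrowed from the invocation used in the tests, and the `sample` helper with its three headings is made up for illustration.

```shell
# Sketch: the script body as a shell function. ${1:-1} and ${2:-2} fall back
# to levels 1 and 2 when the corresponding argument is missing.
toc() {
  grep -E "^#{${1:-1},${2:-2}} " | \
  sed -E 's/(#+) (.+)/\1:\2:\2/g' | \
  awk -F ":" '{ gsub(/#/," ",$1); gsub(/[ ]/,"-",$3); print $1 "- [" $2 "](#" tolower($3) ")" }'
}

# Hypothetical sample document with one heading per level.
sample() { printf '%s\n' '# Guide' '## Getting Started' '### Details'; }

sample | toc        # no arguments: levels 1 to 2
sample | toc 2 3    # levels 2 to 3 only
```

With no arguments the grep pattern expands to "^#{1,2} ", so the third-level heading is dropped. With `2 3` it expands to "^#{2,3} " and the top-level heading is dropped instead.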
Tests
Against this document
This is how this document looks in GitHub’s repository view:
The results of running the script against this very document are:
- [Markdown and TOCs](#markdown-and-tocs)
- [Bash to the rescue](#bash-to-the-rescue)
- [Scriptify it](#scriptify-it)
- [Tests](#tests)
Against a longer document
Here I use the README.md for the mysql Node.js library. It’s a long document with lots of headers. I run:
curl -s https://raw.githubusercontent.com/felixge/node-mysql/master/Readme.md | toc 2 2
I am happy to report that, except for the self-referential first entry, the results are identical to the actual TOC in the document:
- [Table of Contents](#table-of-contents)
- [Install](#install)
- [Introduction](#introduction)
- [Contributors](#contributors)
- [Sponsors](#sponsors)
- [Community](#community)
- [Establishing connections](#establishing-connections)
- [Connection options](#connection-options)
- [Terminating connections](#terminating-connections)
- [Pooling connections](#pooling-connections)
- [Pool options](#pool-options)
- [Pool events](#pool-events)
- [Closing all the connections in a pool](#closing-all-the-connections-in-a-pool)
- [PoolCluster](#poolcluster)
- [PoolCluster Option](#poolcluster-option)
- [Switching users and altering connection state](#switching-users-and-altering-connection-state)
- [Server disconnects](#server-disconnects)
- [Performing queries](#performing-queries)
- [Escaping query values](#escaping-query-values)
- [Escaping query identifiers](#escaping-query-identifiers)
- [Getting the id of an inserted row](#getting-the-id-of-an-inserted-row)
- [Getting the number of affected rows](#getting-the-number-of-affected-rows)
- [Getting the number of changed rows](#getting-the-number-of-changed-rows)
- [Getting the connection ID](#getting-the-connection-id)
- [Executing queries in parallel](#executing-queries-in-parallel)
- [Streaming query rows](#streaming-query-rows)
- [Multiple statement queries](#multiple-statement-queries)
- [Stored procedures](#stored-procedures)
- [Joins with overlapping column names](#joins-with-overlapping-column-names)
- [Transactions](#transactions)
- [Ping](#ping)
- [Timeouts](#timeouts)
- [Error handling](#error-handling)
- [Exception Safety](#exception-safety)
- [Type casting](#type-casting)
- [Connection Flags](#connection-flags)
- [Debugging and reporting problems](#debugging-and-reporting-problems)
- [Running tests](#running-tests)
- [Todo](#todo)
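One case neither test document exercises is a heading that contains a ':'. Because the script uses ':' as its record separator, awk would split such a heading in the wrong place. The sketch below is a possible awk-only variant, not the script above: the name `toc_awk` is hypothetical, and the punctuation stripping is an assumption about how GitHub builds anchors (it only keeps ASCII letters, digits, spaces, underscores and hyphens).

```shell
# Sketch of an awk-only variant (an alternative, not the original script).
# It never builds ':' separated records, so headings containing ':' survive.
toc_awk() {
  awk -v min="${1:-1}" -v max="${2:-2}" '
    /^#+ / {
      level = length($1)                  # $1 is the leading run of # characters
      if (level < min || level > max) next
      text = $0
      sub(/^#+ +/, "", text)              # heading text without the markers
      anchor = tolower(text)
      gsub(/[^a-z0-9 _-]/, "", anchor)    # assumption: punctuation is dropped
      gsub(/ /, "-", anchor)              # spaces become hyphens
      indent = sprintf("%" level "s", "") # one space per heading level
      print indent "- [" text "](#" anchor ")"
    }'
}

printf '%s\n' '# Guide' '## Note: usage' | toc_awk 2 2
```

The level arguments keep the same meaning and defaults as in the original script.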