R@ndom Curve

One-liner to generate a Markdown TOC
Andres C. Rodriguez 2015-09-25

Developing a command-line one-liner that generates a Table of Contents (TOC) from a Markdown document.

Markdown and TOCs

One of the first things I want to do every time I create a README.md on GitHub, or write any longish Markdown document, is add a Table of Contents. It is a pain to go through the document looking for every header and then a) copy its text, b) replace the spaces with hyphens and c) lowercase all the letters to build the link. I started to think: how can I automate this?
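For a single heading, the manual steps can be sketched as below. The helper name `slug` is hypothetical (it is not part of the script developed in this post), and it only handles spaces and capital letters, nothing else.

```shell
# Hypothetical helper: build a link target from a heading's text
# by replacing spaces with hyphens and lowercasing the result.
slug() {
  printf '%s\n' "$1" | tr ' ' '-' | tr '[:upper:]' '[:lower:]'
}

slug "Bash to the rescue"   # prints: bash-to-the-rescue
```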

After a Google Search, I discovered:

Bash to the rescue

My feeling was that all this was probably overkill for generating a Table of Contents, so I tried to see if I could produce a bash one-liner to generate the TOC. Note that I wrote this on the Mac OS X command line, whose grep and sed differ from the ones on Linux.

I came up with the following (admittedly long) one-liner, which reads the Markdown from standard input:

grep -E "^#{1,5} " | sed -E 's/(#+) (.+)/\1:\2:\2/g' | awk -F ":" '{ gsub(/#/,"  ",$1); gsub(/[ ]/,"-",$3); print $1 "- [" $2 "](#" tolower($3) ")" }'
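To see what the first two stages hand over to awk, they can be run on their own. This is only a sketch: `toc_records` is a hypothetical helper name and the sample headings are made up.

```shell
# Stages 1 and 2 of the one-liner, wrapped in a hypothetical helper
# so the intermediate ':'-separated records can be inspected.
toc_records() {
  grep -E "^#{1,5} " | sed -E 's/(#+) (.+)/\1:\2:\2/g'
}

# Made-up sample input: two headings and one line of body text.
printf '# Getting Started\nsome text\n## Install It\n' | toc_records
# prints:
#   #:Getting Started:Getting Started
#   ##:Install It:Install It
```

The awk stage then splits each record on ':', turns every '#' in the first field into two spaces of indentation, and uses the third field (hyphenated and lowercased) as the link target.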

To test it I used the README.md of the mysql Node.js library (see below).

Scriptify it

One downside of the one-liner is that it pulls in every heading, all the way down to the fifth level. Not good. If you only want the first two levels, for example, you can change the first part to grep -E "^#{1,2} ". Or we can create a bash script that takes the first and last heading levels as its first and second arguments (with suitable defaults):

#!/bin/bash

# It will read the input and:
#
# 1. Extract only the headers via grep (using $1 and $2 as first and last heading level, defaulting to 1 and 2)
# 2. Extract the header text via sed and create ':' separated records of the form '###:Full Text:Full Text'
# 3. Compose each TOC line via awk by replacing each '#' with '  ', and by replacing spaces with '-' and lowercasing the link target

grep -E "^#{${1:-1},${2:-2}} " | \
sed -E 's/(#+) (.+)/\1:\2:\2/g' | \
awk -F ":" '{ gsub(/#/,"  ",$1); gsub(/[ ]/,"-",$3); print $1 "- [" $2 "](#" tolower($3) ")" }'
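To try the level arguments without saving a file, the same pipeline can be wrapped in a shell function. This is a sketch: the function name `toc` is an assumption chosen to match the command used in the tests, and the sample headings are made up.

```shell
# Function form of the script above; inside the function, $1 and $2
# are the function's own arguments (first and last heading level).
toc() {
  grep -E "^#{${1:-1},${2:-2}} " | \
  sed -E 's/(#+) (.+)/\1:\2:\2/g' | \
  awk -F ":" '{ gsub(/#/,"  ",$1); gsub(/[ ]/,"-",$3); print $1 "- [" $2 "](#" tolower($3) ")" }'
}

# Made-up input: keep only heading levels 2 and 3.
printf '# Top\n## Mid Level\n### Deep\n' | toc 2 3
# prints:
#     - [Mid Level](#mid-level)
#       - [Deep](#deep)
```

Each '#' becomes two spaces, so a level-2 entry is indented by four spaces and a level-3 entry by six.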

Tests

Against this document

This is how this document looks in GitHub’s repository view:

(Screenshot: GitHub repository view)

The results of running the script against this very document are:

- [Markdown and TOCs](#markdown-and-tocs)
- [Bash to the rescue](#bash-to-the-rescue)
- [Scriptify it](#scriptify-it)
- [Tests](#tests)

Against longer document

Here I use the README.md of the mysql Node.js library. It’s a long document with lots of headers. With the script saved as an executable named toc, I run:

curl -s https://raw.githubusercontent.com/felixge/node-mysql/master/Readme.md | toc 2 2

I am happy to report that, except for the self-referential first entry, the results are identical to the actual TOC in the document:

- [Table of Contents](#table-of-contents)
- [Install](#install)
- [Introduction](#introduction)
- [Contributors](#contributors)
- [Sponsors](#sponsors)
- [Community](#community)
- [Establishing connections](#establishing-connections)
- [Connection options](#connection-options)
- [Terminating connections](#terminating-connections)
- [Pooling connections](#pooling-connections)
- [Pool options](#pool-options)
- [Pool events](#pool-events)
- [Closing all the connections in a pool](#closing-all-the-connections-in-a-pool)
- [PoolCluster](#poolcluster)
- [PoolCluster Option](#poolcluster-option)
- [Switching users and altering connection state](#switching-users-and-altering-connection-state)
- [Server disconnects](#server-disconnects)
- [Performing queries](#performing-queries)
- [Escaping query values](#escaping-query-values)
- [Escaping query identifiers](#escaping-query-identifiers)
- [Getting the id of an inserted row](#getting-the-id-of-an-inserted-row)
- [Getting the number of affected rows](#getting-the-number-of-affected-rows)
- [Getting the number of changed rows](#getting-the-number-of-changed-rows)
- [Getting the connection ID](#getting-the-connection-id)
- [Executing queries in parallel](#executing-queries-in-parallel)
- [Streaming query rows](#streaming-query-rows)
- [Multiple statement queries](#multiple-statement-queries)
- [Stored procedures](#stored-procedures)
- [Joins with overlapping column names](#joins-with-overlapping-column-names)
- [Transactions](#transactions)
- [Ping](#ping)
- [Timeouts](#timeouts)
- [Error handling](#error-handling)
- [Exception Safety](#exception-safety)
- [Type casting](#type-casting)
- [Connection Flags](#connection-flags)
- [Debugging and reporting problems](#debugging-and-reporting-problems)
- [Running tests](#running-tests)
- [Todo](#todo)
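One caveat of the ':'-separated records: a heading that itself contains a colon gets split into extra fields, so its link text is cut off at the colon. A possible workaround, sketched here under the assumption that an awk-only version is acceptable, avoids the intermediate records altogether. The name `toc_awk` is hypothetical, and the link target keeps the same rules as the script (spaces to hyphens, lowercase), so punctuation is left as-is.

```shell
# Hypothetical awk-only variant: no ':' records, so colons in headings survive.
# Arguments: first and last heading level (defaults 1 and 2, as in the script).
toc_awk() {
  awk -v min="${1:-1}" -v max="${2:-2}" '
    /^#+ / {
      hashes = $0; sub(/ .*/, "", hashes)      # leading run of hash marks
      level = length(hashes)
      if (level < min || level > max) next     # outside the requested levels
      text = substr($0, level + 2)             # heading text after the hashes and space
      anchor = tolower(text); gsub(/ /, "-", anchor)
      indent = hashes; gsub(/#/, "  ", indent) # two spaces per hash mark
      print indent "- [" text "](#" anchor ")"
    }'
}

# Made-up input with a colon in the heading text.
printf '# Top\n## Note: read this\n### Deep\n' | toc_awk 2 2
# prints:
#     - [Note: read this](#note:-read-this)
```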