You’ve probably noticed by now that the site’s theme includes a table of contents for each post. You might be wondering if I actually spend time creating that table of contents myself. Well, sorry to disappoint, but the answer is: No. It is in fact generated automatically based on the headings inside the article. I think it’s pretty slick, so I decided to share with you how it’s done.

Introduction

This is, in fact, a very popular approach to generating a table of contents. It is so common that MS Word generates its table of contents the same way. There are 2 ways of generating a table of contents in WordPress:

  1. Through JavaScript, in the front end.
  2. Through PHP, when displaying the content.

Visually, there’s no difference between the two approaches. I chose the latter for SEO reasons: the TOC is part of the HTML that the server sends, so it doesn’t depend on JavaScript running in the browser.

The method itself consists of scraping all the heading elements and creating a list of them, where every item points to the heading itself inside the article.

So, let’s see how it’s done in code.

Generating the Table of Contents HTML

The first step is to generate the HTML. This means scraping the HTML content of the post to grab each heading element and append it to the newly created TOC.

To do this, we’re going to add a new filter for the_content in our theme’s functions.php file.

First, let’s see what the entire code looks like, and then we’ll analyze it bit by bit.

// Inject the TOC on each post.
add_filter('the_content', function ($content) {
    global $tableOfContents;

    $tableOfContents = "
        <div class='h5'>
            Table of Contents <span class='toggle'>+ show</span>
        </div>
        <div class='items'>
            <div class='item-h2'>
                <a href='#preface'>Preface</a>
            </div>
    ";
    $index = 1;

    // Insert the IDs and create the TOC.
    $content = preg_replace_callback('#<(h[1-6])(.*?)>(.*?)</\1>#si', function ($matches) use (&$index, &$tableOfContents) {
        $tag = $matches[1];
        $title = strip_tags($matches[3]);
        $hasId = preg_match('/\sid\s*=\s*(["\'])(.*?)\1/si', $matches[2], $matchedIds);
        $id = $hasId ? $matchedIds[2] : $index++ . '-' . sanitize_title($title);

        $tableOfContents .= "<div class='item-$tag'><a href='#$id'>$title</a></div>";

        if ($hasId) {
            return $matches[0];
        }

        return sprintf('<%s%s id="%s">%s</%s>', $tag, $matches[2], $id, $matches[3], $tag);
    }, $content);

    $tableOfContents .= '</div>';

    return $content;
});

The first and last lines register a new WordPress filter:

add_filter('the_content', function ($content) {
    // ...
});

The $content parameter is a parameter that’s going to be filled in by WordPress when it calls our new filter. It contains the HTML source code of our article.
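
To make the mechanics concrete, here is a toy stand-in for a filter registry. It is not how WordPress implements filters; the DemoFilters class and its methods are hypothetical names made up for this sketch. It only shows the contract: the callback receives the content and has to return it.

```php
<?php
// Toy stand-in for a filter registry. NOT WordPress code: the class and
// method names are hypothetical and exist only for this sketch.
final class DemoFilters
{
    /** @var array<string, callable[]> */
    private static $callbacks = [];

    public static function add(string $hook, callable $callback): void
    {
        self::$callbacks[$hook][] = $callback;
    }

    public static function apply(string $hook, string $value): string
    {
        // Pass the value through every callback registered for the hook, in order.
        foreach (self::$callbacks[$hook] ?? [] as $callback) {
            $value = $callback($value);
        }

        return $value;
    }
}

DemoFilters::add('the_content', function ($content) {
    // Whatever the callback returns replaces the content.
    return $content . '<!-- filtered -->';
});

$filtered = DemoFilters::apply('the_content', '<p>Hello</p>');
echo $filtered, PHP_EOL; // <p>Hello</p><!-- filtered -->
```

If the callback forgets to return $content, the post body would come out empty, which is why our filter ends with return $content.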

The next line,

global $tableOfContents;

defines the $tableOfContents variable as global, so that we can reference it from outside the filter. We need this because we don’t want to embed the TOC into the content itself; we want to display it in a separate column.

Next, the $tableOfContents variable is initialized with a default item, because we don’t usually start articles with headings, so there would be no heading for the first part of the article. Therefore, we create one.

$tableOfContents = "
    <div class='h5'>
        Table of Contents <span class='toggle'>+ show</span>
    </div>
    <div class='items'>
        <div class='item-h2'>
            <a href='#preface'>Preface</a>
        </div>
";

The $index is set to 1 and we’re going to use this to make sure that all of our headings have unique identifiers.

Next, we’re going to do a couple different things at the same time with the following instruction.

// Insert the IDs and create the TOC.
$content = preg_replace_callback('#<(h[1-6])(.*?)>(.*?)</\1>#si', function ($matches) use (&$index, &$tableOfContents) {
    $tag = $matches[1];
    $title = strip_tags($matches[3]);
    $hasId = preg_match('/\sid\s*=\s*(["\'])(.*?)\1/si', $matches[2], $matchedIds);
    $id = $hasId ? $matchedIds[2] : $index++ . '-' . sanitize_title($title);

    $tableOfContents .= "<div class='item-$tag'><a href='#$id'>$title</a></div>";

    if ($hasId) {
        return $matches[0];
    }

    return sprintf('<%s%s id="%s">%s</%s>', $tag, $matches[2], $id, $matches[3], $tag);
}, $content);

The preg_replace_callback() function replaces every match of a regex pattern with whatever a callback function returns for that match. We need the callback function for 2 things:

  1. To grab each heading and add it to the $tableOfContents variable.
  2. To give each heading in the article a unique ID, in order to be able later to link the TOC links to their respective headings.

If the regex pattern looks unfamiliar, there is a tutorial about Regex on our site. I recommend checking it out, since it’s a very useful skill to learn.

'#<(h[1-6])(.*?)>(.*?)</\1>#si'

In short, we’re looking for HTML tags whose name is the letter h followed by 1, 2, 3, 4, 5, or 6. This should match all of the headings. At the same time, we’re capturing the tag name ((h[1-6])), its attributes ((.*?)>), and the tag’s inner HTML (>(.*?)</\1>) as separate groups, to be able to use them individually in the callback function. The \1 is a backreference to the first group, so the closing tag has to match the opening one. We’re also using the s flag, to make the . metacharacter match newlines, and the i flag to ignore the case of the matched elements.
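
To see the groups in isolation, here is a small sketch that runs the same pattern through preg_match_all() on a made-up HTML fragment. The sample markup is an assumption for illustration; only the pattern comes from the code above.

```php
<?php
// Sample markup invented for this sketch: a heading with attributes and
// nested tags, a paragraph, and an upper-case heading spanning two lines.
$html = '<h2 class="intro">Getting <em>started</em></h2>' . "\n"
      . '<p>Text</p>' . "\n"
      . "<H3>Next\nsteps</H3>";

preg_match_all('#<(h[1-6])(.*?)>(.*?)</\1>#si', $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] = tag name, $m[2] = raw attribute string, $m[3] = inner HTML
    var_dump($m[1], $m[2], $m[3]);
}
```

The paragraph is skipped, the upper-case H3 is matched thanks to the i flag, and its two-line content is captured thanks to the s flag.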

We assign the preg_replace_callback() result back to the $content variable because the callback adds IDs to the headings.

The callback function imports $tableOfContents and $index through the use clause so that we can access them from inside. The & in front of each one captures the variable by reference rather than by value, so that changes made inside the callback are visible outside it.
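
The difference is easy to miss, so here is a minimal sketch with two made-up counters, one captured by value and one captured by reference.

```php
<?php
// Both variable names are invented for this sketch.
$byValue = 0;
$byReference = 0;

$increment = function () use ($byValue, &$byReference) {
    $byValue++;      // changes a private copy only
    $byReference++;  // changes the variable in the enclosing scope
};

$increment();
$increment();

var_dump($byValue);      // int(0)
var_dump($byReference);  // int(2)
```

Without the &, our $index would stay at 1 and $tableOfContents would never grow beyond its initial value.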

The $matches parameter is filled in by preg_replace_callback() with the groupings matched based on the regex pattern.

Next, we’re going to use the first grouping as the $tag.

$tag = $matches[1];

The next line uses the 3rd grouping as the title of the heading, but it strips the tags first, because there are cases where the heading contains other HTML elements, like <a> or <strong>. We don’t want those in the TOC, because they will mess either with our own links to the headings themselves, or with the style.

$title = strip_tags($matches[3]);
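
As a quick illustration, with a heading body invented for this sketch:

```php
<?php
// Inner HTML of a hypothetical heading that contains a link and bold text.
$inner = 'Using <a href="/docs"><strong>filters</strong></a> in WordPress';

$title = strip_tags($inner);

echo $title, PHP_EOL; // Using filters in WordPress
```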

The next line:

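$hasId = preg_match('/\sid\s*=\s*(["\'])(.*?)\1/si', $matches[2], $matchedIds);

checks whether the currently matched element already has an id attribute. If it does, its value ends up in $matchedIds[2]. Note that $matches[2] holds only the attribute string, without the closing >, so the pattern must not expect any character after the closing quote. The leading \s keeps attributes such as data-id from matching.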
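
The check can be tried out on a few attribute strings. The sketch below uses a pattern written to find an id wherever it sits in the attribute string, including at the very end; the sample strings are made up and mimic what the second capture group of the heading regex would contain.

```php
<?php
// Pattern for this sketch: whitespace, "id", "=", then a quoted value.
$pattern = '/\sid\s*=\s*(["\'])(.*?)\1/si';

// Attribute strings as they would appear in $matches[2] (no closing ">").
$samples = [
    ' id="setup"',               // id is the only attribute
    ' class="big" id=\'usage\'', // single quotes, id comes last
    ' id="faq" class="big"',     // id followed by another attribute
    ' class="big"',              // no id at all
    ' data-id="42"',             // data-id must not count as an id
];

$found = [];
foreach ($samples as $attributes) {
    $hasId = preg_match($pattern, $attributes, $matchedIds);
    $found[] = $hasId ? $matchedIds[2] : null;
}

var_export($found);
```

The first three samples yield setup, usage and faq, and the last two yield null.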

Next, we create the unique ID that’s going to be used in the href attribute of the TOC link, and as the ID of the actual heading.

$id = $hasId ? $matchedIds[2] : $index++ . '-' . sanitize_title($title);

First, we check if it already had an ID. If it did, we’re going to use that one. If it did not, we’re going to create one from the $index and the ‘slugified’ version of the title. A slug is a URL-friendly string; we’re generating one from the heading text, using a built-in WordPress function called sanitize_title(). Prefixing the slug with $index keeps the generated IDs unique even when two headings share the same text.
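
Here is a rough sketch of how the generated IDs come out. Since sanitize_title() only exists inside WordPress, the sketch uses demo_slugify(), a hypothetical and much simplified stand-in that lowercases the text and replaces anything that is not a letter or digit with a hyphen. The real function handles more cases.

```php
<?php
// Hypothetical, simplified stand-in for WordPress's sanitize_title().
function demo_slugify(string $title): string
{
    $slug = strtolower(trim($title));
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    return trim($slug, '-');
}

$index = 1;
$ids = [];

// Two headings share the same text on purpose.
foreach (['Getting Started', 'Examples', 'Examples'] as $title) {
    $ids[] = $index++ . '-' . demo_slugify($title);
}

print_r($ids); // 1-getting-started, 2-examples, 3-examples
```

The two "Examples" headings end up with different IDs, which is the whole point of the counter.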

Next, we append the new item to the TOC.

$tableOfContents .= "<div class='item-$tag'><a href='#$id'>$title</a></div>";

The item-$tag class (item-h2, item-h3, and so on) is what we’ll use to style it later, and the <a> element links to the heading inside the article.

Next:

if ($hasId) {
    return $matches[0];
}

return sprintf('<%s%s id="%s">%s</%s>', $tag, $matches[2], $id, $matches[3], $tag);

We’re checking if it had an ID, and if it did, we’re returning the matched string, untouched.

If it did not have an ID, we’re going to return a modified version of the matched heading HTML, which includes the ID. This is essential: the browser can only jump from a TOC link to a heading if the heading carries the ID that the link’s href points to.

The next line simply appends a closing </div> tag to the TOC, which closes the items container.

$tableOfContents .= '</div>';

And then we’re going to return the modified $content which now includes ID attributes for all the headings.

Using the Table of Contents

I’ve created a simple function to return the global $tableOfContents variable. It’s not really necessary, but it keeps things consistent with the other WordPress components, and you don’t have to check for existence in your template.

function get_the_table_of_contents()
{
    global $tableOfContents;

    return $tableOfContents;
}

To use it, call it inside the loop, after the_content() has run, since the filter is what fills the variable:

<?= get_the_table_of_contents() ?>

This is how it looks inside the template:

<div id="preface" class="post-content row">
    <div class="content col">
        <?php the_content() ?>
    </div>

    <?php if (is_single()): ?>
        <div class="post-toc col-auto">
            <div class="wrapper">
                <?= get_the_table_of_contents() ?>
            </div>
            <div class="placeholder"></div>
        </div>
    <?php endif ?>
</div>

Notice the id="preface" given to the content section, which the first link in the TOC points to.

I’m using that in the singular.php template file, which applies to both pages and posts, so the is_single() check is added to make sure we only display this on single posts.

Styling the TOC

Now, this will differ from site to site, but if you choose to use this technique on yours, the only required step is to add some different padding on the left of each TOC item, based on its item-hX class.

Here’s an example:

.item-h3 {
  padding-left: 15px;
}

.item-h4 {
  padding-left: 30px;
}

.item-h5 {
  padding-left: 45px;
}

.item-h6 {
  padding-left: 60px;
}

I’m only styling from item-h3 onwards, because I never use h1 inside articles, so there won’t be any item-h1, and since item-h2 is the first layer in the TOC, it doesn’t need any padding.

This styling gives it a more tree-like look, which makes it easier to read.