Nested Set is a model or standard for storing multi-level hierarchies in a database. In this article we’ll be using it to store categories for a shop, but the logic is the same for any type of multi-level resource.

Storing categories in a database can sometimes be pretty complicated to query if you have a hierarchy larger than 2 levels. Let’s consider the following example that I’ve borrowed from Nested Set’s Wikipedia page:

We have a clothing website that has the following categories:

File:NestedSetModel.svg

For the moment, don’t focus on the numbers. I will explain their purpose later in the article.

As you can see, we have a hierarchy on 4 levels. In this pretty typical scenario, the issues are:

  1. How to actually store them while keeping the hierarchy intact.
  2. And how to query for all the subcategories of a parent category? For example, querying all the products under “Women’s”.

Storing Multi-level Resources

It’s important to note that this needs to work on more than 4 levels. I only mention this because the first instinct we get is to actually create 3 diferent different columns in the categories table, to link them together. Maybe something like parent1_id, parent2_id and parent3_id.

That will not work for multiple reasons:

  • Storing and syncing all the IDs will very quickly become tedious.
  • Querying will be a nightmare.
  • And, of course, we’re stuck with 4 levels, if we want more or less, we need to create an additional script to fix the data to support more or less levels.

Therefore, we need a simpler solution that will link them together.

The answer is a single column: parent_id.

The table will contain the following columns:

id, parent_id, name

That’s all we need to store an infinite-level resource hierarchy. Well, we don’t need the name, but it will help with visualising the data.

Following the image above, we will create the Clothing category with parent_id = NULL, and the other categories will store their relative parent’s ID.

If we specify some IDs for the example categories in the image, our table would look something like this:

 | id | parent_id | name          | 
 | 1  |           | Clothing      | 
 | 2  | 1         | Men's         | 
 | 3  | 1         | Women's       | 
 | 4  | 2         | Suits         | 
 | 5  | 4         | Slacks        | 
 | 6  | 4         | Jackets       | 
 | 7  | 3         | Dresses       | 
 | 8  | 7         | Evening Gowns | 
 | 9  | 7         | Sun Dresses   | 
 | 10 | 3         | Skirts        | 
 | 11 | 3         | Blouses       | 

This is pretty easy to store even if we have thousands of categories, because we only need to update the parent_id column.

Please note that you don’t have to store them in any specific order. Meaning that you can even create the Clothing category at the very end and it will still work if the parent_id column is populated correctly.

Here’s an equally valid example:

 | id | parent_id | name          | 
 | 2  | 12        | Men's         | 
 | 3  | 12        | Women's       | 
 | 4  | 2         | Suits         | 
 | 5  | 4         | Slacks        | 
 | 6  | 4         | Jackets       | 
 | 7  | 3         | Dresses       | 
 | 8  | 7         | Evening Gowns | 
 | 9  | 7         | Sun Dresses   | 
 | 10 | 3         | Skirts        | 
 | 11 | 3         | Blouses       | 
 | 12 |           | Clothing      | 

So far, so good. We didn’t have to use anything from the Nested Set model yet, but we need this information in order to understand the next examples.

Querying Multi-level Categories

Now, you can already see that navigating from one category to another will be pretty easy in both directions, top-down and bottom-up.

If we are on the Clothing category, to query the first level down categories, all we have to do is to query for categories having parent_id = 1. Assuming that 1 is the ID of the Clothing category. To go another level down, we just change the parent_id to the selected category’s ID, in our case, either Men’s (2) or Women’s (3).

That case is pretty straight forward, but now let’s say that instead of displaying the next level of categories, we wanted to display the next 2 levels.

The only solution right now would be to run 2 queries. The first one selects the next level based on the ID of the active category. And then, the next query selects the children of the first result set.

So if we were on the Women’s category, the first query would return:

 | id | parent_id | name    | 
 | 7  | 3         | Dresses | 
 | 10 | 3         | Skirts  | 
 | 11 | 3         | Blouses |

Then we’d collect the IDs from the id column and use them to select the next level of categories. And that would get us the following result set:

 | id | parent_id | name          | 
 | 8  | 7         | Evening Gowns | 
 | 9  | 7         | Sun Dresses   | 

Now, this might not seem like such a bad idea for 2 levels, but what if we wanted to do the same for 5 levels? Five queries to get some categories is still not a big deal, but we’re not just querying for categories in our apps, we do all kinds of other queries, and it can quickly add up tot a lot of unnecessary code and queries.

But what if I told you that we’re able to get all of the categories in a single query, regardless of the number of levels.

Enter: Nested Set

Nested Set allows us to visualise the entire hierarchy linearly and that, in turn, allows us to select all of the sub-categories of any parent in the hierarchy by simply specifying a range.

Let’s take a look at the first image again:

File:NestedSetModel.svg

Now, notice the numeric values on the horizontal axis.

We can notice three things:

  1. That they create boundaries between the categories.
  2. That the numeric boundaries are all unique.
  3. That we can actually specify a range for all the parent categories to select all of the containing sub-categories.

All we have to do to select the sub-categories of the “Women’s” category would be to select all of the categories that have boundaries between the “Women’s” category boundaries, which are 10 and 21.

Nested Set, to keep things simple, calls these boundaries left and right. So, we’re just going to store the boundaries in the Categories table as such:

 | id | parent_id | name          | left | right | 
 | 1  |           | Clothing      | 1    | 22    | 
 | 2  | 1         | Men's         | 2    | 9     | 
 | 3  | 1         | Women's       | 10   | 21    | 
 | 4  | 2         | Suits         | 3    | 8     | 
 | 5  | 4         | Slacks        | 4    | 5     | 
 | 6  | 4         | Jackets       | 6    | 7     | 
 | 7  | 3         | Dresses       | 11   | 16    | 
 | 8  | 7         | Evening Gowns | 12   | 13    | 
 | 9  | 7         | Sun Dresses   | 14   | 15    | 
 | 10 | 3         | Skirts        | 17   | 18    | 
 | 11 | 3         | Blouses       | 19   | 20    | 

Again, to select all the sub-categories of the “Women’s” category, our query would look something like this:

select * from categories where left > 10 and right < 21

In our case, that selects 2 levels of sub-categories, but we could just as well select all of the Clothing’s category sub-categories using:

select * from categories where left > 1 and right < 22

And if we want the parent to be included in our query, we use:

select * from categories where left >= 1 and right <= 22

Pretty simple stuff. But the hard part still remains. How do we actually generate those boundaries?

Generating the Boundaries

This can be done with a pretty simple algorithm.

For the sake of this article, I’ve created an array containing all of our Categories table that we’ve used throughout the article. It looks like this:

$cats = [
    // ['id', 'parent_id', 'name'],
    [1, null, "Clothing"],
    [2, 1, "Men's"],
    [3, 1, "Women's"],
    [4, 2, "Suits"],
    [5, 4, "Slacks"],
    [6, 4, "Jackets"],
    [7, 3, "Dresses"],
    [8, 7, "Evening Gowns"],
    [9, 7, "Sun Dresses"],
    [10, 3, "Skirts"],
    [11, 3, "Blouses"],
]

This is equivalent to what you might get from a database, that’s the reason that we’re starting from a simple array and not a multi-dimensional array.

To make it multi dimensional and not waste a lot of computing power, we will need to transform the array into a hash table and a list of children first. I will explain why in a moment, but for the time being, the code is:

$catsById = $children = [];
$cats = [
    [1, null, "Clothing"],
    [2, 1, "Men's"],
    [3, 1, "Women's"],
    [4, 2, "Suits"],
    [5, 4, "Slacks"],
    [6, 4, "Jackets"],
    [7, 3, "Dresses"],
    [8, 7, "Evening Gowns"],
    [9, 7, "Sun Dresses"],
    [10, 3, "Skirts"],
    [11, 3, "Blouses"],
];

foreach ($cats as $i => $cat) {
    // $cat[0] is the ID selector.
    $catsById[$cat[0]] = $cat;

    // $cat[1] is the PARENT ID selector.
    $children[$cat[1]][] = $cat;

    // Clear the value from the initial array to free up memory.
    unset($cats[$i]);
}

Our resulted hash table ($catsById) array looks like this:

[
   1 => [1, null, Clothing],
   2 => [2, 1, Men's],
   3 => [3, 1, Women's],
   4 => [4, 2, Suits],
   5 => [5, 4, Slacks],
   6 => [6, 4, Jackets],
   7 => [7, 3, Dresses],
   8 => [8, 7, Evening Gowns],
   9 => [9, 7, Sun Dresses],
   10 => [10, 3, Skirts],
   11 => [11, 3, Blouses],
]

And our $children hash table looks like this:

[
     null => [
         [1, null, Clothing]
     ]
     1 => [
         [2, 1, Men's]
         [3, 1, Women's]
     ]
     2 => [
         [4, 2, Suits]
     ]
     4 => [
         [5, 4, Slacks]
         [6, 4, Jackets]
     ]
     3 => [
         [7, 3, Dresses]
         [10, 3, Skirts]
         [11, 3, Blouses]
     ]
     7 => [
         [8, 7, Evening Gowns]
         [9, 7, Sun Dresses]
     ]
 ]

We’ve done this because in the next section we need to reference the categories by their IDs and a category’s children by the parent_id, and we wouldn’t want to waste resources by using loops to search the main categories array, because that would just mean that we’d be looping over our array a lot of times.

It’s not really a big issue with 11 categories like we have now, but if you have 10 000, you might see a spike on the CPU every time you need to calculate the left and right values and update the database 10 000 times.

For example, when we’d want to get the Category with ID 9, we’d have to loop over the entire array and search for it, then repeat this for all of the other categories. And do the same to get the children of a specific parent ID.

But by doing the hash tables first, we only loop over it once, and now if we want to reference any specific Category, we’d simply access it through $catsById[ID] and if we need a category’s children, we can access them through $children[parentID].

Now that we have the hash tables in place, let’s proceed to create the a multi-dimensional array, a.k.a. the entire hierarchy.

$createHierarchy = function ($parentId = null) use ($catsById, $children, &$createHierarchy) {
    $result = [];

    if (isset($children[$parentId])) {
        foreach ($children[$parentId] as $cat) {
            $cat['children'] = $createHierarchy($cat[0]);
            $result[] = $cat;
        }
    }

    return $result;
};

As we can see, we’re using the hash tables and we’re also making use of recursion, because we want to support any number of levels.

The way this works might not be as clear at first, because our brains have a tendency to crap out any time we try to think through a recursive algorithm. Basically, by calling the function with an initial parent ID, it will start from that parent ID to go down the levels until it reaches inexistent parent IDs.

If the parent ID doesn’t exist, i.e. we don’t have an entry in our $children hash table, we’re going to return the current list, which is an empty array.

If the parent ID is inside our hash table, we’re going to loop through the children, call the function again on all the categories’ IDs and then add the category to the $results array.

I suggest taking a few minutes to actually understand how recursion works in this case, because it’s really useful to have this as a skill.

Cool, so let’s see what our function produces if we call it with NULL as a parent ID:

[1, null, Clothing ->
     [2, 1, Men's ->
         [4, 2, Suits ->
             [4, 2, Slacks]
             [4, 2, Jackets]
         ]
     ]
     [3, 1, Women's ->
         [7, 3, Dresses ->
             [7, 3, Evening Gowns]
             [7, 3, Sun Dresses]
         ]
         [10, 3, Skirts]
         [11, 3, Blouses]
     ]
 ]

I’ve tried to simplify the result a bit so you can see more easily the generated structure. Hopefully you might notice that the generated array resembles very much the following schema:

This is a tree representation of the Wikipedia image from above, and it’s also the same thing as our generated array.

Let’s generate the left and right values. To do that, we use the following algorithm:

$computeBoundaries = function (&$cats, &$i = 1) use (&$computeBoundaries) {
    foreach ($cats as &$cat) {
        $cat['left'] = $i++;

        $computeBoundaries($cat['children'], $i);

        $cat['right'] = $i++;
    }

    return $cats;
};

$computeBoundaries($cats);

This one might take a bit more to understand, but it’s worth looking at.

And the result is:

// [id, parent_id, name, left, right]

[1, null, Clothing, 1, 22]
     [2, 1, Men's, 2, 9]
         [4, 2, Suits, 3, 8]
             [5, 4, Slacks, 4, 5]
             [6, 4, Jackets, 6, 7]
     [3, 1, Women's, 10, 21]
         [7, 3, Dresses, 11, 16]
             [8, 7, Evening Gowns, 12, 13]
             [9, 7, Sun Dresses, 14, 15]
         [10, 3, Skirts, 17, 18]
         [11, 3, Blouses, 19, 20]

Now, if this was a real scenario, we would also update the database in this last function. Like this:

$computeBoundaries = function (&$cats, &$i = 1) use (&$computeBoundaries) {
    foreach ($cats as &$cat) {
        $cat['left'] = $i++;

        $computeBoundaries($cat['children'], $i);

        $cat['right'] = $i++;
        
        // update categories set left = $cat['left'], right = $cat['right'] where id = $cat['id']
    }

    return $cats;
};

$computeBoundaries($cats);

Updating the left and right values needs to happen every time the category order changes, because in a real world example, the Categories table would also have an order column that would be used to sort the categories.

Also, they should be updated when categories get added or deleted.

Combining the Functions

To better understand what actually happens, I’ve done this step by step until now. But we can combine the last two functions into one. Doing that, we reduce the amount of loops that we need to do even more.

This is an example of a complete and optimised solution:

$cats = [...the initial categories array...];
$catsById = $children = [];

foreach ($cats as $i => $cat) {
    // $cat[0] is the ID selector.
    $catsById[$cat[0]] = $cat;

    // $cat[1] is the PARENT ID selector.
    $children[$cat[1]][] = $cat;

    unset($cats[$i]);
}

$computeBoundaries = function ($parentId = null, &$i = 1) use ($catsById, $children, &$computeBoundaries) {
    if (isset($children[$parentId])) {
        foreach ($children[$parentId] as $cat) {
            $cat['left'] = $i++;
            $cat['children'] = $computeBoundaries($cat[0], $i);
            $cat['right'] = $i++;

            // update categories set left = $cat['left'], right = $cat['right'] where id = $cat['id']
        }
    }
};

$computeBoundaries();

I hope this will help you understand how Nested Set works. If you have any questions, please comment below and I’ll try to help you as best as I can.