Data Structures: The Tree

9 min readSep 12, 2021

I’ve talked at some length about some other common data structures in previous articles: hash tables, linked lists, stacks etc. And this week I wanted to reopen the series and plant some seeds towards understanding and implementing data trees.

What is a Tree?

At its root (no pun intended), data trees are seen everywhere in our daily lives. They represent a hierarchically organized set of data, such as the reporting structure of a company, or the file system of a computer.

This conceptual representation is a valid tree in the coding realm, as it meets a number of criteria:

To be considered a tree:

One node is distinguished as the Root node.
Each node is connected FROM exactly one other node (excluding the root). This means two parent nodes may never point at the same child node.
Parent nodes may point to multiple child nodes. If a node points to no child nodes, it’s considered a leaf node or external node, and represents the end of a particular branch of the tree. All other nodes are considered internal nodes and point to at least one child node.
No cycles may be created by the directions of the nodes. Each branch must eventually terminate with a leaf node.

There are a few other general concepts to digest on trees, before we move on to their implementation.

Nodes with the same parents are considered siblings. In the above example, the “employees” are all siblings to each other because they share the “manager” as a parent node.
The depth of a node is the number of edges (or connections) from that node to the root, and the height of a node is the number of edges from that node to the deepest leaf node.
the height of a tree is the number of edges from the root to the deepest leaf node.

Let’s look at some more examples:

Let’s see what we know so far.

Does this qualify as a valid tree? Yes. It meets all the criteria to be considered a tree.
What is the root node? Node 1. Why? It’s at the top of the tree, and is the only node without a parent node pointing to it.
What is the depth of node 6? 2. There are 2 edges between node 6 and the root node, so it has a depth of 2.
How many leaf nodes does this tree have? 5. Nodes 2, 9, 6, 7, and 11 have no children, so they are leaf or external nodes.
If this tree has 11 nodes, then how many edges does it have (without counting)? 10. Another feature to remember about trees is that no matter what, the number of edges will always be the number of nodes minus 1. This is because every node must have exactly one parent node, except for the root node.
What is the total height of the tree (or the root node)? 4. There are 4 edges from the deepest node (11) to the root node, giving us a height of 4.

Here are some examples of structures that do NOT count as trees:

This would not be a valid tree, because Node 6 is being pointed at by 2 parent nodes. This violates the conditions to qualify as a tree.

Here, we may think “Well every node is only being pointed at by a single parent…” And yes, this is true, however, this actually fails 2 of our criteria. First, there is no designated root node. Though Node 1 looks like it’s at the top of the tree, it is being pointed at by Node 6, so we don’t have a root node. This pattern also created a loop between Nodes 1, 3, 5, and 6, which means the branch will not terminate at a leaf node and therefore, this does not qualify as a tree.

Now this doesn’t mean that a data structure like this can’t be useful, or is somehow “bad”, it simply means that for the purposes of implementing a Tree data structure, this would not qualify. In fact, data structures like this are very common, but I’ll save that discussion for another article.

Let’s Code It Out

Now that we understand what qualifies as a tree, let’s look at how we actually implement this structure in our code to solve real problems.

Let’s first look at how to actually create a tree. We’ll use our first example from earlier, with the company structure with the CEO at the top, 2 managers who report to the CEO, and 3 employees under each of the managers.

For simplicity, I’ll code this out in JavaScript.

I’ve seen many practice sites represent the objects in a tree in different ways, for example, if this were a coding challenge, you may see example input data look something like:

Tree = {
  "head": 1,
  "nodes": [
    {id: 1, value: "CEO", child1: 2, child2: 3, child3: null}
    {id: 2, value: "Manager", child1: 4, child2: 5, child3: 6}
    {id: 3, value: "Manager", child1: 7, child2: 8, child3: 9}
    {id: 4, value: "Employee", child1: null, child2: null, child3: null}
    {id: 5, value: "Employee", child1: null, child2: null, child3: null}
    {id: 6, value: "Employee", child1: null, child2: null, child3: null}
    {id: 7, value: "Employee", child1: null, child2: null, child3: null}
    {id: 8, value: "Employee", child1: null, child2: null, child3: null}
    {id: 9, value: "Employee", child1: null, child2: null, child3: null}
  ]
}

This does not mean that when creating a tree, you need to store your node objects in an external structure like this. This is simply a representation of the data that is actually sourced by traversing the actual tree node objects. The underlying structure is still a collection of basic objects in memory, pointing to each other.

Let’s start with our tree class. A tree node will typically have a “value” property, and a number of “child” properties that point to other nodes. In this case, we’ll assume a maximum number of 3 child nodes allowed per parent, this is only to simplify the tree.

If I wanted to start building this tree by hand, I’d do so one object at a time:

This is a brute force, line by line creation of a tree. What do you think the last line of this script will return? Let’s try it:

Tree {
  value: 'CEO',
  child1: Tree {value: 'Manager',
    child1: Tree {
      value: 'Employee',
      child1: null,
      child2: null,
      child3: null
    },
    child2: Tree {
      value: 'Employee',
      child1: null,
      child2: null,
      child3: null
    },
    child3: Tree {
    value: 'Employee',
    child1: null,
    child2: null,
    child3: null
    }
  },
  child2: Tree {
    value: 'Manager',
    child1: Tree {
      value: 'Employee',
      child1: null,
      child2: null,
      child3: null
    },
    child2: Tree {
      value: 'Employee',
      child1: null,
      child2: null,
      child3: null
    },
    child3: Tree {
      value: 'Employee',
      child1: null,
      child2: null,
      child3: null
    }
  },
  child3: null
}

This looks much like we expected. We don’t use the “childx” properties to list IDs or some other relational data, we point to the literal child node object in memory. This way, we can derive every bit of information from a tree by traversing down the children of each node, until we reach leaf nodes that have no children.

Now we have a valid tree, lets do exactly that, derive data from it. I’d like to know how many total employees are in the company, just from this tree that we’ve already created.

Sum Numbe

So you’ll notice that in my sumAllNodes function, I’m not utilizing the value property of the objects. The value in this tree is just the title of the employee, so I don’t really need it if I’m just counting the total employees (nodes).

If you are familiar with recursion, this probably looks familiar, but if you are new to the concept, this may by slightly confusing. What’s actually going on here.

Recursive functions call themselves over and over with progressive data (or on a node at a time, for example), and rely on a “base case” to tell them they have reached the end and to terminate. In our function here, the base case is this line:

if (node === null) return 0;

If the node passed in to the function is null, it’s just going to return 0, and will not execute the rest of the function. This happens when we are at a leaf node, ending a branch.

Let’s look at what this code actually does, step by step. We first call sumAllNodes(treeNode1); This was our root node (CEO).

sumAllNodes(treeNode1);
node is not null. Return 1 + sumAllNodes(node.child1) + sumAllNodes(node.child2) + sumAllNodes(node.child3);

3. CEO return cannot happen until we know what sumAllNodes(treeNode2) and sumAllNodes(treeNode3) (the manager nodes) will return, so we call the function again with those nodes passed in.

4. Each manager node returns the same. Return 1 + sumAllNodes(node.child1) + sumAllNodes(node.child2) + sumAllNodes(node.child3);

5. Manager nodes can’t return until we know what the sumAllNodes function will return when the employee nodes are passed in.

6. sumAllNodes(treeNode4); Return 1 + sumAllNodes(node.child1) + sumAllNodes(node.child2) + sumAllNodes(node.child3);

7. Finally, we hit our base case, since treeNode4.child1, treeNode4.child2, and treeNode4.child3 are all null, they return 0 (as per our base case)

8. The employees finally can return 1 + 0 + 0 + 0, or 1, up to the function with the manager node passed in.

9. This tells the manager calls to return 1 + 1 + 1 + 1 or 4, and passes this value to the initial function with the CEO node passed in.

10. Now that we know the return calls for the managers, the CEO can finally return 1 + 4 + 4 + 0 (1 plus the sum of the sums of his 2 child nodes).

11. Return 9. And we see the answer is correct!

This seems complicated, for such a simple problem, but when we begin to scale things up, recursion allows us to tackle very large problems by breaking them into smaller parts, and finding the formulas to solve them dynamically.

Trees are designed to compliment dynamic programming. They are meant to be traversed, added to, and deleted from programmatically and typically utilizing recursive functions.

Other Flavors of Trees:

With additional constraints, we can create custom trees for unique solutions to our data queries. Here are a few of the most common:

Binary Tree

Binary trees have a maximum of 2 children per parent node, and are typically used in searching and sorting in a large file system.

A binary tree class would look something like this.

Binary Search Tree (BST)

Let’s add another constraint:

Every node to the left of any given node must be strictly less than the given node, and every node to the right of a given node must be strictly greater than to the given node. Whether duplicate values are allowed and whether they should go to the right or left of their twin node is up to the developer.

Now we have ourselves a Binary Search Tree.

The ordered nature of BSTs make them a perfect choice for:

Multi-Level Indexing
Implementing various search algorithms
Maintaining a sorted stream of data

Red Black Trees

Red Black Trees are a special breed of search tree that keep its own subtrees balanced. Here are the criteria for a Red Black Tree:

Every node is either red or black
The root and leaf nodes are always black
If a node is red, then its children are black
All paths from a node to its leaf descendants contain the same number of black nodes

Let’s look at an example.

In this tree, we would have to consider the color scheme as we add or delete nodes, so a rotation scheme would need to be implemented. Because of the sorted nature of this tree all three operations (search, insert, delete) would each run in O(log^n) time and O(n) space, with one bit needed to store the red/black status of the node.

Conclusion

Trees are an incredibly common and helpful data structure, especially when sorting hierarchical systems and data streams. Thank you for sticking through this far and happy coding!