Red-black trees

The story of self-balancing binary trees is one of the more interesting histories of computer science.

Binary search trees are relatively simple, but they can get so unbalanced that search time becomes linear instead of logarithmic with the number of nodes.

All self-balancing binary trees are trying to achieve the same thing: when inserting or deleting nodes, perform a rebalancing using tree rotations , making sure this rebalancing also only takes logarithmic time, and that the resulting balance guarantees logarithmic search time.

In 1962, two Soviet mathematicians named Andelson-Velskii and Landis published a paper describing the first self-balancing binary tree. It became known as the AVL tree after their last names. This tree requires a 3-state (2 bit) variable in each node to keep track of the shape of the tree.

Then, in 1978, Guibas and Sedgewick published a paper showing a general framework that could describe all the balancing implementations at the time (including AVL trees). It only requires 2 states (1 bit) per node. They describe several different versions of the tree with different performance tradeoffs (faster search vs. faster insert). They were working at Xerox PARC with the first printers, and they liked the colors black and red best for drawing pictures of their trees, so they are now called red-black trees .

red-black trees require only 1 bit instead of 2 per node for their state, and provide faster insertion by doing less balancing work and enforcing a less strict balancing requirement. AVL trees enforce a stricter mathematical bound on their balance, so their insertion takes longer and does more rotations, but their search is asymptotically faster.

Today, red-black trees have become the de facto implementation of self-balancing binary trees. The most widely used implementation is used to implement std::set and std::map by the GNU C++ STL. My particular Linux machine has those headers under:
/usr/include/c++/4.7.2/bits/stl_tree.h
Ironically, this implementation ends up using a 32-bit integer enum to store the 1-bit state.

Another widely used implementation is in the Linux kernel. I think this one does use one of the bits in one of the pointers to store the color. The Boost library also has an implementation. In the usual style of Boost, they have a template argument that lets you choose whether to compact the 1 bit.

Personally, my favorite self-balancing tree is a little-known paper published by Arne Andersson.

At that time, all other implementations of balancing trees were ugly and used many lines of code. Andersson's paper showed a full implementation in just one or two pages of text. They have come to be called AA trees, after his name. AA trees require a full integer per node, but GNU's STL uses the same memory.

I have implemented both AVL trees and AA trees, and I completely agree with Andersson's argument of simplicity.