Honest answer: If the blocks aren't* contiguous, there's an index for that!

(* It depends on the database type. In an LSM-tree layer, the blocks are typically contiguous in a file, so direct binary search is possible, though it might not be the most efficient method. The filesystem handles locating blocks in this case.)
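As a rough sketch of what "direct binary search" over a contiguous file could look like, assume fixed-size records sorted by key. The record layout and the name `file_binary_search` are made up for illustration; this is not how any particular LSM engine stores its data.

```python
import io
import struct

# Hypothetical fixed-size record: 4-byte unsigned key, 4-byte unsigned value.
RECORD = struct.Struct(">II")

def file_binary_search(f, count, key):
    """Binary search over `count` sorted fixed-size records in file-like `f`."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        # Contiguous layout: the offset of record `mid` is plain arithmetic.
        f.seek(mid * RECORD.size)
        k, v = RECORD.unpack(f.read(RECORD.size))
        if k == key:
            return v
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return None

# An in-memory stand-in for a file holding keys 0, 5, 10, ..., 995.
f = io.BytesIO(b"".join(RECORD.pack(k, k * 10) for k in range(0, 1000, 5)))
count = 200
print(file_binary_search(f, count, 35))  # 350
```

Every probe is a seek to a computed offset, which only works because the records are contiguous. That is the assumption the rest of this answer drops.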

A database table is a set of disk blocks containing data, where the blocks are logically in key order, as if in a contiguous array. Physically, they can be laid out anywhere on disk.

So there's a block index, which is just a smaller version of the same table data structure: a table mapping keys to values, implemented as blocks in logical order of keys, as if in a contiguous array. Except in this index, the map is from key-ranges of the first table to block locations on disk. This index lets you go from keys to block locations on disk, and it also lets you do a coarse-grained version of the binary search to narrow down to a single block of the larger table.
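Here is a minimal sketch of one such index, assuming a toy "disk" that is just a dict from physical location to block. The names `disk`, `block_index` and `lookup` are invented for illustration. The point is that the data blocks sit at arbitrary locations, while the index is what is sorted by key.

```python
from bisect import bisect_right

# Hypothetical "disk": physical location -> block of sorted (key, value) pairs.
# The locations are deliberately not in key order.
disk = {
    70: [("apple", 1), ("banana", 2)],
    12: [("cherry", 3), ("date", 4)],
    45: [("elder", 5), ("fig", 6)],
}

# Block index: (first key in block, physical location), sorted by key.
block_index = [("apple", 70), ("cherry", 12), ("elder", 45)]

def lookup(key):
    # Coarse-grained search: find the last block whose first key is <= key.
    first_keys = [k for k, _ in block_index]
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None  # key sorts before every block
    block = disk[block_index[i][1]]

    # Fine-grained search inside the single block the index pointed to.
    keys = [k for k, _ in block]
    j = bisect_right(keys, key) - 1
    if j >= 0 and block[j][0] == key:
        return block[j][1]
    return None

print(lookup("date"))  # 4
```

In this toy the index fits in one small list. In a real table it would itself span many blocks, which is where the next question comes from.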

Ok, but then how do you look things up in the block index if it's using the same kind of data structure, made of multiple blocks? Same again: Each block index has its own smaller block index.

This neat recursive definition gives you a tower of progressively smaller block indexes until the size is just one block, which doesn't need an index.

Guess what you get when each table of logically sorted, contiguous blocks has a smaller index to say where the blocks are really located?

The data structure is called a B-tree (technically a B+tree), and the smallest index is the root block.

The tree structure arises from the recursive description, where each table has another table (until it stops).
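To make the recursion concrete, here is a small sketch that builds the tower bottom-up from sorted data and then searches it top-down. It assumes a tiny `BLOCK_SIZE`, a Python list standing in for the disk (list position plays the role of physical location), and a non-empty, read-only table. The function names are made up, and inserts, deletes and block splitting are left out entirely.

```python
from bisect import bisect_right

BLOCK_SIZE = 4  # hypothetical number of entries per block

def write_blocks(entries, disk):
    """Write sorted entries to the 'disk' as blocks.

    Returns the index for those blocks: one (first_key, location) per block.
    """
    index = []
    for i in range(0, len(entries), BLOCK_SIZE):
        block = entries[i:i + BLOCK_SIZE]
        location = len(disk)  # stand-in for a physical block address
        disk.append(block)
        index.append((block[0][0], location))
    return index

def build(entries, disk):
    """Build the tower of indexes; return (root location, number of levels)."""
    index = write_blocks(entries, disk)      # level 1: the data itself
    height = 1
    while len(index) > 1:                    # index too big for one pointer?
        index = write_blocks(index, disk)    # then it gets its own index
        height += 1
    return index[0][1], height

def search(disk, root, height, key):
    """Descend from the root block to a data block, one block read per level."""
    location = root
    for level in range(height, 0, -1):
        block = disk[location]
        keys = [k for k, _ in block]
        i = bisect_right(keys, key) - 1
        if i < 0:
            return None                      # key sorts before everything
        if level == 1:                       # data block: need an exact match
            return block[i][1] if block[i][0] == key else None
        location = block[i][1]               # index block: follow the pointer

disk = []
entries = [(i * 2, f"v{i}") for i in range(100)]  # sorted (key, value) pairs
root, height = build(entries, disk)
print(height)                          # 4
print(search(disk, root, height, 42))  # v21
```

With 100 entries and 4 entries per block, the levels hold 25, 7, 2 and 1 blocks, so a lookup reads four blocks. The same `write_blocks` function is used for data and for every index level, which is the "same data structure again" part of the description.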

This is a decidedly non-standard way of describing the B-tree data structure. There's no explicit tree. But it's a valid and useful alternative view. (Note, these block indexes are not what is generally meant by database indexes, and they are not visible at the SQL level. They are an implementation detail.)

One of the useful things to emerge from this view is that each level is just logically a flat table, made of blocks that are logically in key order but have arbitrary physical location. It's not necessary for every index to use the same data structure, or be on the same storage. Depending on how you think about algorithms, this description might be simpler to work with.