Using bit strings for modeling stable hierarchies in tables, is the faster solution?

Purely relational databases (SQL) seem to suffer to nowadays from the lack of a good solution for search/indexing hierarchies. Here a systematic review of the subject: https://stackoverflow.com/q/4048151 What is surprising is that no "Lineage Column" solution makes use of a bit string column, in particular adopting this column as the primary key. Maybe this article, which called the strategy "Sort Path", suggests something along those lines... But I didn't see anything more explicit. Illustrating Fruits can form a stable (static) taxonomy for the owner of a grocery store. Below, each fruit in the grocery store's inventory received a bitString identifier. Apples were labeled with the prefix 0, oranges with the prefix 1. Within the set of apples, green apples receive the prefix 01. When using SQL bitstring the "ORDER BY" use the lexicographic order. For example ordering all bitstrings, from 1 to 4 bits: 00 apple "0" 000 apple "00" 0000 apple "000" 01 green apple "" 010 green apple "0" 011 green apple "1" 10 orange "0" 101 orange "01" As we can see, the lexicographical order is grouping taxons through intervals ("all apples" set, green apples subset, oranges set). BitString labels are not numbers, but can be counters using successor function, so is valid as Primary Keys. For example the next green apple will be 0011 and the next orange 110. In a finite tree, suppose 5 bits, we can define the function MaxCount5bits(x), to check the last child. For example for oranges MaxCount5bits('1') is 11111, for green apples MaxCount5bits('01') is 01111. That is, using standard bitString operators, MaxCount5bits(x) = substring(x || '11111',1,5). The fruit table can be something like CREATE TABLE fruit (id bitString PRIMARY KEY, info text), so we can do any tree-search operation: find all oranges: SELECT * FROM fruit WHERE id BETWEEN '1' AND MaxCount5bits('1') find all green apples: SELECT * FROM fruit WHERE id BETWEEN '01' AND MaxCount5bits('01') find all apples: SELECT * FROM fruit WHERE id BETWEEN '0' AND MaxCount5bits('0') find the parent of the apple 01: no search is necessary, it is the prefix 0, that is something as substring(x,1,length(x)-1). find the parent of the orange 101: 10 or 1 (because it is a two-bit-prefix taxonomy). ... reduced the taxonomic analysis to a simple "check BETWEEN at b-tree" operation, and "check ID prefix" operation. And, since it is indexed as a primary key, it is fast (the search engine uses a b-tree). Question In view of the illustrated, the question is: is it the faster solution? That is, best performance in any SQL-database that supports bitstrings and its lexicographical order b-tree indexation. Seems also the the most economical solution: all operations are ID-based, no extra column, and with all integrity (reference and primary key) guaranteed. PS: perhaps a more useful variant of the model is to adopt the first bits as taxonomic control and the remaining bits as a serial-integer counter. In the illustration, there are 2 taxonomic bits.

Mar 4, 2025 - 16:38
 0
Using bit strings for modeling stable hierarchies in tables, is the faster solution?

Purely relational databases (SQL) seem to suffer to nowadays from the lack of a good solution for search/indexing hierarchies. Here a systematic review of the subject: https://stackoverflow.com/q/4048151

What is surprising is that no "Lineage Column" solution makes use of a bit string column, in particular adopting this column as the primary key. Maybe this article, which called the strategy "Sort Path", suggests something along those lines... But I didn't see anything more explicit.


Illustrating

Fruits can form a stable (static) taxonomy for the owner of a grocery store. Below, each fruit in the grocery store's inventory received a bitString identifier. Apples were labeled with the prefix 0, oranges with the prefix 1. Within the set of apples, green apples receive the prefix 01.

enter image description here

When using SQL bitstring the "ORDER BY" use the lexicographic order. For example ordering all bitstrings, from 1 to 4 bits:

 00       apple "0"
 000      apple "00"
 0000     apple "000"
 01       green apple ""
 010      green apple "0"
 011      green apple "1"
 10       orange "0"
 101      orange "01"

As we can see, the lexicographical order is grouping taxons through intervals ("all apples" set, green apples subset, oranges set).

BitString labels are not numbers, but can be counters using successor function, so is valid as Primary Keys. For example the next green apple will be 0011 and the next orange 110.

In a finite tree, suppose 5 bits, we can define the function MaxCount5bits(x), to check the last child. For example for oranges MaxCount5bits('1') is 11111, for green apples MaxCount5bits('01') is 01111. That is, using standard bitString operators, MaxCount5bits(x) = substring(x || '11111',1,5).

The fruit table can be something like CREATE TABLE fruit (id bitString PRIMARY KEY, info text), so we can do any tree-search operation:

  • find all oranges: SELECT * FROM fruit WHERE id BETWEEN '1' AND MaxCount5bits('1')
  • find all green apples: SELECT * FROM fruit WHERE id BETWEEN '01' AND MaxCount5bits('01')
  • find all apples: SELECT * FROM fruit WHERE id BETWEEN '0' AND MaxCount5bits('0')
  • find the parent of the apple 01: no search is necessary, it is the prefix 0, that is something as substring(x,1,length(x)-1).
  • find the parent of the orange 101: 10 or 1 (because it is a two-bit-prefix taxonomy).
  • ... reduced the taxonomic analysis to a simple "check BETWEEN at b-tree" operation, and "check ID prefix" operation.

And, since it is indexed as a primary key, it is fast (the search engine uses a b-tree).

Question

In view of the illustrated, the question is: is it the faster solution?
That is, best performance in any SQL-database that supports bitstrings and its lexicographical order b-tree indexation.

Seems also the the most economical solution: all operations are ID-based, no extra column, and with all integrity (reference and primary key) guaranteed.


PS: perhaps a more useful variant of the model is to adopt the first bits as taxonomic control and the remaining bits as a serial-integer counter. In the illustration, there are 2 taxonomic bits.