0

The most of the database management systems use B-tree as a data structure for increasing the performance.

Let's imagine that we have a table users with the following columns: id (int), name (string), email (string), password (string), age (int). We want to select only ones that meet the condition: (name.contains("Bob") and not name.contains("A")) or (email.starts_with("angel") and (age >= 30 or age <= 20)). It's a really complicated condition. How do the database management systems builds the B-tree and how do they move inside them to select the data? Which key should be used inside B-tree? If it's achievable with another data structure type, it is also welcomed!

I am creating my own RDBMS and faced with this problem :)

Vega TV
  • 111
  • 4
  • 3
    The way relational databases execute query plans is much more complicated than just manipulating a B-tree. Read [PostGresql Explain](https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-explain/) to find out why. – Robert Harvey Sep 19 '22 at 03:22
  • 1
    Or watch [this video](https://www.youtube.com/watch?v=P01_Zd11HX0) – Robert Harvey Sep 19 '22 at 03:31
  • One key misunderstanding is that the RDBMS does not build the b-tree for the query (that would be way more expensive than just scanning the table in the majority of OLTP queries) instead b-trees are maintained permanently based on schema configuration. – user1937198 Sep 19 '22 at 19:00

1 Answers1

4

When talking about databases we usually talk about indexes, they are often implemented using a B-Tree, but does not have to be, there are a wide variety of search structures that could be used, depending on the specific use case, hash tables, binary trees, R-Trees, and many others. Typically the primary key would always have a index, but other columns would have indexes created by the user.

Not all queries will be able to use indexes. A complicated query like yours might be able to use the index for the age to narrow down the number of items, and do a linear scan for the other conditions. If you make a sufficiently complicated query the database engine might decide to just do a full table scan. Indexes supporting queries like name.contains("A") would also need to be more complicated than a regular index, and may have limited availability.

There are composite indexes that can be used when querying multiple columns. I'm no database expert, but I would guess they just start using another column for the key at some depth level of the b-tree.

I would note that creating a general purpose RDBMS system is hard. So you need to carefully consider what your system will do that existing systems do not. It might also be easier to create a specialized system for your specific needs.

JonasH
  • 3,426
  • 16
  • 13
  • Additionally index's do not have to be on field values. Most major RDBMS systems support derived indexes which index some arbitrary function of the columns. – user1937198 Sep 19 '22 at 18:38