How is B-Tree used in DBMS-s if WHERE condition can be anything?

Question

The most of the database management systems use B-tree as a data structure for increasing the performance.

Let's imagine that we have a table users with the following columns: id (int), name (string), email (string), password (string), age (int). We want to select only ones that meet the condition: (name.contains("Bob") and not name.contains("A")) or (email.starts_with("angel") and (age >= 30 or age <= 20)). It's a really complicated condition. How do the database management systems builds the B-tree and how do they move inside them to select the data? Which key should be used inside B-tree? If it's achievable with another data structure type, it is also welcomed!

I am creating my own RDBMS and faced with this problem :)

The way relational databases execute query plans is much more complicated than just manipulating a B-tree. Read [PostGresql Explain](https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-explain/) to find out why. — Robert Harvey, Sep 19 '22 at 03:22
Or watch [this video](https://www.youtube.com/watch?v=P01_Zd11HX0) — Robert Harvey, Sep 19 '22 at 03:31
One key misunderstanding is that the RDBMS does not build the b-tree for the query (that would be way more expensive than just scanning the table in the majority of OLTP queries) instead b-trees are maintained permanently based on schema configuration. — user1937198, Sep 19 '22 at 19:00

JonasH · Accepted Answer · 2022-09-19T08:06:17.353

When talking about databases we usually talk about indexes, they are often implemented using a B-Tree, but does not have to be, there are a wide variety of search structures that could be used, depending on the specific use case, hash tables, binary trees, R-Trees, and many others. Typically the primary key would always have a index, but other columns would have indexes created by the user.

Not all queries will be able to use indexes. A complicated query like yours might be able to use the index for the age to narrow down the number of items, and do a linear scan for the other conditions. If you make a sufficiently complicated query the database engine might decide to just do a full table scan. Indexes supporting queries like name.contains("A") would also need to be more complicated than a regular index, and may have limited availability.

There are composite indexes that can be used when querying multiple columns. I'm no database expert, but I would guess they just start using another column for the key at some depth level of the b-tree.

I would note that creating a general purpose RDBMS system is hard. So you need to carefully consider what your system will do that existing systems do not. It might also be easier to create a specialized system for your specific needs.

Additionally index's do not have to be on field values. Most major RDBMS systems support derived indexes which index some arbitrary function of the columns. — user1937198, Sep 19 '22 at 18:38

How is B-Tree used in DBMS-s if WHERE condition can be anything?

1 Answers1