0

Setup

Consider the following models (pseudo-code)

Place:
  type: const = "Place"
  id: str
  name: str
  lat: float
  lon: float
  # other fields

Event:
  type: const = "Event"
  id: str
  name: str
  start_date: timestamp
  location: Place
  # other fields

I am using Python for the BE and CouchDB as DB System.

When creating a new object, I match my data in Python but I use CouchDB queries to get a list of prospective matches, e.g.

// Query1
{
   "selector": {
      "type": "Event",
      "start_date": {
         "$gt": new_event.start_date - epsilon,
         "$lt": new_event.start_date + epsilon
      },
      "place.lat": {
         "$gt": new_event.place.lat - epsilon,
         "$lt": new_event.place.lat + epsilon
      },
      "place.lon": {
         "$gt": new_event.place.lon - epsilon,
         "$lt": new_event.place.lon + epsilon
      }
   }
}

where epsilon is some constant.

Problem

The current approach requires a non-relational approach as I am using Event.place.lat in the query. To avoid duplication I would prefer a relational approach, i.e. in the DB Event.place is an ID as opposed to an actual object. However, with a relational approach I can no longer use Event.place.lat in my queries.

Possible Solutions

Solution 1: Multiple Queries

To emulate the same query as Query1 I would:

  1. Query using only new_event.type;new_event.start_date to obtain results_1 = [result_1_1, ..., result_1_n]
  2. Generate a list of IDs relation_ids = [r.place for r in results_1]
  3. Query for the relations using new_event.place.lat;new_event.place.lon;relation_ids as filter to obtain results_2 = [result_2_1, ..., result_2_k]
  4. Merge results_1;results_2 to a list of Event objects (Discarding the results_1 elements w/o corresponding results_2 element).

Additional Cost: 1 additional DB query and a little bit of computations in the BE (steps 2 & 4).

Soltuion 2: Expanding Models

This approach is straightforward: Add the fields lat;lon the Event objects saved in the DB. This way Query1 can be executed almost identically (using new_event.lat;new_event.lon).

Additional Cost: Data duplication (lat;lon being in both Event & Place).


Which would be the recommended implementation? Are there any other approaches that I might be missing (w/o leaving CouchDB)?

niko
  • 111
  • 4
  • 2
    would it be better to switch to a relational database? – Ewan Sep 25 '20 at 10:24
  • @Ewan For my current use-case it makes more sense to stick w/ a no-sql imo: My amount of relations is very limited (`Event-Place;1to1` and `Event-Performers;NtoN` that's it) and the same applies for my queries (nothing of the sort `get all Place where Performer P performs`). Besides, CouchDB offers several useful functionalities for my project. – niko Sep 25 '20 at 12:55

1 Answers1

0

I hate writing this but.. "It depends on your use case".

The underlying question here is how static is your list of places? if you delete a place will you want to delete all the events that ever took place at that place?

I think that generally "places" are pretty vague things and the non-relational approach of storing the place in the event document is the right approach.

Say for example there is a Gig, in a room, at a pub. The pub is in a town, the town is in a county, the county is in a country, etc etc they are all places and their names and locations are not all static things.

Will you want to update past events if the pub name changes? What about if the pub remodels and the room is demolished? I doubt you will care.

If this is the case, then stick with the current approach. Search by location and group by name.

If they are static, for example, list of airports or countries, The two query approach is probably your best option. This allows you to have drop downs for selecting the place and show all events at the same place.

If you find that you need this relational approach in many places, then I would switch to an SQL database, which will allow you to select with a join.

Ewan
  • 70,664
  • 5
  • 76
  • 161