1

I'm thinking about a database schema, and would really appreciate if some of you could look at where I'm up to and offer some advice..

The mission - We have to write a program that will fetch data from around 50 different APIs from local governments providing event information. These APIs are not standardized, so schema will vary between each. We want to compile that data into a single database for which our other apps can tap into.

A couple of things we have to consider:

  • There will be common fields in the APIs' JSON/XML such as event name/title, description, location (lat/lng), address, etc. However, each API may issue it's own additional fields that other APIs don't, such as mobile, fax, weather_factors (e.g. if it rains, event cancelled). Ideally we still want to collect extra information, if it seems useful, even if not all APIs issue this.
  • Some APIs may offer translations of event information. These are Japanese APIs, so we can be assured that each will provide Japanese as standard. However, if they offer alternative languages too (English, Chinese etc) we want to store that too. Not all APIs will do this though.

So if MySQL, I'm thinking something along the lines of:

events

This will be fields that don't rely upon languages (e.g. location lat/lng)

  • id
  • api_id (so we can relate the event entry with the api's event)
  • lat
  • lng
  • city
  • category_id (foreign key to category table)
  • contact_id (foreign key to contacts table)
  • start_date
  • end_date

event_info

This is information on the event that we may want to offer multiple translations for, so we have a lang field (e.g. lang="en"). This will be common fields across all APIs - so we expect to fill these (although if, for example, description is missing it'll just be empty), even if only one Japanese entry per event (so can allow 1-to-many, but if only one language will be 1-to-1).

  • id
  • lang
  • event_id (foreign key with events table)
  • name
  • description
  • fee

event_meta

This will be for any additional thing that an individual API provides that we want to capture. So if, for example, they provide a fax number then we can put that in as {"event_id": #, "lang": null, "name": "fax", "value": "12345678"} (I guess in the case of a fax, lang is irellevant)

  • id
  • event_id
  • lang
  • name
  • value

We will also have tables such as contacts (a single office, may issue information on many events) and categories

However, as I'm writing this out, there is also the option to put all this in a Mongo database where we can probably handle translations like:

"_id": ...,
"api_key": "784578435",
"location": {
  "lat": ...,
  "lng": ...
},
"name": {
  "en": "Fun festival",
  "ja": "tanoshii matsuri"
},
"description": {
  "en": "Hello world...",
  "ja": "konichiwa sekai..."
},
"fee": "if given, we can add it"
"category_id": DBRef..
"contact_id": DBRef..

Plus, extra fields can easily be handled. Hm, Mongo does seem much simpler lol. Anyway, your suggestions to ensure I'm heading somewhat in the correct direction would be appreciated, thanks.

Martyn
  • 795
  • 8
  • 21

0 Answers0