7

Is it sensible/reasonable to consider using an SQLite database file as a data interchange format? Are there any obvious gothchas or reasons not to do this?

Example. Say I have a master SQL Server database and periodically need to SELECT a set of rows from a table, export them into a file, and transfer the file to a remote system where the data will be imported into a remote/slave database (it could be SQL Server or some other database technology entirely, e.g. SQLite, MySQL, etc.). I.e. the file format should be general purpose and not be tied to a specific database technology.

Some pros are:

  • No encoding issues to worry about. This is all covered by the SQLite db file format, including things like endianness of binary data.
  • Compact encoding of binary data. E.g. avoids text encoding of numeric data, or base64 encoding of binary objects.
  • Well defined data schema. The data will be in a table with well defined column types.

Cons:

  • Not human readable/editable, e.g. if a tester wanted to generate or edit a data file, they will need additional knowledge and tools. If this were e.g. a csv, json or XML file, anyone can immediately see and understand the structure of the data in a text editor, and can directly edit it.
  • Somewhat non-standard. Requires additional dependency on a SQLite library/dll.
  • One more table schema to maintain, i.e. in addition to the master and remote database tables, there is now a third table.

Is this idea at all sensible?

redcalx
  • 345
  • 3
  • 14
  • Why doesn't a CSV work for you? – Robert Harvey May 17 '18 at 16:27
  • 3
    Seems you already got most cons and pros. The only thing I would add here on the con side: no guarantee the sqlLite db format will stay compatible with the next major release. – Doc Brown May 17 '18 at 16:35
  • CSV is an option of course (issues around comma encoding/escaping aside), otherwise I think the list of pros above are all benefits compared to using CSV (or json or XML). – redcalx May 17 '18 at 16:37
  • I suppose the broader question is: Is there a standard binary data interchange format? (noting that CSV is not a complete well defined standard). – redcalx May 17 '18 at 16:42
  • 2
    Perhaps google protocol buffers are a better fit... https://developers.google.com/protocol-buffers/ – redcalx May 17 '18 at 16:45
  • 1
    @DocBrown in that case you can keep using the previous major release. – user253751 May 18 '18 at 01:42
  • 2
    @immibis: "can" is the wrong word, you **must** keep the previous major release. This may be fine for the OP, or not, they could just include this in they decision process. It is just a "con" argument as the others, no show stopper. – Doc Brown May 18 '18 at 04:31
  • @DocBrown Upgrading from one major release's file format to another is trivially done with a dump and restore. If the SQL doesn't involve anything unique to the later release, downgrades can be done, too. – Blrfl May 18 '18 at 13:25
  • @Blrfl: upgrading from one file format to another is not really hard, but it requires some effort: either hoping the new sqlite version supports both formats, or linking against two sqlite library versions. And when the program gets a binary interchange file, it will have to detect the file format version. This is all not hard, but still a slight "con" argument. – Doc Brown May 18 '18 at 17:09
  • @DocBrown: For many platforms/projects, file format changes may be a legitimate risk. For SQLite, I'd call that a non-issue. File format [has been stable since 2004](https://www.sqlite.org/fileformat2.html) and the developers expect it to remain stable and compatible with future SQLite releases until 2050 and beyond: ["Database files created today will be readable and writable by future versions of SQLite decades in the future"](https://www.sqlite.org/lts.html). Library of Congress even calls them out as a [preferred format for storing/archiving data sets](https://www.sqlite.org/locrsf.html). – Christopher Cashell May 24 '23 at 00:26
  • @ChristopherCashell: thanks for providing the link, that promise was indeed new to me (but it is still just a goal, no bulletproof guarantee). Think of what can happen when some large tech company decides to buy the SQLite team because they have different goals ;-) But I think the practical risk that this will happen is quite low. – Doc Brown May 24 '23 at 06:00

2 Answers2

6

It happens to be possible to use SQLite databases as a data interchange format, but it's not a particularly good solution. Why?

  • You are looking for a compact data exchange format. In contrast, SQLite's goal is to store data in a manner that is easily queried and modified on disk. This is not necessarily a compact format, e.g. if the database contains free pages.

  • By using a not-human readable format where data can only be accessed through the SQLite interface, you are sacrificing quite a lot debuggability. On the plus side, SQLite files, XML, and JSON are all self-describing, with SQLite and optionally XML also containing their data schema within the document.

  • SQLite happens to have a good compatibility story, but different versions have different capabilities. You will have to make sure that your generated database files are understood by all supported SQLite versions, which may require extra configuration and testing.

  • By using SQLite, you are constrained to SQLite's data model. This means everything fits into rows and columns, and is dynamically typed. Other data formats have more flexible data models that are better suited to arbitrary object graphs. Particularly XML also has excellent support for defining and checking data schemas.

    As you are taking data out of a database, the data schema should roughly fit, but note that database-specific extensions might be problematic. It's also certainly not the case that any data in a SQLite database would be fit to import into another DB. Since you will necessarily have to write extra tooling for export/import, choosing SQLite over some other file format does not represent a shortcut that will save you a noticeable amount of work.

In most scenarios, using some established data interchange format is vastly preferable, which often means XML or JSON, or possibly a binary format such as Protobuf. Unless you transfer mostly binary data or data that can be defined with simple structs, the choice doesn't matter very much – just base64-encode any binary data before putting it into a textual format, which only costs you 33% overhead.

CSV might be an option for tabular data, but most CSV implementations are underspecified and disagree on details like how delimiters and newlines get to be escaped – so take extra care to configure that properly.

If you need to combine multiple documents into one file, a common strategy is to use a ZIP archive as an envelope. Those are easy to debug using common tools, and easy to manipulate programmatically. E.g. the “ZIP of XML” strategy is used by the .docx and .odt file formats.

Please also consider whether manually moving data between two databases is the correct solution for your needs. Most DBs have builtin support for a master + readonly slave setup. That might be a lot easier. If you move data manually, you will also have to think about data consistency issues. Are you transferring a selection of records, or are you bringing the complete remote DB up to date? Are you transferring records, or a log of changes that were applied to these records since the last sync? How are you making sure the transferred data results in a consistent view, e.g. with respect to constraints and foreign key relationships?

So to summarize: You have a difficult problem. Using SQLite may be possible, but it won't make this problem any easier. Prefer established solutions.

amon
  • 132,749
  • 27
  • 279
  • 375
-5

In these days of GDPR and its potential, punitive fines for any and all Data Breaches, delivering anything via "Sneaker-Net" seems like the wrong direction to be going in.

Why can't the "remote system" access the master SqlServer database (or a REST API that sits in front of it) and pull the data directly?

Phill W.
  • 11,891
  • 4
  • 21
  • 36
  • 10
    GDPR is going to be an issue whether it's a SQLite database or an Excel spreadsheet. SQLite is not especially remarkable in this regard. – Robert Harvey May 17 '18 at 16:26
  • Perhaps he is updating the software for transferring today's Nuclear launch codes POTUS for some unknown reason is not allowed to put them on internet. – mattnz May 18 '18 at 02:56