
I'm developing a small system with two components: one polls data from an internet resource and translates it into SQL data to persist it locally; the second reads that SQL data from the local instance and serves it via JSON and a RESTful API.

I was originally planning to persist the data with PostgreSQL, but because the application will have a very low volume of data to store and traffic to serve, that seemed like overkill. Is SQLite up to the job? I love the idea of the small footprint and of not having to maintain yet another SQL server for this one task, but I am concerned about concurrency.

It seems that with write-ahead logging (WAL) enabled, a SQLite database can be read and written concurrently without either process locking the other out of the database.
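For reference, WAL mode is enabled with a single pragma. Here is a minimal sketch using Python's built-in sqlite3 module (the filename is a placeholder):

```python
import sqlite3

# Open (or create) the database file; "data.db" is a placeholder name.
conn = sqlite3.connect("data.db")

# Switch to write-ahead logging. The setting is persistent: it is
# recorded in the database file itself, so it only has to be done once
# per database, not once per connection.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # prints "wal" on success

conn.close()
```

Note that WAL does not work on in-memory databases; the pragma's return value tells you which journal mode is actually in effect.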

Can a single SQLite instance sustain two concurrent processes accessing it, if only one reads and the other writes? I started writing the code but was wondering if this is a misapplication of SQLite.

  • @gnat Cool. Can a single SQLite instance sustain two concurrent processes accessing it, if only one reads and the other writes? I started writing the code but was wondering if this is a misapplication of SQLite. – rgb Oct 08 '13 at 16:44
  • Just a heads up. In my previous company, we were using SQL (both MS and Oracle Express) for some storage and we always felt that with what little we stored, we didn't need a full DB. So in one of the releases we decided to do exactly what you are doing. Replace those products with SQLite. We had exactly the same thing, one writer that would lay down data on disk and update SQL-based TOC and reader process (multiple threads) that would read TOC to determine which data to retrieve. Don't know about SQLite these days, but what little concurrency we ran into turned out to be a major pain in... – DXM Oct 09 '13 at 06:11
  • ... the behind. I don't remember all details, but I think when one process tried to get a lock and couldn't because another one was reading, it would SLEEP for something crazy like 20-30 seconds. We ended up creating a dedicated thread that was responsible for SQLite access and then we had both our processes, and all threads in them, serialize their DB requests to that one thread. In hindsight, I probably would not have gone with SQLite again. – DXM Oct 09 '13 at 06:13
  • 1
    @DXM thanks for the warning, but after running a few tests I haven't run up against anything similar. I know sqlite got a major overhaul with version 3, which was around 2004, so I wonder if your negative experience dates back before that time. – rgb Oct 09 '13 at 15:26
  • @BBnyc: My experience was from Q1 2011 and up. We did the same thing: created some simple standalone proofs of concept, made sure stuff didn't blow up right away, and then went ahead and introduced it into the product. Then it was nothing but headache for a pretty long time. One thing we ended up doing to minimize concurrent access from what few concurrent clients we had was to put each group of related tables (i.e. those that had foreign key relationships with each other) into its own SQLite DB. But yeah, our conclusion was that it is much better to manage access and queue up requests... – DXM Oct 09 '13 at 15:51
  • 1
    ... yourself, rather than rely on SQLite's locking schemes. I didn't do the work myself, it was another another team but I kept in the loop mostly to provide feedback and out of curiosity. I also went online and did some independent reading and found original author's page. From reading that, I got the impression that the inventor of SQLite simply hated threads and didn't see why anyone would use them, so a) the DB wasn't designed with them in mind and b) locks/protection was kinda added/hacked in as an afterthought because too many people asked for it. – DXM Oct 09 '13 at 15:54

1 Answer


You are looking for the File Locking And Concurrency documentation.

SQLite uses a series of file locks to handle concurrency: to read, any number of processes can hold a SHARED lock at the same time.

A process that wants to write first obtains a RESERVED lock, and only when it actually has to flush changes to disk does it escalate to PENDING, which blocks new readers. Once the existing readers have released their SHARED locks, the writer can take the EXCLUSIVE lock and write to the actual database file.

Because the writer only needs to lock the database file for actual writes (memory flushes, commits), a setup with just one reader and just one writer will perform quite well. I'd expect it to perform just as well as, if not better than, a setup with a single process doing all the reading and writing.

SQLite is less suited when you have multiple processes frequently writing to the same database, because each writer must work its way up to the EXCLUSIVE lock in turn, so all changes are serialized.
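As an illustration, the one-writer/one-reader setup from the question might look like this with Python's sqlite3 module (a sketch only: the table and file names are made up, and the 5-second timeout is an arbitrary choice):

```python
import sqlite3

DB_PATH = "feed.db"  # placeholder path shared by both processes


def open_conn(path):
    # timeout makes the connection wait up to 5 s for a lock instead of
    # raising "database is locked" immediately.
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


# --- writer process: polls the remote resource and persists rows ---
writer = open_conn(DB_PATH)
writer.execute(
    "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, payload TEXT)"
)
writer.execute("INSERT INTO items (payload) VALUES (?)", ("fetched data",))
writer.commit()  # locks are only escalated for the duration of this flush

# --- reader process: serves the stored rows via the REST API ---
reader = open_conn(DB_PATH)
rows = reader.execute("SELECT payload FROM items").fetchall()
```

In a real deployment the two halves would run in separate processes against the same file; the connection-level timeout is what keeps an occasional lock collision from surfacing as an exception.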

Martijn Pieters
  • Thanks for the thorough answer Martijn! I wrote some scripts to do some testing, and it seems that two processes will happily read and write a single SQLite instance concurrently. I was firing off concurrent read and write requests every 1/100th of a second and still did not receive a locked-database exception. Strangely, the only time I got a "database locked" error message was when manually trying (with the sqlite3 command line client) to delete a few rows while read requests were firing off from my script at 1/100th of a second. I wonder if pysql automatically retries a write after such an error. – rgb Oct 09 '13 at 15:24