243

I am currently creating a web application that allows users to store and share files, 1 MB - 10 MB in size.

It seems to me that storing the files in a database will significantly slow down database access.

Is this a valid concern? Is it better to store the files in the file system and save the file name and path in the database? Are there any best practices related to storing files when working with a database?

I am working in PHP and MySQL for this project, but is the issue the same for most environments (Ruby on Rails, PHP, .NET) and databases (MySQL, PostgreSQL)?

Peter Mortensen
B Seven
  • Do you only retrieve them when you actually need them? –  May 29 '12 at 15:11
  • Related question on DBA.SE: [Files - in the database or not?](http://dba.stackexchange.com/q/2445/2660) – Nick Chammas May 29 '12 at 20:10
  • Surprised that no one posted the MS research done on this issue (for SQL Server 2008): [To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem](http://research.microsoft.com/apps/pubs/default.aspx?id=64525) – Oded May 30 '12 at 18:50
  • *Large* is a relative quantity; I (and many others, probably) don't see `10MB` as large in a modern system. –  May 30 '12 at 22:58
  • This is on-topic according to the FAQ - it fits under the bullets "design patterns" (slash antipatterns) and "software architecture". Why was it closed? – Izkata May 31 '12 at 18:25
  • @Izkata (imho) the close reason is wrong here; the question is on topic, but it's also _very_ not constructive. – yannis Jun 01 '12 at 06:30
  • B Seven: You are not giving us enough information to go on, and you are asking a couple of very vague questions. Notice how there's an "it depends" element in almost all answers? Well, we really don't want that. You can either edit your question to be a little bit more specific (including some details for your project), _or_ try a do-over with a new question. Feel free to ask on [meta] or [chat] for further details on how to improve your question. – yannis Jun 01 '12 at 06:33
  • I don't see any vagueness in the question as it is now. I have no idea why it was closed. – reinierpost Oct 05 '12 at 11:27
  • Let us remember those innocent times of 2012, when 10 MB was a 'large' file. – Roger Nov 26 '19 at 15:31

13 Answers

170

Reasons in favor of storing files in the database:

  1. ACID consistency including a rollback of an update which is complicated when the files are stored outside the database. This isn't to be glossed over lightly. Having the files and database in sync and able to participate in transactions can be very useful.
  2. Files go with the database and cannot be orphaned from it.
  3. Backups automatically include the file binaries.

Reasons against storing files in the database:

  1. The maximum size of a binary object differs among databases. On SQL Server, when not using the FILESTREAM object, for example, it is 2 GB. If users need to store larger files (say, a movie), you have to jump through hoops to make that magic happen.
  2. Increases the size of the database. One general concept you should take to heart: the level of knowledge required to maintain a database goes up in proportion to the size of the database. I.e., large databases are more complicated to maintain than small databases. Storing the files in the database can make the database much larger. Even if, say, a daily full backup would have sufficed, with a larger database size you may no longer be able to do that. You may have to consider putting the files in a different file group (if the database supports that), tweak the backups to separate the backup of the data from the backup of the files, etc. None of these things are impossible to learn, but they do add complexity to maintenance, which means cost to the business. Larger databases also consume more memory as they try to stuff as much data into memory as possible.
  3. Portability can be a concern if you use system specific features like SQL Server's FILESTREAM object and need to migrate to a different database system.
  4. The code that writes the files to the database can be a problem. One company I consulted for not so many moons ago at some point connected a Microsoft Access frontend to their database server and used Access' ability to upload "anything" using its OLE Object control. Later they changed to a different control which still relied on OLE. Much later someone changed the interface to store the raw binary. Extracting those OLE objects was a new level of hell. When you store files on the file system, there isn't an additional layer involved to wrap/tweak/alter the source file.
  5. It is more complicated to serve the files up to a website. In order to do it with binary columns, you have to write a handler to stream the file binary from the database (see the sketch after this list). You can also write such a handler when you store file paths, but you don't have to. Again, adding a handler is not impossible but adds complexity and is another point of failure.
  6. You cannot take advantage of cloud storage. Suppose one day you want to store your files in an Amazon S3 bucket. If what you store in the database are file paths, you can simply change them to S3 paths. As far as I'm aware, that's not possible in any scenario with any DBMS.
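
To make point 5 concrete, here is a minimal sketch of what such a handler could look like in the asker's PHP/MySQL stack. The `files` table and its columns are hypothetical, and error handling is reduced to the bare minimum:

```php
<?php
// Hypothetical schema: files(id, file_name, mime_type, contents LONGBLOB).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare(
    'SELECT file_name, mime_type, contents FROM files WHERE id = ?');
$stmt->execute([(int)($_GET['id'] ?? 0)]);
$file = $stmt->fetch(PDO::FETCH_ASSOC);

if ($file === false) {
    http_response_code(404);
    exit;
}

// The handler has to set the metadata a web server would normally
// derive from the file itself.
header('Content-Type: ' . $file['mime_type']);
header('Content-Length: ' . strlen($file['contents']));
header('Content-Disposition: attachment; filename="' . $file['file_name'] . '"');
echo $file['contents'];
```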

IMO, deeming the storage of files in the database "bad" or not requires more information about the circumstances and requirements. Are the size and/or number of files always going to be small? Are there no plans to use cloud storage? Will the files be served up on a website or a binary executable like a Windows application?

In general, my experience has been that storing paths is less expensive for the business, even accounting for the lack of ACID guarantees and the possibility of orphans. That is not to say that the internet isn't legion with stories of missing ACID control going wrong with file storage, but in general the path-storing solution is easier to build, understand, and maintain.
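
For comparison, a minimal sketch of the store-the-path approach, again with a hypothetical `files` table; the try/catch removes the file again if the INSERT fails, which is exactly the kind of orphan handling the in-database approach gives you for free:

```php
<?php
// Hypothetical schema: files(id, file_name, path, uploaded_at).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$storageDir = '/var/app/files';
$name = basename($_FILES['upload']['name']);
$path = $storageDir . '/' . uniqid('', true);

if (!move_uploaded_file($_FILES['upload']['tmp_name'], $path)) {
    http_response_code(500);
    exit;
}

try {
    $pdo->prepare('INSERT INTO files (file_name, path, uploaded_at)
                   VALUES (?, ?, NOW())')
        ->execute([$name, $path]);
} catch (PDOException $e) {
    unlink($path); // don't leave an orphaned file behind
    throw $e;
}
```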

Thomas
  • Why can't you use CDNs? This is a supported scenario with pretty much every CDN I've ever heard of. – Billy ONeal May 30 '12 at 17:55
  • @BillyONeal - You can't use a CDN *and* store the file in the database. Unless you are OK with duplication, you can't have both. – Thomas May 30 '12 at 20:09
  • Erm, the whole point of a CDN is duplication. CDNs merely cache the target of a web address -- the only requirement is that there's an HTTP host serving the content, and that the content changes rarely. (How on earth is the CDN supposed to tell where you pulled the image from anyway?) – Billy ONeal May 30 '12 at 20:22
  • @BillyONeal - However, I think this is a bad choice of words on my part and I've adjusted my answer. Specifically, if you want to use *cloud storage* (and then perhaps use a CDN with your cloud storage), you can't do it natively with the database storage solution. You would have to write a synchronization routine to pull the files from the database and then send them to your cloud storage provider. – Thomas May 30 '12 at 21:17
  • @BillyONeal - In a way, your comment was the best answer. You can have all the benefits of DB storage, but none of the problems. – B Seven Oct 26 '16 at 16:08
  • When storing files in a DB your backup strategy is fairly simple - that's great. But how would I back up my file server and database in a synchronized fashion? – Wecherowski Jun 04 '20 at 23:04
  • @Wecherowski - This is one of the gotchas with having the files outside the DB: you have to devise your own backup strategy for the files. There are plenty of solutions. S3, for example, has native versioning that can be enabled so that you don't really have to back up the files. Anytime a file is changed, it is actually versioned under the hood. Of course, _accessing_ those backups, whether those older versions are visible to the user, and truly deleting files are all new complications that you have to handle. The "best" answer can depend greatly. – Thomas Jun 07 '20 at 16:48
  • @Thomas haha, I somewhat expected it to be trickier given that people never mention file server backups in this discussion :-) Can you give me some references or search phrases to dig into this topic? I have close to no DevOps experience; searching for things like "file server database synchronized backup" didn't yield any useful results for me ... (I'm not even sure whether it *has* to be synchronized or whether there are other ways ...) – Wecherowski Jun 07 '20 at 18:33
  • @Wecherowski - What you seek is a [file versioning system](https://en.wikipedia.org/wiki/Versioning_file_system). Even a source control system like git can be used for file versioning. OpenVMS for example has a long history here, and there are third-party tools that will do this for you too. It depends greatly on the functionality needed. – Thomas Jun 09 '20 at 14:54
99

In many cases, this is a bad idea. It will bloat the database files and cause several performance issues. If you stick the blobs in a table with a large number of columns it's even worse.

However! Some databases, like SQL Server, have a FILESTREAM column type. In this case, your data is actually stored in a separate file on the database server and only an identifier for the file is saved in the table. In this case I don't see much of a reason not to keep the data in SQL Server. The files are automatically included as part of the server backup, and the database and the files are never out of sync. The problem with Tony's suggestion of storing file names is that the database and the filesystem can get out of sync. The database will claim a file exists when it's been deleted on disk. If a process modifies the database and then crashes, the files and the database will not match (i.e., no ACID with files outside of a database).

Peter Mortensen
Timothy Baldridge
  • I disagree with the statement `If a process is modifying the DB and then crashes, the files and the DB will not match.` If you wrap the entire process in a transaction (create file, validate file, update DB) and throw error messages when something goes wrong, it's quite easy to keep them in sync. – briddums May 29 '12 at 14:57
  • I'm with briddums on that; consider this scenario: store the file to the filesystem (without deleting the old one), update the DB, on success delete the old file, on rollback delete the new file. Worst case scenario: if the process gets interrupted, you have an orphan file. But you always have the files referenced by the DB in the correct version. – vartec May 29 '12 at 15:01
  • Other potential problems with the file/DB method: 1) you have to do updates as copy-on-write. If your process crashes during an update, the DB state will be rolled back, but the file will not. 2) Doing this then requires some sort of garbage collection of the old file. 3) Storing everything in the DB means that the versions of the DB and files are in sync after backups. Restore your DB to its state 2 weeks ago... now what were the contents of the files at that time? – Timothy Baldridge May 29 '12 at 16:54
  • @briddums - Good point, though it's extra work for the developer. Also, you are still exposed to the database and file system going out of sync in other ways. For example: What if someone modifies the file system directly? What if you want to migrate the database to another location and something goes wrong with the migration? What if you need to restore the database from a backup, or recover it to a point in time? Can you be sure you're getting everything you need? – Nick Chammas May 29 '12 at 20:06
  • All of these vulnerabilities can be mitigated with extra developer work, but at some point you'll find it's easier and faster to let the database do it all for you. – Nick Chammas May 29 '12 at 20:06
  • @Nick don't all your arguments also apply to the FILESTREAM column type, since that's just storing it on disk as well? – briddums May 29 '12 at 22:36
  • @briddums - Nope, since SQL Server integrates directly into the file system and manages those files on behalf of the OS. I haven't used them myself, but the documentation makes it look like [FILESTREAM](http://msdn.microsoft.com/en-us/library/gg471497.aspx#storage) and its descendant [FileTables](http://msdn.microsoft.com/en-us/library/ff929144.aspx#Goals) grant you the best of both worlds: files are bound tightly to the database and related data (allowing you to centrally manage your data) without bloating the database. – Nick Chammas May 29 '12 at 23:29
  • I agree with Nick. We've replaced our disk+DB system with FILESTREAM columns and never looked back. It's really nice to be able to have files tie out to other tables via FKs. So you can actually say "each person must have one or more HR docs associated with them", or something else like that. – Timothy Baldridge May 30 '12 at 03:19
  • One significant downside to the FILESTREAM object is that it requires files to be stored on direct-attached storage. I.e., you cannot reference files on a NAS. – Thomas May 30 '12 at 05:09
  • @briddums: The main problem with your argument is that you move the responsibility for maintaining ACIDity into your application code. With a single-concurrent-user, single-application database it may be an acceptable workaround, but it makes database people cringe. – reinierpost May 30 '12 at 11:55
  • @reinierpost I just don't see the difference between DO TRANSACTION: copy file to blob END. and DO TRANSACTION: copy file to OS END. You're still within a transaction, so ACID should still apply. – briddums May 30 '12 at 16:34
  • @briddums: Yes, once you need DO TRANSACTION you have that problem ... but that statement only does anything meaningful for your database operations. For transactional filesystem operations, you e.g. have to implement rollback. See also Thomas's answer. – reinierpost May 30 '12 at 17:16
  • For anyone coming along years after this answer was written, it may be worth noting that PostgreSQL has a feature called TOAST which improves support for working in this way: http://www.postgresql.org/docs/current/static/storage-toast.html – algal Mar 04 '16 at 19:27
  • @algal I read through the description of TOAST, but the [on-disk TOAST storage](https://www.postgresql.org/docs/current/static/storage-toast.html#STORAGE-TOAST-ONDISK) seems like just another database table, so we are still saving the files in table rows. [FILESTREAM](https://technet.microsoft.com/en-us/library/bb933993%28v=sql.105%29.aspx?f=255&MSPPError=-2147217396) seems to really save each file as a real file in a file system. (I'm not familiar with either database.) – Franklin Yu Mar 14 '17 at 15:34
  • @FranklinYu I don't know pg well either. But yes, it seems like TOAST does not work simply through pointers to files on the filesystem but instead represents some other kind of support for large values. If so, I'd guess it provides the performance and convenience benefits of "everything-in-the-database," but not the debuggability of being able to easily inspect the files themselves via the filesystem. OTOH, this might also make backup and migration easier. – algal Mar 14 '17 at 22:13
42

Yes, it is a bad practice.

Performance impact on the DB:

  • if you do a SELECT on any row with a BLOB column, you will always do a disk access, while without BLOBs you have a chance to get the data straight from RAM (a high-throughput DB will be optimized to fit its tables in RAM);
  • replication will be slow and replication delay high, as it has to push the BLOBs to the slaves. High replication delay will cause all kinds of race conditions and other synchronization problems, unless you explicitly take it into account;
  • DB backups/restores will take a lot longer.

Speed advantage: none! While some older filesystems did not handle directories with millions of files well, most modern ones have no problem at all, and in fact use the same kind of data structures as DBs (typically B-trees). For example, ext4 (the default Linux filesystem) uses HTree.

Conclusion: it will hinder your DB performance and will not improve file retrieval performance.

Also, since you're talking about a web application: serving static files directly from the filesystem using a modern web server that can do the sendfile() syscall is a tremendous performance improvement. This is of course not possible if you're fetching the files from the DB. Consider for example this benchmark, showing Nginx doing 25K req/s with 1000 concurrent connections on a low-end laptop. That kind of load would fry any kind of DB.
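
One hedged example of how this plays out in a PHP application: do the permission check in PHP, then let the web server push the bytes. The sketch below assumes Nginx with an `internal` location named `/protected/` mapped to the storage directory; the auth helper is hypothetical:

```php
<?php
// X-Accel-Redirect hands the actual file transfer to Nginx, which can
// use sendfile() internally; PHP only decides whether to allow it.
if (!user_may_download($_GET['id'] ?? null)) { // hypothetical auth check
    http_response_code(403);
    exit;
}

header('Content-Type: application/octet-stream');
// basename() prevents ../ path traversal in the requested name.
header('X-Accel-Redirect: /protected/' . basename($_GET['file']));
```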

vartec
  • +1. Let your web server do what it does best: serving files from disk. Don't make it ask PHP, as PHP will have to ask MySQL, etc. – deizel. May 29 '12 at 14:32
  • @deizel: good comment, added info on that to the answer – vartec May 29 '12 at 14:44
  • When will programmers learn that performance isn't all that matters? – reinierpost May 30 '12 at 11:57
  • @reinierpost: lol. probably when we get liberal arts majors ;-) – vartec May 30 '12 at 11:58
  • Of course, if you have multiple web servers which you'd like to have sharing the same content, this answer kind of falls apart. Yes, it costs to do replication and such, but the files have to get onto the web server *somehow*. (It would be good practice to *cache* things on the web server, though.) – Billy ONeal May 30 '12 at 17:54
  • @BillyONeal: why do you assume that you have to have the same server for static and dynamic content? As for synchronizing files across servers, there are tools specifically designed for that, much more efficient than databases. Using a database as a fileserver is like trying to hammer a nail with a screwdriver. – vartec May 31 '12 at 08:59
  • @vartec: If it's just static content, then there'd be no point in putting it in a database in the first place. But for things like users' photos and avatars, or other kinds of content like that, putting these things in the DB can be a lot less work than attempting to set up synchronization services across all your web servers; and because the servers cache things when they're retrieved from the database, the perf hit on the DB is negligible. – Billy ONeal May 31 '12 at 18:05
  • @vartec: On the contrary; why do you assume that they're on separate servers? The question is whether or not this is "bad practice" -- and of course there are going to be some workloads (e.g. Flickr) where putting the content in the DB is a bad idea. But I would argue that there are plenty of cases where putting all the data in the database is the most maintainable and best solution. It depends on the application, and therefore is far from "bad practice" in my mind. – Billy ONeal May 31 '12 at 18:14
  • @BillyONeal: I do agree there are some "solutions" where that would work; I've seen quite a lot of amateur PHP setups with images in MySQL. However, in such a setup a DB will never support high-traffic serving of BLOBs. – vartec Jun 01 '12 at 08:28
  • @vartec: 1. I don't see why that would be, so long as the web sides actually implement some sort of a cache. 2. 99.9% of websites are not "high traffic." If they were, then languages like PHP would have never become very popular in the first place. – Billy ONeal Jun 01 '12 at 12:00
  • @BillyONeal: The question is if it's a bad practice, not if it will work in some (e.g. low-traffic) cases. And I'm not saying it's not possible, just as I'm not saying it's not possible to hammer a nail using a screwdriver. A cache with 10 MB BLOBs on the app server? You'll be creating problems you wouldn't have doing it the normal way... – vartec Jun 01 '12 at 12:22
  • @vartec: Bad practice means "bad in the majority of cases". The majority of cases are low traffic. – Billy ONeal Jun 01 '12 at 14:15
  • @BillyONeal: no, "bad practice" means anything less than best practice. – vartec Jun 01 '12 at 14:47
  • @vartec how would you recommend serving files only to users who should have access, while using the Nginx server to do it? I can see only a few ways. 1) Decrypt files on the client. 2) Generate secret filenames every time and 302 redirect to them, serve one time only. 3) Use Amazon S3 buckets and give permissions to a certain cookie/session. – Gregory Magarshak Jun 22 '20 at 22:54
  • @vartec "bad in the majority of cases" already implies "anything less than best practice in the majority of cases"; you can't have a global best practice that fits every environment. It is defined by the case. Also, you need to chill. – timuçin Dec 11 '20 at 14:59
  • "if you do a SELECT with any BLOB column, you will always do a disk access" -- as opposed to retrieving a file from the file system, where you don't have to access the disk? – timuçin Dec 11 '20 at 15:03
  • @timuçin: as opposed to retrieving it from an in-memory cache. – vartec Dec 15 '20 at 00:44
  • @vartec so are you saying the filesystem retrieves the data from memory, or are you comparing the DB retrieving blobs and other data types? This could be a great argument if your point was "never store files in the DB, store integers instead". It would be the opposite of your situation; you would be wrong with a good argument. Cheers – timuçin Dec 16 '20 at 01:39
26

I would be pragmatic about it and follow the "don't optimize yet" principle. Make the solution that makes sense at the moment, and one that you have the development resources to implement properly. There are plenty of potential problems, but those do not necessarily become real problems. E.g., it would probably not be a problem if you have 100 users. It might be a problem if you have 100,000 or 10,000,000 users. But in the latter case, there should be a basis for more development resources to deal with all the issues.

But storing the data in the database does relieve you of other problems, e.g. where the files should be stored, how they should be backed up, etc. Since you are writing a web application, it would be a very good idea for security reasons to make sure that the process hosting the application does not have write access to the file system; with filesystem storage, you would instead need to configure the server so that the process has read/write access to the folder where the data is stored.

I would personally choose to store the data in the database, but make sure that the BLOBs are not read until they are really needed, i.e. no "SELECT * FROM ..." executed on the tables containing blobs. And I would make sure that the design makes it easy to move the data out of the database and into the filesystem if you do get performance problems. For example, store the file information in a separate File table, keeping it away from other business entities.

Assuming that you have a File class representing a file read from the database, the coding impact of later moving the data out will be minimal.
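
A sketch of what that design could look like, assuming a hypothetical `FileStore` interface; nothing here is prescribed by the answer, it just shows why the later move is cheap:

```php
<?php
// Callers depend only on the interface, so swapping the database-backed
// implementation for a filesystem-backed one touches no calling code.
interface FileStore {
    public function put(string $name, string $bytes): int; // returns file id
    public function get(int $id): string;                  // returns raw bytes
}

class DatabaseFileStore implements FileStore {
    private $pdo;

    public function __construct(PDO $pdo) {
        $this->pdo = $pdo;
    }

    public function put(string $name, string $bytes): int {
        $this->pdo->prepare('INSERT INTO files (file_name, contents) VALUES (?, ?)')
                  ->execute([$name, $bytes]);
        return (int)$this->pdo->lastInsertId();
    }

    public function get(int $id): string {
        $stmt = $this->pdo->prepare('SELECT contents FROM files WHERE id = ?');
        $stmt->execute([$id]);
        return (string)$stmt->fetchColumn();
    }
}

// Later: class FilesystemFileStore implements FileStore { ... }
```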

Pete
18

Microsoft released a white paper about this a few years back. It concentrates on SQL Server, but you may find some interesting information in there:

To BLOB or not to BLOB? Large Object Storage in a Database or a Filesystem?

A very concise version of their conclusion is:

When comparing the NTFS file system and SQL Server 2005, BLOBS smaller than 256KB are more efficiently handled by SQL Server, while NTFS is more efficient for BLOBS larger than 1MB.

I would recommend that you write some small tests for your particular use case. Bear in mind that you have to beware of caching effects. (I was amazed the first time I got save-to-disk speeds that seemed to have higher throughputs than was physically possible!)
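
As a starting point, a rough micro-benchmark along those lines for the PHP/MySQL case; the table, row, and test file are assumed to exist, and as noted above, the OS and DB caches will skew the numbers unless you control for them:

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Time 100 reads of the same ~5 MB blob from the database ...
$start = microtime(true);
for ($i = 0; $i < 100; $i++) {
    $blob = $pdo->query('SELECT contents FROM files WHERE id = 1')
                ->fetchColumn();
}
printf("DB:   %.3f s\n", microtime(true) - $start);

// ... and 100 reads of an equivalent file from disk.
$start = microtime(true);
for ($i = 0; $i < 100; $i++) {
    $data = file_get_contents('/var/app/files/test-5mb.bin');
}
printf("Disk: %.3f s\n", microtime(true) - $start);
```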

Benjol
  • You should know that NTFS starts behaving very erratically when you put more than ~100K files in a single directory. File access slows down quite a bit (at least an order of magnitude) and file open operations start failing (apparently) randomly. I've experienced this effect on Windows 2008 and Windows 7 systems. When I re-distributed files among multiple directories, everything returned to normal. I don't know if the situation has improved since then. – Ferruccio Sep 02 '16 at 14:47
13

The old conventional wisdom of storing files outside the database might no longer hold. As a matter of principle, I'd favor integrity over speed, and with a modern DBMS, you can have both.

Tom Kyte seems to agree:

I know of no advantages to storing data I want to keep for a long time outside of a database.

If it is in the database I can:

  • be sure it is professionally managed
  • backed up
  • recoverable (with the rest of the data)
  • secured
  • scalable (try putting 100,000 documents in a single directory; now put them in a table - which one 'scales'? It is not the directory)
  • I can undelete (flashback) easily
  • I have locking
  • I have read consistency...

gnat
  • This kind of depends on the application. If you're a PACS system (medical images), you'll need data integrity and security. If you're running something with best-effort service delivery like Snapchat, you may not really care about lost data as much as performance as long as it's pretty rare. – Bacon Bits Jun 05 '20 at 18:10
8

Yes.

If you serve a file from your filesystem, your Web server can use kernel code like sendfile() on BSD or Linux to copy the file directly to the socket. It's very fast and very efficient.

Serving files out of the database means copying the data from the database server's disk to the database server's memory, then from the DB server's memory to its network port, then from the network into your web server process, and then out again over the outgoing network connection.

Unless you have a really good reason not to, it's always better to serve static files from the file system.

Evan P.
  • This is true, but I fail to see where the user states in the question that he will be serving static files from the database. This could very well be dynamic files or user-uploaded files, which, if stored on the filesystem separate from the database, now must be synced and have a separate backup/restore process. – maple_shaft May 29 '12 at 19:13
  • My understanding is the question is about serving user-uploaded files. "I am currently creating a web application that allows users to store and share files [...] It seems to me that storing the files in a database [...]". I don't think it's really that convenient to do DB dumps with lots of multi-megabyte blobs in the database. Also: yes, it's hard to deal with files; syncing and archiving are all more difficult. However, it's not _much_ more difficult, and sacrificing online performance to save a few lines in your nightly backup script is a big mistake. – Evan P. May 30 '12 at 01:23
6

It's usually best to store large BLOBs in a separate table and just keep a foreign key reference to the BLOB in your main table. That way, you can still retrieve the file from the database (so you don't need any special code) and you avoid the problems surrounding external DB dependencies (keeping the DB and filesystem in sync, etc.), but you only incur that overhead if you explicitly join to that table (or make a separate call). 10 MB isn't terribly large; most modern commercial databases won't have a problem.

The only reason I'd store a file in the filesystem is to cut down on database bandwidth. If your database is going to be shuffling a lot of these files around, then you may need to split the workload and only store a file descriptor of some sort. Then you can have a separate call to load the file from another server, so you aren't tying up your database connections (and the network connections on your database server) with all those file transfers.
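
A minimal sketch of that layout for MySQL (all names are hypothetical): metadata lives in a small table that ordinary queries touch, and the blob sits behind a foreign key so you only pay for it on an explicit join or second query:

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pdo->exec('CREATE TABLE IF NOT EXISTS file_info (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    file_name VARCHAR(255) NOT NULL,
    mime_type VARCHAR(100) NOT NULL,
    byte_size INT NOT NULL
)');
$pdo->exec('CREATE TABLE IF NOT EXISTS file_contents (
    file_id  INT PRIMARY KEY,
    contents LONGBLOB NOT NULL,
    FOREIGN KEY (file_id) REFERENCES file_info (id)
)');

// Listings stay cheap: only the metadata table is scanned.
$list = $pdo->query('SELECT id, file_name, byte_size FROM file_info');

// The blob is fetched only when a download is actually requested.
$stmt = $pdo->prepare('SELECT contents FROM file_contents WHERE file_id = ?');
```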

TMN
5

The famous Tom Kyte has written that they (Oracle) use the Oracle database as a file server, and that it works perfectly fine, even faster than a normal filesystem, with full transactionality, no performance loss, and a single backup.

Yes, but note that they are the producer of the Oracle DB, and for any other user there are cost issues. Using a commercial DB such as Oracle for file storage is simply not cost-effective.

However, with PostgreSQL, for example, you can simply run another DB instance only for blob storage. You then have full transactional support. But transactionality costs DB space: the database needs to store multiple instances of a blob for multiple concurrent transactions. On PostgreSQL it is the most painful, since the duplicate blobs created for a transaction are stored even when they are no longer needed, until the VACUUM process runs.

With filesystem storage, on the other hand, you must be very careful when someone modifies a file, because a transaction can be rolled back and a copy of the file must be kept until the old version is no longer visible.

In a system where files are only added and deleted, and transactional access to files is not an issue, filesystem storage will IMHO be the best choice.
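
A sketch of the careful-modification dance the last two paragraphs describe, in the asker's PHP/MySQL terms (schema hypothetical): write the new version beside the old one, swap the pointer inside a transaction, and only then delete whichever version lost:

```php
<?php
// Hypothetical schema: files(id, path). Copy-on-write update.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$id = (int)($_GET['id'] ?? 0);

$newPath = '/var/app/files/' . uniqid('', true);
move_uploaded_file($_FILES['upload']['tmp_name'], $newPath);

$pdo->beginTransaction();
try {
    $stmt = $pdo->prepare('SELECT path FROM files WHERE id = ? FOR UPDATE');
    $stmt->execute([$id]);
    $oldPath = $stmt->fetchColumn();

    $pdo->prepare('UPDATE files SET path = ? WHERE id = ?')
        ->execute([$newPath, $id]);
    $pdo->commit();
    unlink($oldPath);  // old version removed only after a successful commit
} catch (Exception $e) {
    $pdo->rollBack();
    unlink($newPath);  // on rollback, discard the new version instead
    throw $e;
}
```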

  • Hi, when you said "using ... Oracle for storage of files is simply cost ineffective", what if we are already using Oracle for storage of other non-file data? Will that still be cost-ineffective? – Devs love ZenUML Mar 01 '16 at 02:58
  • RE: "you must be very carefull when someone modifies the file"... as a former Oracle DBA, I have to suggest that large files be kept out of the database and that you never allow the files to be modified. People make mistakes. The only practical way to manage the rollback (undo) of those files is to implement a Copy On Write system for them. All versions are thus maintained and archived. The oldest can be moved off to remote storage, post processed to consolidate small changes into one archive, etc. – DocSalvager Oct 27 '16 at 20:44
4

You might run into some of these problems:

  • Doing a SELECT * that involves the row with the large blob takes very long, even if you don't need the blob. (Of course you should do a specific select, but sometimes applications are written like this.)
  • Doing a backup can take much longer. Depending on your needs, you may need to lock your tables during the backup, so you want to keep your backup time low.
  • Restoring will also take much more time.
  • If you run out of space, you have to think of some way (maybe moving the whole database to a new server) to solve the problem. With the files stored on the file system, you can always mount another hard drive and set soft links.
  • Simply looking into a file for debugging or other information is not as easy. This also includes scripts which might not have access to the database but need some information from various files.

Of course you also get some benefits:

  • Backing up the data and the files together means they are in sync
  • Removing a file without the database knowing is not possible
  • You don't have to read the file from disk but can do it in one SQL statement
  • You can download the database, include the dump in your development environment, and have all dependencies right there

Personally I don't do it, as I find the cons much weightier than the pros. But as stated above, it totally depends on your use case and such.

Sgoettschkes
1

Some enterprise content management systems, like SiteCore, use one database to store page data and another database to store files. They use MS SQL Server.

šljaker
  • how does this answer the question asked? – gnat Jul 28 '13 at 08:42
  • If you do a bit of research, you'll find out that SiteCore is one of the most popular enterprise content management systems. SiteCore supports a large number of concurrent users and scales pretty well, so yes, storing files inside a separate database is not a bad practice if you do it right. – šljaker Jul 28 '13 at 20:52
1

For practical implementation, here is what you may want to consider:

Benefits:

  1. All file contents are definitely synchronized with your table. As the comments above said, backing up the data is totally convenient, as you don't need to keep it synchronized with the file system.
  2. In code, you can get the file content directly from a SQL SELECT.
  3. In a query, you can even filter on the file content or its size directly in the SQL statement (see the sketch at the end of this answer).

Downsides:

  1. Compared to a database whose structure is semantically the same but which does not store file content, your database tends to consume radically more memory when running queries.
  2. Automatic backups can cause performance problems, though not severe ones. Imagine that your database server backs everything up every 6 hours and that your databases store a 10 MB file per record; that is not a scenario you want.
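
For point 3 of the benefits above, a hedged example of filtering on blob size purely in SQL, using a hypothetical schema (`OCTET_LENGTH` is a standard MySQL function):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Find files larger than 5 MB without pulling a single blob into PHP.
$stmt = $pdo->query(
    'SELECT id, file_name, OCTET_LENGTH(contents) AS byte_size
       FROM files
      WHERE OCTET_LENGTH(contents) > 5 * 1024 * 1024');
```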
0

This is one of those "Can I use a razor blade to cut open an apple?" questions. Yes, you can.

Should you? Who is to tell...

I guess you end up doing it when your tooling allows it, or when it is the only tooling available (at least to make that deadline). For example, I have used a flat screwdriver to remove a Phillips-head screw... Was that correct? Was that the right tool? Was it a bad choice?

The answer in this case is: the DB is not supposed to store files, so whenever you are doing it, it is wrong. In the same way, theoretically I shouldn't have used the flat screwdriver to remove a Phillips screw, because in doing so I lost the benefit of the Phillips screwdriver not slipping out of place and damaging whatever I am working on (just as you lose benefits if you use a DB as a file storage solution)... yet I properly calculated the risk and everything went just fine.

If you would like to do things right, you should probably use git to store your files and just keep the necessary git hashes in your DB to refer to the correct file version you need, the same way I was able to walk to my toolbox and get the damn Phillips screwdriver...
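
A sketch of that git-like idea, with hypothetical paths and schema: store each upload under its content hash (which also de-duplicates identical files) and keep only the hash in the database:

```php
<?php
$bytes = file_get_contents($_FILES['upload']['tmp_name']);
$hash  = sha1($bytes);

// Shard objects into subdirectories by hash prefix, as git does.
$path = '/var/app/objects/' . substr($hash, 0, 2) . '/' . substr($hash, 2);
if (!is_file($path)) {
    if (!is_dir(dirname($path))) {
        mkdir(dirname($path), 0750, true);
    }
    file_put_contents($path, $bytes);
}

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->prepare('INSERT INTO files (file_name, content_hash) VALUES (?, ?)')
    ->execute([basename($_FILES['upload']['name']), $hash]);
```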

Ordiel