49

The question should be clear from its title. For example, Apache saves its access and error logs in files instead of an RDBMS, no matter how large or small the scale at which it is used.

For an RDBMS we just have to write SQL queries and it will do the work, while for files we must decide on a particular format and then write regexes, or maybe parsers, to manipulate them. And those might even fail in particular circumstances if great care is not taken.
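
For illustration, here is a minimal Python sketch of the file-side approach - parsing one line of Apache's common log format with a regex. The pattern and field names are my own assumptions for the sketch, not an official definition, and lines that don't match the expected shape simply fail to parse, which is exactly the kind of edge case I mean:

```python
import re

# Assumed pattern for Apache's common log format; real logs may use other
# formats, and unusual lines will simply fail to match.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of fields, or None when the line doesn't match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
print(parse_line(sample))
```

With an RDBMS, the equivalent would just be a SELECT against whatever columns the log table defines.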

Yet everyone seems to prefer the filesystem for maintaining logs. I am not biased against either of these methods, but I would like to know why it is practiced like this. Is it speed, maintainability, or something else?

Yasir
  • 12
    So how would you log DB errors (db unavailable for example) if your logging system logs to a DB? – Marjan Venema Jul 12 '11 at 12:13
  • 18
    @Marjan How would I log Filesystem errors if it fails?! – Yasir Jul 12 '11 at 12:17
  • 5
    Quite true, but if that fails, chances are your DB is inaccessible as well... After all, where / how would it write to its tables without the file system? – Marjan Venema Jul 12 '11 at 12:30
  • 2
    @Yasir: Send all log messages to a syslog server before logging to the filesystem :) – Engineer2021 Jul 12 '11 at 12:35
  • 1
    @MarjanVenema The what-if game is pointless. What if the local disk is full? Your logging will fail, but the app and OS can keep going. If you're logging to a remote DB server, though, you'll still be able to log. There are pros and cons to either store for log messages, and which is best depends on what you're trying to get out of logging. Sorry, I'll let the herd get back to "file log is the one true way". – Andy Jul 25 '15 at 14:51
  • @Andy out of disk space is a _normal_ operating condition, not an exception (even though you may be made aware of it through a runtime exception). That is not what I consider a file system failure. A file system failure is when the file system fails to operate at all, regardless of the amount of available disk space. – Marjan Venema Jul 26 '15 at 08:11
  • 1
    @MarjanVenema Out of disk space is a normal condition? Ok, whatever. It's amazing the amount of rationalization that people will go through so they can cling to their position. – Andy Jul 26 '15 at 15:17
  • @Andy yes, whether you like it or not, error recovery is part and parcel of developing applications. Out of disk space and programs running out of memory may not occur as frequently as they once used to do, but that doesn't mean they are in the realm of "acceptable to crash over". Rationalization doesn't come into it. – Marjan Venema Jul 26 '15 at 20:19
  • @MarjanVenema What's acceptable for a program to do depends on the requirements, but what I see here is endless pushing of dogma (not just in this question, but all over PSE). And no need for straw man arguments, nobody said error recovery is unimportant. FYI, I don't care what you think is mature, and in my world running out of disk space is something that needs to be addressed by either freeing or adding more storage, not by ignoring it and pretending everything is fine. – Andy Jul 26 '15 at 20:31
  • @Andy, Read back my original comments again and then tell me where I am applying dogma. As far as I can tell, I am asking questions. You on the other hand are the one making assumptions and statements based on those assumptions. The only relevant point you made was that the file system is moot when you are logging to a remote DB. My counterpoint would have been that when you are logging remotely, the manner in which that log service stores the log messages should not be a concern of the sender. It should just call some API. And then for that "logger" machine we are back to the original issue. – Marjan Venema Jul 27 '15 at 08:34
  • @Andy and where did I say that it was ok to pretend everything is fine? There is a difference between continuing as best you can with restrictions in place, exiting gracefully, and crashing. Boy, do you assume a lot. – Marjan Venema Jul 27 '15 at 08:37
  • Because databases don't have "tail -f". – Machado Aug 16 '17 at 14:59

9 Answers

40
  1. Too many things can fail with the database, and logging these failures is important too.

  2. Unless you have a database system allowing autonomous transactions (or no transactions at all), logging would require a separate connection, so that a rollback or commit in the logging doesn't interfere with a rollback or commit in the application.

  3. Many things worth logging happen during startup, i.e. possibly before the database connection has been established.

  4. In what could be a typical setup, a new logfile is created every day, and old log files are compressed and kept for two weeks before eventually being deleted. It's not easy to do the same in an RDBMS. (A sketch of this kind of rotation on the file side follows this list.)
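
On the file side, that rotation scheme is largely a solved problem. Here is a minimal sketch using Python's standard logging module; the file name and retention count are placeholders, and note that the standard handler rotates and prunes old files but does not compress them - compression is usually left to an external tool such as logrotate:

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Start a new file at midnight and keep the last 14 rotated files;
# anything older is deleted automatically.
handler = TimedRotatingFileHandler("app.log", when="midnight", backupCount=14)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("application started")
```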

Robert Harvey
user281377
  • 2
    I tried this experiment and it did not go well. An RDBMS is designed around the idea that data is written relatively infrequently compared to the number of times it is read. Logging is basically the opposite: you write all the time and read rarely. This is a great way to annoy your DBA. – JimmyJames Aug 16 '17 at 14:31
  • 1
    One might consider using a time-series database system like InfluxDB to keep logs, though; it seems to me that it is a bit better suited for the task than, for example, PostgreSQL. Still, the advantage over old-fashioned logfiles is hardly there. – user281377 Aug 17 '17 at 11:35
  • Using a non-relational DB with token indexing etc. is definitely useful, and if you pick wisely, it can handle the fire-hose. This is part of how things like Splunk and Flume work. – JimmyJames Aug 17 '17 at 14:11
  • #4 isn't really a problem. `DELETE FROM dbo.Log WHERE LogDate < today minus 2 weeks` – Robert Harvey Nov 13 '18 at 16:56
  • @RobertHarvey This works well until you try it in a heavy load environment, where such bulk operations can cause serious problems without extra precautions. Redo logs filling your disk space, undo tablespace becoming too full, replication becoming very busy with replicating the delete etc. – user281377 Nov 14 '18 at 17:25
16

I've seen logs written to the DB before (and sometimes you get configurable options for logging, where trace goes to file, errors to DB, fatals to Windows Event log).

The main reasons are speed and size; enabling some tracing can produce vast, vast quantities of logging - I've trawled through log files gigabytes in size. The other main reason is that reading the logs is sequential; there's no real need to query the log, except to find a certain error or entry - and find-in-file works perfectly well for that.

gbjbaanb
  • But I'm confused about this: Notepad, WordPad, gedit, Notepad++, or any web browser won't be happy opening a file 4 GB in size. The same browser, however, will be able to show me a list of a thousand pages, each containing 500 records. Right? – Yasir Jul 12 '11 at 12:14
  • 7
    @Yasir because you are using editors that try to load the whole file into memory. Try a smarter editor that is able to 'stream' the big file. Vim is a good example. – nakhli Jul 12 '11 at 12:21
  • 6
    @Yasir: This is true, but you are trying to optimize the wrong thing. The vast majority of the time, logs are written and never read. So you make the creation of logs very fast because it is the common case. – unholysampler Jul 12 '11 at 12:45
  • 6
    Eh, I've done logging to a database before, and being able to easily query the log messages was immensely beneficial, especially when we turned on debug-level logging to track down a hard-to-replicate bug. – Andy Jul 25 '15 at 00:13
  • @Andy it's overrated; when you have a lot of logging, Notepad++ has some great search capabilities that are quicker and easier to use than DB queries. See the 'mark lines' search option with bookmarks, then cut and paste all bookmarked lines into a new tab; and of course a log file lets you highlight repeated bits of text. – gbjbaanb Jul 25 '15 at 11:09
  • 2
    @gbjbaanb I didn't find it overrated, and frankly your suggestion of using mark lines and cut-and-paste to query is a joke. It's not just searching; we analyzed trends to find servers that had more problems than others, what kinds of errors users were seeing most often, etc. – Andy Jul 25 '15 at 13:12
16

Speed is one reason; others are:

  • Eliminating points of failure. A filesystem rarely fails under conditions where a DBMS wouldn't, but there are lots and lots of error conditions in databases that simply don't exist in filesystems.
  • Low-tech accessibility. If things go really really bad, you can boot into a rescue shell, or mount the disk on a different system, and still have adequate tools available to inspect log files. If it's a database, you're nowhere without a database server running.
tdammers
3

First off.

And those might even fail in particular circumstances if great care is not taken.

Database transactions can't fail when you are not careful?

Writing to a text file has a number of benefits, the most important being

  • Text is human readable. Anyone can open up a log file with a basic text editor and see what the messages are. You don't need to understand how the database is organized.
  • Speed. Writing text to disk is much faster than a database service figuring out where the text goes in the database, writing it there, and ensuring the transaction completed. (A rough timing sketch follows this list.)
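
To make that concrete, here is a rough timing sketch, not a rigorous benchmark: the numbers vary enormously with hardware, database engine, and configuration, a real database server adds a network round trip per write, and batching or relaxed durability settings change the picture entirely. It compares plain file appends with per-row, per-commit inserts into a local SQLite database:

```python
import sqlite3
import time

N = 1_000
MESSAGE = "2011-07-12 12:00:00 INFO something happened\n"

# Plain text file: append one line per log message.
start = time.perf_counter()
with open("bench.log", "w") as f:
    for _ in range(N):
        f.write(MESSAGE)
file_seconds = time.perf_counter() - start

# SQLite: one INSERT per message, committed individually to mirror
# "ensuring the transaction completed" for every log entry.
conn = sqlite3.connect("bench.db")
conn.execute("CREATE TABLE IF NOT EXISTS log (message TEXT)")
start = time.perf_counter()
for _ in range(N):
    conn.execute("INSERT INTO log (message) VALUES (?)", (MESSAGE,))
    conn.commit()
db_seconds = time.perf_counter() - start
conn.close()

print(f"file append: {file_seconds:.4f}s  sqlite insert: {db_seconds:.4f}s")
```
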
user
unholysampler
  • Obviously anything and everything may fail if we are not careful. But for this question I was referring to the high-level programmer. As a simple example, the programmer might want to separate values using a particular character. His/her regex will work like a charm, but will fail when that same character is contained inside a value block. So he needs to take care of similar possible cases, which he wouldn't need to think about if he were saving to a DB. Also, can you please see my comment on gbjbaanb's answer? – Yasir Jul 12 '11 at 12:21
  • 1
    And if you are hand writing your SQL, you have the same problem. The difference being the write will fail (or corrupt your data) instead of annoying some developer slightly because his search string brought up some bad results. Yes, there are frameworks that mean you don't have to write SQL, but every extra layer slows down the process. And remember this is just logging. Every cycle you use to log is a cycle you are not using to do real work. – unholysampler Jul 12 '11 at 12:51
  • @unholysampler Your performance argument is weak; logging can be done very fast and on a background thread to a database, and logging to the filesystem, while potentially faster, is still not free either, especially if it's not done in the background. – Andy Jul 25 '15 at 14:44
2

You raise Apache specifically, so I will discuss this in detail.

Apache can be configured to log to a database, although it requires an external plugin to do so. Using such a plugin can make log analysis easier, but only if you intend to write your own log analysis software. Standard off-the-shelf log analysers assume your logs are in files, so you won't be able to use these.
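
Another route is Apache's piped logging: a CustomLog directive can feed each access-log line to an external program, which can then insert it into a database. The sketch below is hypothetical - the script path, database file, table name, and schema are invented for illustration - and it shares the hazard described in the next paragraph: if the database stalls, the pipe fills up and Apache blocks.

```python
#!/usr/bin/env python3
# Hypothetical piped-log handler, configured with something like:
#   CustomLog "|/usr/local/bin/access_log_to_db.py" combined
# Apache writes each access-log line to our stdin; we store the raw line.
import sqlite3
import sys

conn = sqlite3.connect("/var/log/apache_access.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS access_log ("
    " received_at TEXT DEFAULT CURRENT_TIMESTAMP,"
    " line TEXT)"
)

for line in sys.stdin:
    conn.execute("INSERT INTO access_log (line) VALUES (?)", (line.rstrip("\n"),))
    conn.commit()  # commit per line so a crash loses at most one entry
```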

When I was doing this, I also experienced reliability issues: if the database server's write buffer filled up (which can happen with MySQL if you use up your filesystem quota for the user it runs under), it would start queuing up queries until they were able to proceed, at which point Apache would start waiting for them to finish, resulting in hung requests to your web site.

(This issue may now be fixed, of course - it was many years ago that I did this.)

Jules
1

A filesystem is a database. It is indeed a simpler, hierarchical database rather than a relational DBMS, but it's a database nevertheless.

The reason why logging to a filesystem is popular is that text logs fit well with the Unix philosophy: "Text is the universal interface."

Unix grew up with lots of general-purpose tools that work well with text logs. It doesn't matter whether the text logs are produced by MySQL, Apache, your custom application, or third-party software that's long out of support; the sysadmin can use standard Unix tools like grep, sed, awk, sort, uniq, cut, tail, etc., to trawl through the logs all the same.

If every app logs to its own database, one to MySQL, another to Postgres, another to Elasticsearch, another wants to log to ELK, another can only log to MongoDB, then you would have to learn twenty different tools to trawl the logs of each application. Text is a universal medium that everyone can log to.

Even when you manage to make all logs go to a single database, say MySQL, you may find that each application wants to log with a different table schema, so you still have to write a customized tool to query the logs of each application. And if you somehow cram every application into logging to a single schema, you'll likely find that that generic schema can't really tell you the full story of each application, so you still have to parse the log text anyway.

Logging to a database often doesn't really make things significantly easier in practice.

Logging to a database can be useful when you have a specific analysis in mind, or a specific audit-retention requirement, for which you can design a database schema to collect just the data for those specific purposes. But for forensics and debugging, when you collect logs without a specific objective in mind, text logs are usually good enough that the cost of learning or creating specialized tools often isn't worth it.

Lie Ryan
0

Let's look at this on a few layers:

  1. Machine layer
  2. Operating system layer
  3. Service layer
  4. Application layer

In brief:

  • On the machine layer, you really cannot do logging other than some sort of dumps.
  • On the OS layer you can do logging but you really only have the file system available.
  • Services can log to the filesystem, but they cannot trust other services to be running, so they cannot log to them.
  • Applications can log to services and the file system.

Then we have the use-case based approach:

Do you want to log node-specific errors to a horizontally scaled RDBMS, where you need to do extra work to find the errors of a specific node, when you could just pop open the hood on that one node and see them there? On the other hand, your application should perhaps log to an RDBMS to gather application-level errors and notices.

What happens when the RDBMS needs to do logging for itself because the database cannot be written into?

ojrask
-2

Complexity. Adding an RDBMS will increase the complexity of the whole system astronomically. And the ability to manage complexity is the main thing that distinguishes programmers from source-code producers.

noonex
    Could you expand on what you mean about complexity as it relates to logging to a DB versus a file system? From my experience, there hasn't been a significant difference in complexity in a business environment. – Adam Zuckerman Jul 24 '15 at 20:45
  • Really? SQLite increases the complexity astronomically? And while a web server normally wouldn't need a DB, many LOB apps are already using one, so there's no additional cost there at all. – Andy Jul 25 '15 at 14:47
  • @AdamZuckerman of course any RDBMS requires maintenance, is prone to corruption, may need special tuning, may be affected by bad configuration, may need special recovery, brings its own limitations, has its own dependencies, supported platforms, upgrade issues, bugs, licensing and so on. – noonex Jul 25 '15 at 21:29
  • @Andy first of all, SQLite is not an RDBMS in the classic sense - it is an "embedded RDBMS". And yes - requiring SQLite for logging will increase complexity a lot. – noonex Jul 25 '15 at 21:38
  • 1
    @noonex You're just arbitrarily making a distinction between embedded and full server, when RDBMS doesn't. SQLite provides ACID compliance, which is really what an RDBMS is about. And it increases complexity a lot? I can only imagine you haven't worked on anything but the most trivial of applications. Finally, good job completely ignoring my point about many LOB applications already needing a database anyway. – Andy Jul 26 '15 at 15:04
  • "of course any RDBMS requires ....". So do OSes and the file systems they provide. Nothing in that list is specific to an RDBMS, those are general truths about the platform you app runs on too. – Andy Jul 26 '15 at 15:06
  • lol @Andy - do you plan to use an RDBMS without "OSes and the file systems"? The key point is that if you have one system of complexity N which uses another system of complexity M, the result may be N*M complexity. Good luck troubleshooting RDBMS issues caused by OS and filesystem bugs. If you don't consider that any RDBMS will have to deal with OS and FS issues anyway, I don't think I should reply to further comments of the same level. – noonex Jul 26 '15 at 16:34
  • @noonex and you may not get M*N complexity. You're arguing theory and worst case, which often is not realistic. And in your standard operating view, where it's normal to have no free disk space, file logging fails too. I'm not saying logging to a DB is always what should be done, but there are times when it makes more sense. It depends. A database is no less reliable than a file system; that's kind of the point of a database: not to lose data. In any event, if any of those systems fails it's usually a hardware issue. If the disk is bad, DB vs FS is irrelevant. – Andy Jul 26 '15 at 17:24
-5

Is it speed or maintainability or something else?

Speed.

S.Lott