87

Logging is something that is necessary but (relatively) rarely used. As such, it could be made much more compact in terms of storage.

For example, the data most commonly logged (IP addresses, dates, times, and other values that could be represented as integers) is stored as text.

If logging were stored as binary data, a lot of space could be saved, requiring less rotation and increasing disk lifespan, especially with SSDs, where write cycles are limited.

Some may say that it is such a minor issue that it does not really matter, but considering the small effort needed to build such a mechanism, it makes no sense not to. Anyone could build this in a couple of days of spare time, so why don't people do it?
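
To illustrate the kind of saving I mean, here is a minimal sketch; the log line and the binary field layout are invented for the example:

```python
import socket
import struct

# A typical access-log event, stored as text:
text_line = b'203.0.113.42 - - [04/Oct/2016:15:05:00 +0000] "GET / HTTP/1.1" 200 5120\n'

# The same fields packed as fixed-width binary: IPv4 address (4 bytes),
# Unix timestamp (4 bytes), HTTP status (2 bytes), bytes sent (4 bytes).
binary_record = struct.pack(
    "!4sIHI",
    socket.inet_aton("203.0.113.42"),  # the IP as 4 raw bytes, not 12 characters
    1475593500,                        # date and time as one integer
    200,
    5120,
)

print(len(text_line), len(binary_record))  # 72 vs 14 bytes
```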

php_nub_qq
  • 2,204
  • 4
  • 17
  • 25
  • 20
    I would challenge your assertion that people *don't* do this. Many do. Some don't, sure, but plenty do. – Servy Oct 04 '16 at 15:05
  • 2
    @Servy I apologize for my ignorance, I am thinking of web-servers and access logs in particular, probably should mention that in the question. – php_nub_qq Oct 04 '16 at 15:07
  • 55
    ["If your machine runs so close to it's limits that such issues really would matter, you most likely have more serious problems..."](http://programmers.stackexchange.com/a/170524/31260) – gnat Oct 04 '16 at 15:07
  • 4
    Look at some random logs and try to spot a pattern that you could structure in binary. Rarely do logs have one; they are mostly whatever text some dev thought of in a moment – PlasmaHH Oct 04 '16 at 16:28
  • 45
    *"If logging were stored as binary data, a lot of space could be saved."* Well, old logs are typically compressed. – leonbloy Oct 04 '16 at 17:10
  • 5
    I don't think disk space efficiency has been a serious concern for close to a decade now. You can get a 3TB drive for under US$100, and log files rarely come up when I look for the biggest files on my system. I just don't think that even minimal effort is justified on this particular task, when you consider the loss of using tools like `grep` and `find` and standard text editors. – TMN Oct 04 '16 at 17:34
  • 1
    As a first level of saving actual space on a drive, most operating systems have a way to make a drive, directory, or file compressed at the OS/file system level. While such compression may, or may not, be as efficient as using a binary format for the log files, it certainly reduces the space needed by raw text log files. – Makyen Oct 04 '16 at 17:37
  • 90
    Reading a text log on a machine that's halfway broken might be a huge advantage over needing a binary to analyze it. – tofro Oct 04 '16 at 18:30
  • 5
    @TMN disk-space efficiency can still be a concern. 3TB of enterprise storage, with redundancy, off-site disaster recovery, performance test environments etc can easily mean 20 extra actual (expensive) disks. Or, if you are trying to jam a clustered app into a set of lower cost VMs, then again, space starts to matter. Otherwise, I completely agree, this isn't where the savings come: if you're logging that much and gzip can't help, you're doing it wrong. :-) – SusanW Oct 04 '16 at 18:39
  • 5
    I log in JSON. This gives me structured data, as well as a sane way to read the data on the terminal. My logs go into Elasticsearch by way of Logstash. I keep only a very short time of logs on local disk, and everything else just goes into centralized Elasticsearch. This saves host disk space but keeps everything manageable across the fleet. That way I can fix application level stuff across the board, while isolating individual host issues. If something is wonky with a host, I just nuke it and provision a new one these days. – Brad Oct 04 '16 at 19:01
  • 7
    Text files can easily be compressed, if desired. So you can get all the benefits without having to change the tooling used to create or process the logs. – David Schwartz Oct 04 '16 at 19:06
  • 25
    *After months of modifications to get the algorithm executed on the large cluster properly, we still couldn't see much of a performance gain, but when we changed to storing the log files in binary files? Holy cow, we never dared to dream that the performance could be at that level.* How plausible is that kind of story? – null Oct 04 '16 at 19:13
  • 4
    For the same reason JSON is much more popular than protobuf - there are huge advantages of every stage of development to using a human-readable format. – BlueRaja - Danny Pflughoeft Oct 04 '16 at 19:17
  • @SusanW Are you really storing that on 3TB of enterprise storage with redundancy, off-site disaster recovery, performance test environments, etc.? A log file is there so you can go back and fix problems. You rarely need to go back more than 30 days. You can peel off old data and compress it. – paparazzo Oct 04 '16 at 20:10
  • 3
    @Paparazzi oh, yes, I agree, I was answering a previous point: I'm not saying the 3TB would all be used for logs. But there's a popular misconception that bargain consumer disk prices are the guideline for what it costs to provide a service, when really there's much more to it. – SusanW Oct 04 '16 at 20:15
  • 10
    From *The Pragmatic Programmer*, "Always store knowledge in plain text..." Although I don't think they were necessarily referring to log files, their reasons still apply. – J. Allan Oct 04 '16 at 21:00
  • 9
    Logging is "rarely used"? Maybe that is true for programmers but NOT by SysOps –  Oct 04 '16 at 21:01
  • 1
    For disk space there's `gzip`. (Or `xz`) – user253751 Oct 04 '16 at 22:28
  • @DoritoStyle Yep, programmers too! I've made a note in my answer that we've got two scenarios here that are getting a bit tangled: _programmer/support-oriented application logs_, and _records_ (like WebServer access logs, request/response message recordings). I think the former lends to text and gzip; the latter is more like structured records and binary can be appropriate if performance considerations require it (under duress :-) ). Anyway, on re-reading, I think the question was really about access logs and similar.... – SusanW Oct 04 '16 at 23:40
  • 1
    It's interesting that every answer here appears to be *nix-centric. In the Windows world, [binary logs](http://forensicswiki.org/wiki/Windows_Event_Log_\(EVT\)) have been used by the system for a very long time. Even the newer [XML-based logs](http://forensicswiki.org/wiki/Windows_XML_Event_Log_\(EVTX\)) use Binary XML. – Bob Oct 04 '16 at 23:42
  • 5
    You say "rarely used". II believe you meant "Often written, but rarely read." The commenters seems to have understood it as "rarely written". It would be good to have this cleared up. – Stig Hemmer Oct 05 '16 at 07:43
  • @StigHemmer yeah, I think that's common sense – php_nub_qq Oct 05 '16 at 07:56
  • 1
    **Reminder that comments are not for extended discussions. Please visit our chat room to continue the conversation. Thank you!** – maple_shaft Oct 05 '16 at 11:52
  • 1
    @TMN at a previous job we had an Oracle server failing regularly for no apparent reason. When I looked into the actual error (rather than just the top-level error, something like "unable to execute query"), it turned out to be an out-of-disk-space error, caused by the log directory being several hundred gigabytes in size and growing rapidly. Changed the logging policy, problem solved. – jwenting Oct 05 '16 at 18:56
  • pedantry: A text file is a binary file, _all_ information on a computer (...let's leave out some of the oddballs) is stored in binary. Text logfiles are only readable because almost everybody agreed to use the same **encoding**, historically ASCII, although this may be changing to full UTF-8. If you logged something on an old (or not so old) AS/400 or iSeries, it's almost certainly going to output in EBCDIC; have fun reading that on a different box. By doing everything as "text", you remove the need for headers or other descriptor fields. – Clockwork-Muse Oct 06 '16 at 02:36
  • 4
    Because humans read the logs, and `grep` uses a _lot_ less RAM than `elasticsearch`. – Michael Hampton Oct 06 '16 at 04:45
  • @Clockwork-Muse Are you sure that's pedantry? Sounds a bit more like nit-picking to me ...? :-) – SusanW Oct 06 '16 at 14:29
  • Asked nearly the same thing way back when: http://stackoverflow.com/questions/5113279/why-are-logs-stored-in-flat-files-rather-than-a-database-sql/5113313#5113313 – MickeyfAgain_BeforeExitOfSO Oct 06 '16 at 22:51
  • 8
    i recently had to do debugging against an old Sony broadcast protocol adapted for ethernet, the devices which perform conversion offered a log, the log was an efficient 8-byte event format. **it was the worst experience i've had debugging software in almost 20 years.** for the love of all that is right and good in the realm of computer programming, on all platforms, in all cases, whenever possible you should **produce a text-based, line-delimited log and move on to more important things**. – Shaun Wilson Oct 07 '16 at 04:35
  • 1
    Ditto what others have said in favor of a clear-text log file. When it's 3AM and you're trying to make sense of the crash, having to jockey the log around seven ways from Sunday just to read it is not simply irritating, it's a major source of confusion and errors. (That said, for a log that consumes a lot of disk space and traffic another form may be merited, plus I have seen well-integrated logs that used encoding schemes but made the data readily available through a resident service, and these were quite satisfactory.) – Daniel R Hicks Oct 08 '16 at 00:55
  • (But of course there's also the point that logging is often an afterthought and thrown together without much planning, and as a result any sort of compact representation that doesn't amount to encryption is difficult to achieve.) – Daniel R Hicks Oct 08 '16 at 00:57
  • 1
    I challenge the assumption that modern SSDs meant for long-term data storage are meaningfully write-cycle-bound. See [Are SSD drives as reliable as mechanical drives (2013)?](https://serverfault.com/q/507521/58408) and to a lesser extent [What is the current state (2016) of SSDs in RAID?](https://serverfault.com/q/776564/58408), both on [sf]. – user Oct 09 '16 at 10:01
  • 1
    I'd be very curious to see how well an arbitrary binary format stands up to standard LZMA text compression in terms of disk usage optimization. My educated guess would be that the two would be relatively indistinguishable. – Luke A. Leber Oct 10 '16 at 02:56

14 Answers

165

systemd famously stores its log files in binary format. The main issues I have heard with it are:

  1. if the log gets corrupted, it's hard to recover, as recovery needs specialist tooling
  2. they are not human readable, so you can't use standard tools such as vi, grep, tail, etc. to analyse them

The main reason for using a binary format (to my knowledge) was that it was deemed easier to create indices and the like, i.e. to treat the log more like a database file.

I would argue that the disk space advantage is relatively small (and diminishing) in practice. If you want to store large amounts of log data, then zipping rolled logs is really quite efficient.

On balance, the advantages in tooling and familiarity mean that text logging is probably the better choice in most cases.
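
To get a feel for how far zipped text goes, here is a rough sketch with synthetic access-log lines (invented for the demo; real ratios vary, but repetitive log text compresses very well):

```python
import gzip

# 10,000 synthetic access-log lines; real logs are similarly repetitive.
lines = [
    f'203.0.113.{i % 256} - - [04/Oct/2016:15:{i % 60:02d}:00 +0000] '
    f'"GET /page/{i} HTTP/1.1" 200 {1000 + i}\n'
    for i in range(10_000)
]
raw = "".join(lines).encode()
compressed = gzip.compress(raw)

print(f"raw: {len(raw):,} bytes, gzipped: {len(compressed):,} bytes "
      f"({100 * len(compressed) / len(raw):.1f}%)")
```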

Shaun Wilson
  • 103
  • 4
Alex
  • 3,882
  • 1
  • 15
  • 16
  • 3
    Good point. I was immediately thinking of systemd too. The even more important part here is that your application doesn't have to *know* how the log data is stored. It can be provided as a system service. – 5gon12eder Oct 04 '16 at 15:30
  • +1 but the answer could be more universal. Logs are not exclusive to Linux systems. – Tulains Córdova Oct 04 '16 at 15:37
  • 2
    @TulainsCórdova Indeed. But then again I use `vi` and `grep` on my Sun, OSX and Windows boxes too. If it works... – Alex Oct 04 '16 at 15:41
  • 97
    "famously", more like "infamously" – whatsisname Oct 04 '16 at 18:18
  • 4
    pf (firewall) also logs in binary, specifically to tcpdump format – Neil McGuigan Oct 04 '16 at 19:39
  • 1
    what are 'rolled logs'? – Hatshepsut Oct 04 '16 at 22:47
  • 3
    @Hatshepsut Rolled logs: the log output writes to one file, say `myapp.log` until midnight, and then moves that file to `myapp.log.1`, and starts writing to a new `myapp.log` file. And the old `myapp.log.1` gets moved to `myapp.log.2`, and so on, they all roll along. Thus, `myapp.log` is always the current one. Or they may switch when a certain size is reached. Maybe they put the date/time in the filename. Many logging frameworks support this sort of thing out of the box. – SusanW Oct 04 '16 at 23:45
  • 13
    @Hatshepsut The term `rotating` is also used from what I am aware. – George D Oct 05 '16 at 00:14
  • 2
    @alroc: Encryption does not make files smaller. Compression does. Yes, both outputs increase entropy, but encryption generally retains data length. That is, an encrypted 1MB file is 1MB, whereas a compressed 1MB text file is usually 100kB – slebetman Oct 05 '16 at 02:36
  • @slebetman you're correct, I meant to write compression. Seems I had encryption on the brain today. – alroc Oct 05 '16 at 02:58
  • "log rotation" facilitates the use of monitoring tools such as `tail`. it's also preferred to have "line-delimited" log files (meaning you can index each line of the log as a unique log event), this allows the use of text processing tools like `grep`, `sed`, `awk`, etc while this is handy when investigating a problem with bare minimum tools, they also transpose easily through all text processors (editors, email programs, browsers, etc.) in large-scale and complex environments, though, there is often a push to abstract out to log aggregation/index services (elastic+logstash+kibana, splunk, etc) – Shaun Wilson Oct 07 '16 at 04:06
  • I would dispute that logs are rarely used. In my current position I have monitoring dashboards driven by logs permanently in my line of sight – Ant P Oct 07 '16 at 17:23
  • @AntP then your logs are constantly used, you simply layer a visualizer on top of them. in my current position and every position for over 15 years having human-readable logs was, at some point, required to investigate issues without a lot of overhead. no visualizer required. – Shaun Wilson Oct 13 '16 at 19:45
91

Why do most log files use plain text rather than a binary format?

Search for the word "text" in the Unix philosophy Wikipedia article; for example, you'll find statements like:

McIlroy, then head of the Bell Labs CSRC (Computing Sciences Research Center), and inventor of the Unix pipe,[9] summarized the Unix philosophy as follows:[10]

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Or for example, from Basics of the Unix Philosophy,

Rule of Composition: Design programs to be connected with other programs.

It's hard to avoid programming overcomplicated monoliths if none of your programs can talk to each other.

Unix tradition strongly encourages writing programs that read and write simple, textual, stream-oriented, device-independent formats. Under classic Unix, as many programs as possible are written as simple filters, which take a simple text stream on input and process it into another simple text stream on output.

Despite popular mythology, this practice is favored not because Unix programmers hate graphical user interfaces. It's because if you don't write programs that accept and emit simple text streams, it's much more difficult to hook the programs together.

Text streams are to Unix tools as messages are to objects in an object-oriented setting. The simplicity of the text-stream interface enforces the encapsulation of the tools. More elaborate forms of inter-process communication, such as remote procedure calls, show a tendency to involve programs with each others' internals too much.

Anyone could build this in a couple of days of spare time, so why don't people do it?

Storing the log file in binary is only the beginning (and trivial). You'd then need to write tools to:

  • Display the whole log file (edit)
  • Display the end of the log, without reading the beginning of it (tail -f)
  • Search for stuff in the file (grep)
  • Filter to only display selected/interesting stuff (using an arbitrarily complicated filter expression)
  • Email the log to someone else who doesn't have your log-file-decoder-software
  • Copy-and-paste a fragment of the log file
  • Read the log file while the program (which creates the log file) is still being developed and debugged
  • Read log files from old versions of the software (which are deployed on customer sites and running).

Obviously software can and does use binary file formats too (e.g. for relational databases), but for log files it's usually not worth doing (in a YAGNI sense).
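
As a sketch of why several of those tools are cheap on plain text but hard on an arbitrary binary format: a tail-like program can seek straight to the end of a text file without decoding anything before it (the path below is just an example):

```python
import os

def tail_bytes(path, n=4096):
    """Return roughly the last n bytes of a text log, reading nothing else."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)       # jump straight to the end
        size = f.tell()
        f.seek(max(0, size - n))     # back up n bytes; no decoding required
        return f.read().decode(errors="replace")

# print(tail_bytes("/var/log/syslog"))   # path is only an example
```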

ChrisW
  • 3,387
  • 2
  • 20
  • 27
  • 26
    Don't forget documentation! I wrote a binary message recorder for a system a few years ago, which logged incoming requests for regression/replay. Now, the only way to understand these awful files is to look at the code that reads/writes them, and yet other teams use them and ask questions about them. Horrible things. – SusanW Oct 04 '16 at 22:10
  • @SusanW At least you kept the source code! – gerrit Oct 05 '16 at 18:37
  • 2
    To be fair, storing your log in a SQLite DB combined with basic query tools for reading would provide all those features you mention out of the box. ;) – jpmc26 Oct 06 '16 at 17:43
  • 3
    @jpmc26 Yes you can read the log file as long as you can, somehow, convert it to a text format... – ChrisW Oct 06 '16 at 18:06
  • 1
    as said in other comments: text files can be compressed easily and efficiently. But the compression does not need to be in the 'data'; it can be done in the file system, so you can use plain text with all tools and have no wasted disk space. – Bernd Wilke πφ Oct 07 '16 at 06:53
  • 1
    @jpmc26 Almost. It's still a layer of additional complexity. With text logs, I know I can log to any server and just using the terminal I can grep (or do more advanced searches). SQLite is not installed by default, but `grep` and command line tools are *ubiquitous*. (This is also a good reason to know `vi`, by the way). – Andres F. Oct 07 '16 at 15:58
  • @AndresF. Not on Windows. lol – jpmc26 Oct 07 '16 at 16:03
  • @jpmc26 Oh, right. Windows! :P I forget about it because I know very few companies which use Windows servers. – Andres F. Oct 07 '16 at 17:58
  • I think this answer is putting a bit too fine of a point on things. The Unix philosophy is talking about text *streams*, yet I don't think anyone bats an eye at the many programs that write their logs to text *files*, and even incorporate hourly rotation, etc., rather than to text *streams* that could be directly plugged into other programs. And your long list of programs is wrong, exactly because of the Unix philosophy: all you need is one program that reads the binary format and translates it to a text stream, and then you can pipe that into all the Unix-y programs that support text streams. – ruakh Oct 09 '16 at 01:56
  • 1
    I think the main benefit of text logs is the flexibility you gain on adding new log calls in your code. With a binary format, for each new call, with new semantics, new data being logged, you would need new tooling. – Spidey Oct 09 '16 at 13:43
  • `edit`, `grep` and `tail` could easily be covered by a program that output the decoded log, i.e. `decodelog logname.ext | grep "what I'm looking for"` – J. Allan Oct 10 '16 at 23:30
  • @JefréN. Depending on the encoding format, it might not be possible for the decoder to skip (seek past) the beginning of the file without reading it. – ChrisW Oct 10 '16 at 23:37
  • Thanks for the quick response but I'm not sure if I know how that applies. The idea is that `decodelog` would write the decoded log to `stdout`. That could then be piped into `tail -n ...` or whatever. (By the way, as my comment below the OP's question states, I am under the impression that creating log files in plain-text is the best way to do it. ;) +!) – J. Allan Oct 11 '16 at 00:01
  • 2
    @JefréN. If I run `tail -f` on a multi-gigabyte log file, it skips to the end of the file (using 'seek' without 'read') and then reads-and-displays just the end of the file. It doesn't need to decompress/decode the whole file. – ChrisW Oct 11 '16 at 00:06
  • Hmm... I didn't know that `tail` did that... Regards. – J. Allan Oct 11 '16 at 00:11
49

There are a lot of debatable presumptions here.

Logging has been an integral part of (almost) every job I've had. It is essential if you want any sort of visibility on the health of your applications. I doubt that it is a "fringe" use; most organizations I've been involved with consider logs very important.

Storing logs as binary means you must decode them before you can read them. Text logs have the virtue of simplicity and ease of use. If you're contemplating the binary route, you might as well store logs in a database instead, where you can interrogate them and statistically analyze them.

SSDs are more reliable than HDDs nowadays, and the arguments against lots of writes are largely moot. If you're really worried about it, store your logs on an ordinary HDD.
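
As a minimal sketch of that database route, SQLite gives you a queryable, analyzable log without inventing a binary format (the file name and table layout here are made up for the example):

```python
import sqlite3
import time

con = sqlite3.connect("app_logs.db")
con.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, level TEXT, message TEXT)")
con.execute("INSERT INTO log VALUES (?, ?, ?)",
            (time.time(), "WARN", "queue depth over threshold"))
con.commit()

# Interrogate the log instead of grepping it:
for level, count in con.execute("SELECT level, COUNT(*) FROM log GROUP BY level"):
    print(level, count)
```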

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
  • 19
    "you might as well store logs in a database, where you can interrogate them and statistically analyze them." At a previous job, we had a custom tool that imports our (text-based) logs into a database for exactly this purpose. – Mason Wheeler Oct 04 '16 at 15:16
  • 5
    I think what OP meant by _"SSDs where writes are limited"_ is the fact that SSDs have limited write/erase cycles, and writing too much to a sector diminishes the service life of the device. She didn't mean that writes are lost. – Tulains Córdova Oct 04 '16 at 15:20
  • 4
    @TulainsCórdova: Yes, I knew what she meant. – Robert Harvey Oct 04 '16 at 15:25
  • 1
    @Robert Harvey, as a former DBA, I have to tell you that text stored in normal VARCHAR and related fields is not compressed. It lives within a block with binary attributes. But if you hexdumped it, you could read the text. – DocSalvager Oct 04 '16 at 21:22
  • 2
    @DocSalvager: I didn't assert otherwise. – Robert Harvey Oct 04 '16 at 21:25
  • Apologies if I misinterpreted " If you're going to go the binary route, you might as well store logs in a database, where you can interrogate them and statistically analyze them." – DocSalvager Oct 04 '16 at 21:30
  • @DocSalvager: I made a slight tweak to my answer. – Robert Harvey Oct 04 '16 at 21:36
  • 1
    @DocSalvager Well, the idea obviously is to not store them as text where possible. Also, there are database compression options as well. But the main problem is still there - most logs have a few flags, a severity level, date... and text. Text compression is the only thing that's really going to have any significant effect. – Luaan Oct 05 '16 at 07:40
  • plain text has the value of always being readable. for all time. on any platform. by all software. i would suggest the review of log aggregation tools such as ELK, Splunk, or Graylog2, if for no other reason than to see a list of the many different log formats and delivery mechanisms in use throughout software engineering today. as a programmer, i can't imagine why I would want to spend any time at all on re-inventing storage formats. i certainly wouldn't use any code which couldn't produce a human readable log, and creating a binary log meant to be read by humans makes very little sense. – Shaun Wilson Oct 07 '16 at 04:22
  • 2
    @TulainsCórdova - the limits of SSD write cycles are generally *very* high these days. Even low-cost consumer grade SSDs have manufacturer warranties on write cycles that run into the high hundreds of times the size of the device, and MTBFs that will cover you for writing thousands of times the capacity of the device. And in a commercial setting you should be using higher end devices that have much larger write cycle limits and should be replacing them on at least a 5 year cycle so unless you're writing > 10% storage capacity per day, I don't think there's anything to worry about. – Jules Oct 07 '16 at 14:18
  • 1
    @Jules Good to know. Tech is evolving faster than we can keep pace. – Tulains Córdova Oct 07 '16 at 14:22
36

Log files are a critical part of any serious application: if the logging in the app is any good, then they let you see which key events have happened and when; what errors have occurred; and general application health that goes beyond whatever monitoring has been designed in. It's common to hear about a problem, check the application's built-in diagnostics (pop open its web console or use a diagnostic tool like JMX), and then resort to checking the log files.

If you use a non-text format, then you are immediately faced with a hurdle: how do you read the binary logs? With the log-reading tool, which isn't on your production servers! Or it is, but oh dear, we've added a new field and this is the old reader. Didn't we test this? Yes, but nobody deployed it here. Meanwhile, your screen is starting to light up with users pinging you.

Or perhaps this isn't your app, but you are doing support and you think you know it's this other system, and WTF? the logs are in a binary format? Ok, start reading wiki pages, and where do you start? Now I've copied them across to my local machine, but - they're corrupted? Have I done some kind of non-binary transfer? Or is the log-reading tool messed up?

In short, text-reading tools are cross-platform and ubiquitous, and logs are often long-lived and sometimes need to be read in a hurry. If you invent a binary format, then you are cut off from a whole world of well-understood and easy-to-use tools. Serious loss of functionality just when you need it.

Most logging environments strike a compromise: keep the current logs readable and present, and compress the older ones. That means you get the benefit of the compression - more so, in fact, because a binary format wouldn't shrink the log messages. At the same time, you can use less and grep and so on.
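
Most logging frameworks support this compromise out of the box. As a sketch, Python's stdlib rotating handler can be taught to gzip rotated files through its `rotator`/`namer` hooks (file names invented for the example):

```python
import gzip
import logging
import logging.handlers
import os

def gzip_rotator(source, dest):
    # Compress the file being rotated out; the live log stays plain text.
    with open(source, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        f_out.writelines(f_in)
    os.remove(source)

handler = logging.handlers.RotatingFileHandler(
    "myapp.log", maxBytes=10_000_000, backupCount=5)
handler.namer = lambda name: name + ".gz"  # rotated files become myapp.log.N.gz
handler.rotator = gzip_rotator

log = logging.getLogger("myapp")
log.addHandler(handler)
log.warning("the current file stays greppable; older ones are gzipped")
```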

So, what possible benefits might arise from using binary? A small amount of space efficiency - increasingly unimportant. Fewer (or smaller) writes? Well, maybe - actually, the number of writes will relate to the number of disk-commits, so if log-lines are significantly smaller than the disk blocksize, then an SSD would be assigning new blocks over and over anyway. So, binary is an appropriate choice if:

  • you are writing huge amounts of structured data
  • the logs have to be created particularly quickly
  • you are unlikely to need to analyze them under "support conditions"

but this is sounding less like application logging; these are output files or activity records. Putting them in a file is probably only one step away from writing them to a database.

EDIT

I think there's a general confusion here between "program logs" (as per logging frameworks) vs "records" (as in access logs, login records etc). I suspect the question relates most closely to the latter, and in that case the issue is far less well-defined. It's perfectly acceptable for a message-record or activity log to be in a compact format, especially as it's likely to be well-defined and used for analysis rather than troubleshooting. Tools that do this include tcpdump and the Unix system monitor sar. Program logs on the other hand tend to be much more ad hoc.

SusanW
  • 1,035
  • 10
  • 14
  • 1
    Even [Unix `/var/log/utmp` / wtmp are binary](https://en.wikipedia.org/wiki/Utmp). They record who's currently logged in on which tty (so they don't just grow), but they are a form of logging. (And it's useful to be able to parse them cheaply, since various common commands like `who` do just that.) – Peter Cordes Oct 05 '16 at 03:34
  • 1
    @PeterCordes Very true. Again, well-defined data. structured records. And of course, speed and size at all scales were vital considerations back in those days. – SusanW Oct 05 '16 at 09:15
9

An example of a somewhat binary log format is already widespread: the Windows event log. On the pro side, this allows log messages to be quite wordy (and thus hopefully helpful) at virtually no cost, possibly something like

Warning: The queue of foobars to do has grown by 517 items over the last 90 seconds. If this happens about once per day, there is nothing to worry about. If it happens more often or in rapid succession, you may want to check the amount of RAM available to the foobar application. If it occurs together with event 12345, however, you seem to be using an obsolete database and you better call support at +1-555-12345 in order to prevent data loss.

The main part of this message exists only once as a resource installed with the application. However, if this resource is not installed correctly (for example, because meanwhile a newer version has been installed that no longer supports this obsolete message), all you see in the event log is a standard message that is just fancy wording for

Dunno, something with "517" and "90".

and no longer helpful in any way.
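
A toy sketch of that failure mode (event ids and templates invented for the demo): the log stores only an id plus raw parameters, and the readable message depends on a separately installed template:

```python
# Installed with the application; NOT stored in the log itself.
TEMPLATES = {
    12346: "Warning: the foobar queue grew by {0} items over the last {1} seconds.",
}

def render(event_id, params):
    template = TEMPLATES.get(event_id)
    if template is None:  # missing resource, or a newer version dropped it
        return (f"The description for event {event_id} cannot be found. "
                f"Raw data: {params}")
    return template.format(*params)

print(render(12346, [517, 90]))  # the wordy, helpful message
print(render(99999, [517, 90]))  # 'Dunno, something with 517 and 90'
```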

TRiG
  • 1,170
  • 1
  • 11
  • 21
  • 9
    Not to mention that _finding_ something in the Windows event log can be a nightmare. It certainly makes me long for a simple text file. – Michael Hampton Oct 06 '16 at 04:55
  • 4
    Wait. Did you want to see *two* (or more) log entries simultaneously? Well too bad. – Eric Towers Oct 07 '16 at 08:52
  • 3
    My answer was going to be "Windows event logs, enough said." – Craig Tullis Oct 08 '16 at 05:43
  • My experience of missing resources for the Event Viewer has been with tools that don't _have_ resources to install, but in that case, AFAIR, there's still a line of actual info from the reporting program, at the bottom, after Windows finishes its 'the resource may be missing or corrupted" spiel. – underscore_d Oct 08 '16 at 12:25
6

TL;DR: Size doesn't really matter, but convenience of use does

First of all, whilst comparing the respective advantages of text and binary formats for short-term log storage is an important question, the size does not really matter. The two reasons for this are:

  1. Logs are highly redundant information that will compress very well: in my experience it is not rare to see compressed log files whose size is 5% or less of that of the original file. Consequently, using a text or a binary format should not have any measurable impact on the long-term storage of logs.

  2. Whatever format we choose, logs will quickly fill a server disk if we do not implement a “log file sink” that compresses and ships log files to a long-term storage platform. Using a binary format could slow this a bit, but even a change by a factor of 10 would not matter that much.

Text versus binary log formats

The promise of Unix systems is that, if we learn to use the standard toolset working on text files structured in lines – such as grep, sort, join, sed and awk – we will be able to use them to quickly assemble prototypes performing any job we want, albeit slowly and crudely. Once a prototype has demonstrated its usefulness, we can choose to turn it into properly engineered software to gain performance or add other useful features. This is, at least in my understanding, the essence of the Unix philosophy.

To put it another way: if we are likely to need to perform treatments and analyses we cannot anticipate today, and do not yet know who should implement them, then we are in the stage where prototypes should be used, and text formats for logs are probably optimal. If we need to repeatedly perform a small set of well-identified treatments, then we are in the situation where we should engineer a durable software system to perform this analysis, and binary or structured formats for logs, such as relational databases, are likely to be optimal.

(Some time ago, I wrote a blog post about this.)

Toby Speight
  • 550
  • 3
  • 14
Michaël Le Barbier
  • 2,025
  • 14
  • 25
5

The two main questions you would want to ask before choosing between text and binary are:

  • Who is my audience?
  • What content do I need to convey?

A common opinion is that the audience of a log message is a human being. This is obviously not a perfect assumption, because there are plenty of log crawling scripts out there, but it is a common one. In this case, it makes sense to convey the information in a medium which humans are comfortable with. Text has a long standing tradition of being this medium.

As for content, consider that a binary log must have a well defined format. The format must be well defined enough for other people to write software which operates on those logs. Some logs are quite well structured (your question lists several). Other logs need the ability to convey content in a less-well-defined natural language form. Such natural language cases are a poor match for binary formats.

For the logs which could be well described in binary, you have to make a choice. Because text works for everyone, it is often seen as the default choice. If you log your results in text, people can work with your logs; it's been proven thousands of times. Binary files are trickier. As a result, it may be that developers output text simply because everyone knows how it is going to behave.
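
One common middle ground is structured text such as JSON Lines: well-defined enough for log-crawling scripts, yet still legible to humans in a terminal. A minimal sketch (the field names are invented):

```python
import json
import time

def log_event(f, level, message, **fields):
    # One self-contained JSON object per line.
    f.write(json.dumps({"ts": time.time(), "level": level,
                        "msg": message, **fields}) + "\n")

with open("events.log", "a") as f:
    log_event(f, "WARN", "foobar queue growing", depth=517, window_s=90)

# Each line parses independently, so grep/tail and friends still work.
```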

Cort Ammon
  • 10,840
  • 3
  • 23
  • 32
4

Log files are in text format because they can be easily read using any type of text editor or by displaying the contents via a console command.

However, some log files are in binary format if there is a lot of data. For example, the product I am working on stores a maximum of 15000 records. In order to store the records in the least amount of room, they are stored in binary. However, a special application must be written to view the records or convert them to a format that can be used (e.g. spreadsheets).

In summary, not all log files are in textual format. Textual format has the advantage that no custom tools are needed to view the content. Where there is a lot of data, the file may be in binary format, which will need a (custom) application to read the data and display it in a human-readable form. More data can be packed into a binary format. Whether to use a textual or a binary format is a decision based on the amount of data and the ease of viewing the contents.
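
As a sketch of that trade-off (the record layout is invented): fixed-size binary records pack tightly and permit random access by record number, but both writing and reading require custom code:

```python
import struct

RECORD = struct.Struct("!IHf")  # timestamp, channel id, reading: 10 bytes each

def read_record(path, index):
    # Record i lives at a known offset, so lookup needs no scanning...
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

# ...but only this (custom) code can produce or interpret the file.
with open("records.bin", "wb") as f:
    for i in range(15_000):
        f.write(RECORD.pack(1475593500 + i, i % 8, 20.0 + i * 0.001))

print(read_record("records.bin", 7_500))
```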

Thomas Matthews
  • 387
  • 1
  • 8
3

In embedded systems where I might not have an output channel available at run-time, where the application can't afford the speed hit imposed by logging, or where logging would alter or mask the effect I'm trying to record, I've often resorted to stuffing binary data into an array or a ring buffer and either printf()ing it at the end of the test run or dumping it raw and writing an interpreter to print it in readable form. Either way, I want to end up with readable data.
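
A rough sketch of that pattern (the event layout is invented; on a real embedded target this would be a C array, but the idea is the same): keep only the last N compact binary events, then render them as text after the run:

```python
from collections import deque
import struct
import time

RING = deque(maxlen=1024)  # oldest events fall off automatically

def trace(event_id, value):
    # Cheap at run-time: one pack, no string formatting, no I/O.
    RING.append(struct.pack("!dHi", time.monotonic(), event_id, value))

for i in range(5000):      # simulate a test run
    trace(42, i)

for raw in RING:           # interpret into readable text at the end
    ts, eid, val = struct.unpack("!dHi", raw)
    print(f"{ts:.6f} event={eid} value={val}")
```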

In systems with more resources, why invent schemes to optimize what doesn't need optimizing?

JRobert
  • 234
  • 1
  • 3
  • 1
    Similarly, when trying to log in real-time from an embedded device to a PC over a 9,600 baud serial port, it is often advisable to compress data or use a binary format, to prevent overflows. – Mawg says reinstate Monica Oct 06 '16 at 17:34
3

Log files are intended to aid debugging of issues. Typically, hard drive space is much cheaper than engineering time. Log files use text because there are many tools for working with text (such as tail -f). Even HTTP uses plain-text (see also why don't we send binary around instead of text on http).

Additionally, it's cheaper to develop a plain-text logging system and verify that it works, easier to debug if it goes wrong, and easier to recover any useful information in case the system fails and corrupts part of the log.

Casey Kuball
  • 213
  • 1
  • 6
  • 2
    Since it was brought up by someone else, I wanted to point out that HTTP/2 (look out!) allows for binary, bi-directional, multiplexed communications. Any developers who fancy themselves elite should go learn it real quick and then ask themselves why it didn't happen sooner. – Shaun Wilson Oct 07 '16 at 04:28
3

A corrupted text file is still readable around the corrupted part. A corrupted binary file may be restorable, but it also might not be; even if it is, restoring it would take quite a bit more work. The other reason is that with a binary logging format, during a rush to create a "temporary fix" (a.k.a. "the most permanent of all fixes"), it is more likely that something which can be thrown together quicker will get used instead of the logging solution.

3

Back in my mainframe days, we used a custom-designed binary log format. The main reason wasn't to save space, it was because we wanted the log to occupy finite space by overwriting old entries with new ones; the last thing we wanted was to be unable to diagnose problems caused by disks becoming full (in 1980 disk space cost $1000/MB, so people didn't buy more than they needed).

Now I still like the idea of a circular log file, and if operating systems offered such a beast I would use it without hesitation. But binary was a bad idea. You really don't want to have to waste time finding the right commands for deciphering a log file when you've got a critical problem to solve.

Michael Kay
  • 3,360
  • 1
  • 15
  • 13
2

We count on unit testing for attaining and maintaining the robustness of our software. (Most of our code runs in a server, headless; post-operation analysis of log files is a key strategy.) Nearly every class in our implementation does some logging. An important part of our unit testing is the use of 'mock' loggers. A unit test creates a mock logger and provides it to the item being tested. It then (when useful/appropriate) analyses what got logged (especially errors and warnings). Using a text-based log format makes this much easier, for much the same reasons as analyses performed on 'real' logs: there are more tools at your disposal that are quick to use and adapt.

Art Swri
  • 667
  • 5
  • 7
  • 2
    although someone else downvoted, i would like to point out this kind of answer provides value still, it shows that text-based logs can be made useful at even the worst levels of the practice in ways your average programmer doesn't actually care, but should. +1 – Shaun Wilson Oct 07 '16 at 04:30
  • Thanks for the support comment. I try to provide info that I think will be useful to at least some people. It's what I want and expect when I go to SO. – Art Swri Oct 07 '16 at 15:35
2

Historically, logs were official, hand-written, sequential records of events. When machinery became capable of recording events, these were written to a hard-copy output device such as a teletype printer, which produced a permanent sequential record but could only process text, and occasionally ring a BELL ...

Chris_F
  • 37
  • 2