85

I've always used JSON files for configuration of my applications. I started using them back when I coded a lot of Java. Now I work mainly on server-side and data science Python development, and I'm not sure JSON is the right way to go any more.

I've seen Celery use actual Python files for configuration. Initially I was skeptical about it. But the idea of using simple Python data structures for configuration is starting to grow on me. Some pros:

  • The data structures will be the same as I'm normally coding in. So, I don't need to change frame of mind.
  • My IDE (PyCharm) understands the connection between configuration and code. Ctrl + B makes it possible to jump between configuration and code easily.
  • I don't need to work with (IMO) unnecessarily strict JSON. I'm looking at you, double quotes, no trailing commas and no comments.
  • I can write testing configurations in the application I'm working on, then easily port them to a configuration file without having to do any conversion and JSON parsing.
  • It is possible to do very simple scripting in the configuration file if really necessary. (Although this should be very, very limited.)
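To make the pros above concrete, here is a minimal sketch of a Python configuration file and one way to load it, using `runpy.run_path` from the standard library. The file name `app_config.py`, the setting names and the helper `load_python_config` are hypothetical and chosen only for illustration. Note that loading the file executes it.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical configuration file content: plain Python data structures,
# with comments, single quotes and trailing commas allowed.
CONFIG_SOURCE = """
# Database settings (illustrative names only)
DATABASE = {
    'host': 'localhost',
    'port': 5432,
}
WORKERS = 2 * 4  # very limited scripting
"""


def load_python_config(path):
    """Execute a Python file and return its upper-case module-level names."""
    namespace = runpy.run_path(str(path))
    return {key: value for key, value in namespace.items() if key.isupper()}


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "app_config.py"
    config_path.write_text(CONFIG_SOURCE)
    config = load_python_config(config_path)

print(config["DATABASE"]["port"], config["WORKERS"])
```

Filtering on upper-case names is just one convention for separating settings from helper names; it is an assumption of this sketch, not a requirement.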

So, my question is: If I switch, how am I shooting myself in the foot?

No unskilled end user will be using the configuration files. Any changes to the configuration files are currently committed to Git and are rolled out to our servers as part of continuous deployment. There are no manual configuration changes, unless there is an emergency or it is in development.

(I've considered YAML, but something about it irks me. So, for now it is off the table.)

  • 41
    Unskilled isn't your problem. Malicious is. – Blrfl Jun 18 '17 at 11:42
  • 3
    @Blrfl True. That's one of my main concerns. Currently, if you can change configurations files you can also change code. The problem I'm seeing is that we in the future have more separation of duties where configuration files get passed between team members, some who are allowed to do coding updates, others who aren't. That's far off, but could be a thing in the future. Python based configuration files will then be a security issue. – André Christoffer Andersen Jun 18 '17 at 11:47
  • There is also Django, a Python web framework, that uses a Python configuration file. It lets you reuse variables: for example, many projects define a "ROOT" value for the project path and build other variables from it with os.path.join(ROOT, path). – Pierre.Sassoulas Jun 18 '17 at 20:12
  • 1
    Use JSON as your configuration. Many languages understand JSON (including Python, so one might argue it is using a subset of Python), it is brief and readable. And you won't be tempted to try to write more complex 'code' in your configuration files. – ChuckCottrill Jun 19 '17 at 02:15
  • 2
    The fundamental issue is that Python is Turing-complete and JSON isn't. Thus no program can reliably predict the behavior of a Python script. If your IDE seems to understand Python, it isn't understanding it in the same sense that software could understand a JSON file. Blrfl says: *Unskilled isn't your problem. Malicious is.* To amplify on this, there are easy automated ways to protect against malicious JSON, but there is no way, even in principle, to give automated protection against malicious Python. It sounds like what you really want is a more expressive Turing-incomplete language. – Ben Crowell Jun 19 '17 at 03:10
  • 7
    If you dislike JSON, you should try YAML. I like it for configs a lot. Especially when larger strings are involved, YAML is waaaay more readable than JSON. – Christian Sauer Jun 19 '17 at 07:47
  • 1
    Use JSON or YAML or similar. If people want the ability to use macros and such in the config, they can achieve that by writing a script to generate the config. – Ben Jun 19 '17 at 10:21
  • Related: https://stackoverflow.com/questions/40930663/config-file-with-a-py-file – ivan_pozdeev Jun 19 '17 at 14:51
  • There is a contradiction between "unskilled users" and "scripting". "Unskilled" users should be given the simplest and most bullet-proof format, like JSON or INI, with all advanced interpretation explicitly limited and controlled by your program. Advanced users could just write a Python module for your program (especially if you allow for extension scripts), _without conflating it_ with config files. – 9000 Jun 19 '17 at 14:52
  • 1
    Not enough rep (or practical experience with my suggestion) to answer, but you might be interested in something like [Dhall](https://github.com/Gabriel439/Haskell-Dhall-Library), which is a Turing-*incomplete* language intended for configuration files. (At this stage in its development, it would have to form part of a two-step process: create and verify your configuration file in Dhall, then convert it to JSON or YAML.) – chepner Jun 19 '17 at 17:33
  • How does your program react if you put `rm -rf /*` in the Python configuration script? :-) – Dominique Jun 19 '17 at 14:35
  • @Blrfl Surely the same concerns apply to any program the user has rights to modify, whether or not its configuration is executable. – Casey Jun 19 '17 at 19:10
  • 1
    If you avoid your last bullet point - adding scripting - then all you're using Python for is "JSON, but more user-friendly". If that's the case, just use ast.literal_eval, and the Python interpreter will enforce that the file only has data, with no executable code. I use it in this way, and it's pretty nice. The only problem is if you do make a syntax error, ast.literal_eval won't properly tell you where it is. Hunting it down gets ugly. Note, I don't even save these as .py files - I label them .txt to make the fact that it's non-executable more obvious. – Scott Mermelstein Jun 19 '17 at 20:04
  • If you don't like JSON, YAML nor XML, you should check out [edn](https://github.com/edn-format/edn). It's similar to JSON, except less verbose (no commas (actually commas are whitespace, so you can put them anywhere)), has richer types (a keyword type with optional namespaces, sets, datetimes), and is extensible with custom types. There's a [parser for python](https://github.com/swaroopch/edn_format). – madstap Jun 20 '17 at 01:35
  • 1
    @AndréChristofferAndersen: One option -- which I've never needed to try -- is to use Python syntax, but restrict it in some way, and interpret the AST yourself instead of evaluating the code. That way you can restrict what the user can and can't do, while still providing more flexibility than, say, JSON. – user541686 Jun 20 '17 at 03:55
  • Alternatively, there's TOML, which is used in Cargo (Rust) crate configurations, and has a parser available through `pip install toml`. – Jules Jun 20 '17 at 13:40
  • Obligatory side mention that JSON doesn't *have* to be strict. A variety of JSON libraries do support trailing commas and JS-style comments. It won't be compatible for external use, but for a config file, it's great. There's also a variety of other JSON-like config languages, like [TypeSafe's config](https://github.com/typesafehub/config). That's a superset of JSON with a variety of extra features. – Kat Jun 20 '17 at 18:39
  • It's the precise opposite of a bad idea – Miles Rout Jun 20 '17 at 22:50
  • This is a very common pattern in Python frameworks and applications. Celery, Django, setuptools, rope (refactoring library), pgAdmin 4 (PostgreSQL GUI), ipython (python shell), qtile (window manager), are just some that come to mind. There are gotchas of course, IMO setuptools for example should never have used `setup.py`, but for many applications, especially ones targeted towards programmers or power users, and ones where security/code execution isn't really an issue, it's perfectly fine. Outside of Python, there are programs like Vim and Emacs that use a Turing-complete configuration language. – Lie Ryan Oct 30 '20 at 10:31
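The `ast.literal_eval` suggestion from the comments above could look roughly like this. It is a sketch under assumptions: the config is a single dict literal, and the names `CONFIG_TEXT` and `load_literal_config` are hypothetical.

```python
import ast

# Hypothetical contents of a data-only config file (e.g. saved as settings.txt).
CONFIG_TEXT = """
{
    'name': 'demo',      # comments are allowed
    'retries': 3,
    'hosts': ['a.example', 'b.example',],  # so are trailing commas
}
"""


def load_literal_config(text):
    """Parse a config consisting of a single Python literal; no code is run."""
    value = ast.literal_eval(text.strip())
    if not isinstance(value, dict):
        raise ValueError("configuration must be a dict literal")
    return value


config = load_literal_config(CONFIG_TEXT)
print(config["retries"])

# Anything that is not a literal is rejected instead of being executed.
try:
    load_literal_config("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

This keeps the relaxed Python syntax (single quotes, comments, trailing commas) while giving up the "very simple scripting" item from the question.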

7 Answers

102

Using a scripting language in place of a config file looks great at first glance: you have the full power of that language available and can simply eval() or import it. In practice, there are a few gotchas:

  • it is a programming language, which needs to be learnt. To edit the config, you need to know this language sufficiently well. Configuration files typically have a simpler format that is more difficult to get wrong.

  • it is a programming language, which means that the config can get difficult to debug. With a normal config file you look at it and see what values are provided for each property. With a script, you potentially need to execute it first to see the values.

  • it is a programming language, which makes it difficult to maintain a clear separation between the configuration and the actual program. Sometimes you do want this kind of extensibility, but at that point you are probably rather looking for a real plugin system.

  • it is a programming language, which means that the config can do anything that the programming language can do. So either you are using a sandbox solution which negates much of the flexibility of the language, or you are placing high trust in the config author.
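To illustrate the trade-off in the last point, here is a rough sketch of the "interpret the AST yourself" idea mentioned in the question's comments: only `NAME = expression` lines with literals, lists, earlier names and a few arithmetic operators are accepted. The whitelist and the function names are assumptions of this sketch, and it is not a hardened sandbox (for example, it does nothing to bound the size of computed values).

```python
import ast
import operator

# Binary operators this sketch chooses to allow; everything else is rejected.
_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def _evaluate(node, names):
    """Evaluate a small whitelisted subset of Python expressions."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name) and node.id in names:
        return names[node.id]
    if isinstance(node, ast.List):
        return [_evaluate(item, names) for item in node.elts]
    if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
        left = _evaluate(node.left, names)
        right = _evaluate(node.right, names)
        return _ALLOWED_BINOPS[type(node.op)](left, right)
    raise ValueError(f"disallowed syntax: {type(node).__name__}")


def load_restricted_config(source):
    """Interpret `NAME = expression` lines without calling exec or eval."""
    settings = {}
    for statement in ast.parse(source).body:
        if not (
            isinstance(statement, ast.Assign)
            and len(statement.targets) == 1
            and isinstance(statement.targets[0], ast.Name)
        ):
            raise ValueError(f"disallowed statement on line {statement.lineno}")
        settings[statement.targets[0].id] = _evaluate(statement.value, settings)
    return settings


config = load_restricted_config("BASE = 10\nTIMEOUT = BASE * 3\nHOSTS = ['a', 'b']\n")
print(config)

try:
    load_restricted_config("X = __import__('os')")
except ValueError as exc:
    print(exc)
```

Every feature you want back (dicts, function calls, conditionals) has to be added to the whitelist by hand, which is exactly the loss of flexibility described above.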

So using a script for configuration is likely OK if the audience of your tool is developers, e.g. Sphinx config or the setup.py in Python projects. Other programs with executable configuration are shells like Bash, and editors like Vim.

Using a programming language for configuration is necessary if the config contains many conditional sections, or if it provides callbacks/plugins. Using a script directly instead of eval()-ing some config field tends to be more debuggable (think of the stack traces and line numbers!).

Directly using a programming language may also be a good idea if your config is so repetitive that you are writing scripts to autogenerate the config. But perhaps a better data model for the config could remove the need for such explicit configuration? For example, it may be helpful if the config file can contain placeholders that you later expand. Another feature sometimes seen is multiple config files with different precedence that can override each other, though that introduces some problems of its own.
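As a small sketch of the two features just mentioned, placeholders and layered config files with precedence, the standard library's `string.Template` and `collections.ChainMap` are enough for a simple case. The three layers and their keys are hypothetical, and placeholders are expanded only once (no nested expansion).

```python
from collections import ChainMap
from string import Template

# Hypothetical layers, lowest precedence first.
defaults = {"root": "/srv/app", "log_dir": "${root}/logs", "debug": "false"}
site = {"root": "/opt/app"}
user = {"debug": "true"}

# ChainMap looks keys up left to right, so the most specific layer goes first.
merged = ChainMap(user, site, defaults)


def expand(settings):
    """Expand ${name} placeholders once, using the merged settings."""
    return {key: Template(value).substitute(settings) for key, value in settings.items()}


config = expand(merged)
print(config["log_dir"], config["debug"])
```

Here `log_dir` picks up the `root` overridden by the site layer, which is the kind of reuse that otherwise tempts people toward a scripted config.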

In the majority of cases, INI files, Java property files, or YAML documents are much better suited for configuration. For complex data models, XML may also be applicable. As you've noted, JSON has some aspects that make it unsuitable as a human-editable configuration file, although it is a fine data exchange format.
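For comparison, reading an INI file in Python takes only a few lines with the standard `configparser` module. The section and option names below are made up for the example.

```python
import configparser

# Hypothetical INI content; normally this would live in its own file.
INI_TEXT = """
[server]
host = localhost
port = 8080
; comments are allowed

[features]
verbose = yes
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Values are stored as strings; typed getters convert them on access.
port = parser.getint("server", "port")
verbose = parser.getboolean("features", "verbose")
print(port, verbose)
```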

amon
  • 1
    Thanks! Very informative. I'll give it more thought. I have considered INI files, and python seems to have a lot of nice features/libraries to handle them. – André Christoffer Andersen Jun 18 '17 at 12:40
  • 26
    There are a couple of configuration file formats which are "accidentally Turing-complete", most famously `sendmail.cf`. That would indicate that using an *actual* scripting language might be beneficial, since that one was actually *designed* to be Turing-complete. **However**, Turing-completeness and "Tetris-completeness" are two different things, and while `sendmail.cf` can compute arbitrary functions, it can *not* send your `/etc/passwd` over the net or format your harddisk, which Python or Perl would be able to. – Jörg W Mittag Jun 18 '17 at 14:25
  • 3
    @JörgWMittag Turin-completeness doesn't imply being able to send things over the network or access the hard disk. That is, Turin-completeness is about processing, not about I/O. For example CSS is considered Turin complete, but it won't mess with your permanent storage. You have said elsewhere that "Idris is a total pure functional language, so it is by definition not Turing-complete", that doesn't follow, and apparently it is Turin complete. I was convinced your use of Tetris-complete meant that the language was Turin complete but not able to do full I/O... it seems that is not what you mean. – Theraot Jun 19 '17 at 00:12
  • 6
    @Theraot: "Total" means it always returns. A Turing Machine can perform an infinite loop, i.e. it has the ability to not return. Ergo, Idris cannot do everything a Turing Machine does, which means it is *not* Turing-complete. This is true of all dependently-typed languages. The whole point of a dependently-typed language is that you can decide arbitrary properties about programs, whereas in a Turing-complete language you cannot even decide trivial properties such as "does this program halt?" Total languages are *by definition* not Turing-complete, because Turing Machines are partial. – Jörg W Mittag Jun 19 '17 at 00:17
  • 10
    The *definition* of "Turing-complete" is "can implement a Turing Machine". The definition of "Tetris-complete" is "can implement Tetris". The whole point of this definition is that Turing-completeness is simply not very interesting in the real world. There are plenty of useful languages that aren't Turing-complete, e.g. HTML, SQL (pre-1999), various DSLs, etc. OTOH, Turing-completeness only implies that you can compute functions over natural numbers, it doesn't imply printing to the screen, accessing the network, interacting with the user, the OS, the environment, all of which are important. – Jörg W Mittag Jun 19 '17 at 00:21
  • 4
    The reason why Edwin Brady used this example is because many people think that languages that aren't Turing-complete cannot be used for general purpose programming. I myself used to think that, too, since after all, many interesting programs are essentially endless loops that we *don't want to stop*, e.g. servers, operating systems, event loops in a GUI, game loops. Lots of programs process infinite data, e.g. event streams. I used to think that you cannot write that in a total language, but I since learned that you *can*, and so I find it a good idea to have a term for that capability. – Jörg W Mittag Jun 19 '17 at 00:24
  • @JörgWMittag I see, I didn't know that "total" meant that in this context. I'll assume you are right about Idris. Although, the Turin complete moniker has been granted to languages that require user input to proceed... back to CSS, it can't do loops, but it can encode the rules, if only the user keeps clicking (as a sort of clock input). I agree it isn't very useful, yet where I'm coming from is that a language with no side effects can be Turin complete. Tetris might not be a great alternative, as you don't need to "send your /etc/passwd over the net or format your harddisk" to get Tetris. – Theraot Jun 19 '17 at 00:29
  • 1
    @JörgWMittag: it's not strictly true that all dependently-typed languages are total, it's just very common, since having type-checking fail to terminate is considered bad form. (I consider "dependently-typed" and "total" similar to "lazy" and "pure" in that there's no theoretical reason they need to go together, it's just very awkward if they don't). – Ben Millwood Jun 19 '17 at 15:19
  • @JörgWMittag Moreover, the idea that in a (total or not) DT language "you can decide arbitrary properties about programs" is false. You can decide whether they halt or not (i.e. they do), but you can't necessarily decide if, say, a function from the natural numbers ever attains a particular value. – Ben Millwood Jun 19 '17 at 15:21
  • "which makes it difficult to maintain a clear separation between the configuration and the actual program" To be fair, some platforms create complicated config files that do this. *grumbles about ADO.NET and ASP.NET providers* – jpmc26 Jun 19 '17 at 23:32
  • The good example of this is Dependency Injection Frameworks. The main thing they give you is construction moved into a different language. Does nothing the language couldn't do. But now mixing construction and behavior is less likely since it happens in two different languages. Configuration and code are often separated by language for the same reason. – candied_orange Jun 20 '17 at 02:34
  • @JörgWMittag: *"The whole point of a dependently-typed language is that you can decide arbitrary properties about programs"* ...uh, dependent typing is undecidable unless you add additional constraints. That's literally in the Wikipedia introduction. – user541686 Jun 20 '17 at 03:58
  • 1
    @Theraot **Turing** (as in the man) _not_ **Turin**. – Boris the Spider Jun 20 '17 at 07:02
  • @JörgWMittag: I'd generalise that a bit. At the point where you observe that your configuration is starting to develop "inner platform" features, then start considering whether to just use a better platform/language. So for example if you use JSON config, then *way* before it's Turing-complete you sometimes find that wherever your config specifies a filename, you want to provide ways to say, in JSON, "call certain functions from `os.path` with these parameters to give the filename". You start *wishing* you'd used Python, even if it's not the right decision (yet) :-) – Steve Jessop Jun 20 '17 at 12:03
  • 1
    A concise way of stating this: **Expressive power is a bug, not a feature.** – R.. GitHub STOP HELPING ICE Jun 20 '17 at 12:48
  • Just to add another example of a project that uses a script for a config file: Django and the `settings.py` file – Matt Dec 21 '22 at 14:45
54

+1 to everything in amon's answer. I'd like to add this:

You'll regret using Python code as your configuration language the first time you want to import the same configuration from within code written in a different language. For example, if code that's part of your project and is written in C++ or Ruby or something else needs to pull in the configuration, you'll need to link in the Python interpreter as a library or parse the configuration in a Python coprocess, both of which are awkward, difficult, or high-overhead.

All of the code that imports this configuration today may be written in Python, and you may think this will be true tomorrow as well, but do you know for sure?

You said you would use logic (anything other than static data structures) in your configuration sparingly if at all, which is good, but if there's any bit of that at all, you'll find it difficult in the future to undo it so you can move back to a declarative configuration file.

EDIT for the record: several people have commented on this answer about how likely or unlikely it is that a project would ever be successfully completely rewritten in another language. It's fair to say that a complete backward-compatible rewrite is probably rarely seen. What I actually had in mind was bits and pieces of the same project (and needing access to the same configuration) being written in different languages. For example, serving stack in C++ for speed, batch database cleanup in Python, some shell scripts as glue. So spend a thought for that case too :)
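One workaround raised in the comments is to keep Python as the authoring format but hand a generated, data-only file to the other components. A hedged sketch, with hypothetical setting names and a hypothetical `export_settings` helper:

```python
import json

# Hypothetical settings that a Python config module might define.
SETTINGS = {
    "db": {"host": "localhost", "port": 5432},
    "batch_size": 50 * 2,     # computed in Python
    "regions": ("eu", "us"),  # tuples become JSON arrays
}


def export_settings(settings):
    """Serialize settings to JSON text that non-Python components can parse."""
    return json.dumps(settings, indent=2, sort_keys=True)


exported = export_settings(SETTINGS)
print(exported)
```

This only works while every value is JSON-serializable, and it adds a generation step to the deployment, so it is a mitigation rather than a way around the concern.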

Celada
  • One of the rules of Python programs is they should carry the .py extension. So, yes, you will know it's written in Python. Of course, the file can simply contain JSON and still work as intended, but that's beside the point here I think. – Mast Jun 18 '17 at 18:58
  • 1
    @Mast, apologies, but I don't follow. The name of the file (whether or not it ends in .py) is neither here nor there. The point I am trying to make is that if it's written in Python then you need a Python interpreter to read it. – Celada Jun 18 '17 at 19:01
  • Either that wasn't clear from your answer or I'm simply parsing it wrong. In any case, why does it matter he needs a Python interpreter? OP indicates he's in a Python environment now, so it's available. It just so happens to be available *by default* on GNU/Linux systems because it's used for the configuration of the OS. So I'm not surprised OP wants to do something similar for his applications. – Mast Jun 18 '17 at 19:08
  • All right, let me try to clarify by editing I guess... – Celada Jun 18 '17 at 19:11
  • 12
    @Mast I think you're parsing it wrong. The point I took from this answer (both original and edited) is that the choice to write configuration files in a programming language makes it harder to write code in another language. E.g. you decide to port your app to Android / iPhone and will be using another language. You either have to (a) rely on a Python interpreter on the mobile phone app (not ideal), (b) re-write the configuration in a language-independent format and rewrite the Python code that used it, or (c) maintain two configuration formats going forward. – Jon Bentley Jun 18 '17 at 19:37
  • 4
    @JonBentley I suppose the concern will be relevant if there're plans to do multi-language projects. I didn't get that impression from the OP. Additionally, using text files for config still requires additional code (in all languages) for actual parsing/conversion of values. Technically, if they limit the Python side to `key=value` assignments for config, I don't see why a Java/C++ program couldn't read the Python file as a plain text file and parse it the same if they need to move to something else in the future. I don't see a need for a full-fledged Python interpreter. – code_dredd Jun 18 '17 at 19:54
  • @ray That said, if you're going to restrict yourself to `key=value` for that purpose, [python already has configparser](https://docs.python.org/2/library/configparser.html), and [java has ini4j](http://ini4j.sourceforge.net/), etc etc. Or just go json as OP suggested in the first place. – Izkata Jun 18 '17 at 22:38
  • 3
    @ray True, but the answer is still useful on the basis that questions shouldn't just be applicable to the person who posts them. If you use a standardized format (e.g. INI, JSON, YAML, XML, etc.) then you will likely be using an existing parsing library rather than writing your own. That reduces the additional work to just an adapter class to interface with the parsing library. If you are limiting yourself to key=value, then that does away with most of the OP's reasons to use Python in the first place and you might as well just go with a recognized format. – Jon Bentley Jun 18 '17 at 23:33
  • @JonBentley I think those are good points, but libraries often imply additional/external dependencies that may be undesirable or unavailable (e.g. sysadmin too worried that installing a module will require updates to others and might "break production"). Some languages may be more susceptible to this (e.g. Perl) than others. – code_dredd Jun 18 '17 at 23:39
  • @JonBentley I have yet to see any real world project that kept backwards compatibility when it was completely rewritten in another language. Hell I can't think of any successful project that was completely rewritten to begin with (I thought we learned that lesson from Netscape, or Borland.. ok maybe we don't learn that lesson very well). This strikes me as a pretty theoretical worry. And if backcomp isn't a problem (as it would if you were say porting to Android), the fact that you have to port your whole project dwarfs any cost of having to rethink and rewrite your config file system anyhow. – Voo Jun 19 '17 at 13:09
  • This is exactly the argument people had against XML 10 years ago. – Dmitry Grigoryev Jun 19 '17 at 13:17
  • 3
    I had to do this a few years ago when a tool written in Lua used Lua script as its configs, then they wanted us to write a new tool in C#, and specifically asked us to use the Lua config script. They had a total of 2 lines that were actually programmable and not simple x = y, but I still had to learn about open source Lua interpreters for .net because of them. It's not a purely theoretical argument. – Kevin Fee Jun 19 '17 at 15:55
  • If bits are written in other languages, couldn't you define a internal protocol to communicate configuration to those other bits? In all likelihood you'd retain Python (or whatever) as the high-level glue that binds the rest together, in which case one of its responsibilities would be to parse the config and pass that on to other components. – nneonneo Jun 20 '17 at 01:28
  • @nneonneo, sure. There are always workarounds. It's still technical debt. – Celada Jun 21 '17 at 22:17
23

The other answers are already very good, I'll just bring my experience of real-world usage in a few projects.

Pros

They are mostly already spelled out:

  • if you are in a Python program, parsing is a breeze (eval); it works automatically even for more complex data types (in our program, we have geometric points and transformations, which are dumped/loaded just fine through repr/eval);
  • creating a "fake config" with just a few lines of code is trivial;
  • you have better structures and, IMO, way better syntax than JSON (jeez even just having comments and not having to put double quotes around dictionary keys is a big readability win).
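A minimal sketch of the repr/eval round trip described in the first point, assuming a hypothetical `Point` class as a stand-in for the application's geometric types:

```python
class Point:
    """Hypothetical stand-in for an application data type."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # repr() is written so that eval() can rebuild the object.
        return f"Point({self.x!r}, {self.y!r})"

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


settings = {"origin": Point(0, 0), "markers": [Point(1.5, 2), Point(-3, 4)]}

dumped = repr(settings)  # text that could be written to a config file
# Only the names the file needs are exposed; this is still for trusted input only.
loaded = eval(dumped, {"__builtins__": {}, "Point": Point})

print(dumped)
print(loaded == settings)
```

Emptying `__builtins__` narrows what the file can reference by accident, but it should not be mistaken for a security boundary.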

Cons

  • malicious users can do anything that your main program can do; I don't consider this much of a problem, since generally if a user can modify a configuration file he/she can already do whatever the application can do;
  • if you are no longer in a Python program, now you have a problem. While several of our configuration files remained private to their original application, one in particular came to store information that is used by several different programs, most of which are currently in C++, which now have a hacked-together parser for an ill-defined small subset of Python repr. This is obviously a bad thing.
  • Even if your program remains in Python, you may change Python version. Let's say your application started in Python 2; after lots of testing you managed to migrate it to Python 3 - unfortunately, you didn't really test all of your code - you have all the configuration files lying around on your customers' machines, written for Python 2, and on which you don't really have control. You cannot even provide a "compatibility mode" to read old configuration files (which is often done for file formats), unless you are willing to bundle/call the Python 2 interpreter!
  • Even if you are in Python, modifying the configuration file from code is a real problem, because... well, modifying code is not trivial at all, especially code that has a rich syntax and is not in LISP or similar. One program of ours has a configuration file that is Python, originally written by hand, but which later turned out it would be useful to manipulate via software (a particular setting is a list of things that is way simpler to reorder using a GUI). This is a big problem, because:

    • even just performing a parse→AST→rewrite roundtrip is not trivial (you'll notice that half of the proposed solutions are later marked as "obsolete, do not use, does not work in all cases");
    • even if they worked, AST is way too low-level; you are generally interested in manipulating the result of the computations performed in the file, not the steps that led to it;
    • which brings us to the simple fact that you cannot just edit the values you are interested in, because they may be generated by some complex computation that you cannot understand/manipulate through your code.

    Compare this with JSON, INI or (God forbid!) XML, where the in-memory representation can always be edited and written back either without loss of data (XML, where most DOM parsers can keep whitespace in text nodes and comment nodes) or at least losing just some formatting (JSON, where the format itself doesn't allow much more than the raw data you are reading).
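For contrast, the load, modify, write-back cycle for a data-only format is short. The sketch below reorders a list in a JSON config, as a GUI might; the keys are hypothetical.

```python
import json

# Hypothetical JSON config with an ordered list that a GUI lets users reorder.
original = '{"name": "demo", "pipeline": ["load", "clean", "train"]}'

config = json.loads(original)
# Move the last step to the front, as a drag-and-drop reorder might do.
config["pipeline"].insert(0, config["pipeline"].pop())
rewritten = json.dumps(config, indent=2)

print(rewritten)
```

Only the formatting of the original text is lost; every value survives the round trip.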


So, as usual, there's no clear-cut solution; my current policy on the issue is:

  • if the configuration file is:

    • surely for a Python application and private to it - as in, nobody else will ever try to read from it;
    • hand-written;
    • coming from a trusted source;
    • using target application data types is really a premium;

    a Python file may be a valid idea;

  • if instead:

    • there may be the possibility of having some other application read from it;
    • there is the possibility that this file may be edited by an application, possibly even my application itself;
    • the file is provided by an untrusted source;

    a "data only" format may be a better idea.

Notice that it's not required to make a single choice - I recently wrote an application that uses both approaches. I have an almost-never-modified file with first-setup, handwritten settings where there are advantages of having nice Python bonuses, and a JSON file for configuration edited from the UI.

Matteo Italia
  • 1
    very good point about generating or rewriting config! But few formats other than XML can retain comments in the output, which I consider extremely important for configuration. Other formats sometimes introduce a `note:` field that is ignored for the configuration. – amon Jun 19 '17 at 08:33
  • 2
    "if a user can modify a configuration file he/she can already do whatever the application can do" - this is not quite true. How about testing that shiny config file someone you don't know has uploaded on pastebin? – Dmitry Grigoryev Jun 19 '17 at 13:22
  • 2
    @DmitryGrigoryev: if you are aiming to that target you may as well tell your victim to copy-paste some `curl ... | bash`, it's even less of a hassle. :-P – Matteo Italia Jun 19 '17 at 13:47
  • @DmitryGrigoryev : and it's the type of thing that might allow someone to completely wreck a production system on their first day on the job. If 'eval' is your parser, that means there's no opportunity to check it for problems before it's been read. (the same reason why shell scripts are so bad in production). INI, YAML or JSON are safe in this regard. – Joe Jun 19 '17 at 20:11
  • You don't mention YAML. I've always thought of YAML as the "Python of config files" since whitespace is significant in similar ways. It's FAR easier to modify by hand than JSON. And, you don't allow arbitrary code execution. – Wildcard Jun 20 '17 at 03:16
  • @Wildcard: I didn't mention YAML mostly because (1) I "collated" it along with the other "data only" languages and (2) I don't have much first-hand experience with it. I touched some configuration files here and there, and the basic syntax left me favorably surprised, although when I went down to check out the full grammar it struck me as a bit too complicated (and with more corner cases than I felt necessary). Still, it's a way more readable alternative to JSON, so of course it's "on the table". – Matteo Italia Jun 20 '17 at 06:49
  • @MatteoItalia Obviously, `curl ... | bash` has the same problem, it's only suitable for URLs you completely trust. Though I have only seen this construct used for software installation, and if one trusts the software there's usually no reason to mistrust the installation script. Is this what you meant by *aiming to that target*? – Dmitry Grigoryev Jun 20 '17 at 08:41
  • 1
    @DmitryGrigoryev: my point is just that if your victim type is stupid enough to blindly copy-paste a configuration file, you can probably trick him/her into doing whatever on their machine with way less oblique methods ("paste this into a console to fix your problem!"). Also, even with non-executable configuration files there's much potential for harm - even just maliciously pointing logging over critical files (if the application runs with enough privileges) you can wreak havoc on the system. That's the reason why I think that in practice it's not *much* of a difference in security terms. – Matteo Italia Jun 20 '17 at 09:07
9

The main question is: do you want your configuration file to be in some Turing-complete language (like Python is)? If you do want that, you might also consider embedding some other (Turing-complete) scripting language like Guile or Lua (because these could be perceived as "simpler" to use, or to embed, than Python is; read the chapter on Extending & Embedding Python). I won't discuss that further (because other answers, e.g. amon's, discussed that in depth) but notice that embedding a scripting language in your application is a major architectural choice that you should consider very early; I really don't recommend making that choice later!

A well-known example of a program configurable through "scripts" is the GNU Emacs editor (or probably AutoCAD in the proprietary realm); so be aware that if you accept scripting, some users will eventually use (and perhaps, from your point of view, abuse) that facility extensively and write multi-thousand-line scripts; hence the choice of a good enough scripting language is important.

However (at least on POSIX systems), you might find it convenient to let the configuration "file" be computed dynamically at initialization time (leaving the burden of a sane configuration to your system admin or user, of course); what you really need is a configuration text, which can come from some file or from some command. For that, you could simply adopt the convention (and document it) that a configuration file path starting with e.g. a ! or a | is actually a shell command whose output you read through a pipe. This leaves your users free to use whatever "preprocessor" or "scripting language" they are most familiar with.

(You need to trust your user about security issues if you accept a dynamically computed configuration.)

So in your initialization code, your main would (for example) accept some --config argument confarg and get some FILE *configf; from it. If that argument starts with ! (i.e. if (confarg[0]=='!') ....), you would use configf = popen(confarg+1, "r"); and close that pipe with pclose(configf);. Otherwise you would use configf = fopen(confarg, "r"); and close that file with fclose(configf); (don't forget the error checking). See pipe(7), popen(3), fopen(3). For an application coded in Python, read about os.popen, etc.

(Also document, for the weird user who wants to pass a configuration file named !foo.config, that passing ./!foo.config bypasses the popen trick above.)
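For a Python application, the same convention could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function name read_config_text is hypothetical, and it uses subprocess.run rather than os.popen so that a failing command raises an error instead of silently yielding an empty configuration.

```python
import subprocess


def read_config_text(confarg: str) -> str:
    """Return the configuration text named by confarg.

    Hypothetical helper: a leading '!' means "run the rest as a shell
    command and use its standard output as the configuration text".
    Anything else is treated as an ordinary file path.
    """
    if confarg.startswith("!"):
        # shell=True runs whatever the user typed, so the user is trusted here.
        # check=True raises CalledProcessError if the command exits non-zero.
        result = subprocess.run(
            confarg[1:],
            shell=True,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    # Plain file: a path such as "./!foo.config" ends up here, because it
    # does not start with '!'.
    with open(confarg, "r", encoding="utf-8") as config_file:
        return config_file.read()
```

The returned text would then go to whatever parser the application uses for its configuration format.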

BTW, such a trick is only a convenience (to avoid requiring the advanced user to e.g. code some shell script to generate a configuration file). If users want to report a bug, they should send you the generated configuration file...

Notice that you could also design your application with the ability to use and load plugins at initialization time, e.g. with dlopen(3) (and you need to trust your user about that plugin). Again, this is a very important architectural decision (and you need to define and provide some rather stable API and convention about these plugins and your application).

For an application coded in a scripting language like Python you could also accept some program argument for eval or exec or similar primitives. Again, the security issues are then the concern of the (advanced) user.
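A minimal sketch of what this could look like with exec, assuming the configuration text is fully trusted (the function name load_python_config and the sample settings are made up for illustration):

```python
def load_python_config(source: str) -> dict:
    """Execute trusted Python configuration text and return its settings.

    Hypothetical helper. exec runs arbitrary code, so this is only
    acceptable when whoever writes the configuration is also allowed
    to run code on the machine.
    """
    namespace: dict = {}
    exec(compile(source, "<config>", "exec"), namespace)
    # exec adds "__builtins__" to the namespace; drop it along with any
    # other name starting with an underscore.
    return {
        name: value
        for name, value in namespace.items()
        if not name.startswith("_")
    }


settings = load_python_config("workers = 2 * 4\ndebug = False\n")
```

Here settings holds the two names assigned by the configuration text, with workers already computed by the (very limited) scripting in it.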

Regarding the textual format for your configuration file (be it generated or not), I believe that you mostly need to document it well. The choice of a particular format is not that important; however, I recommend letting your users put comments (which the parser skips) inside it. You could use JSON (preferably with some JSON parser accepting and skipping comments with the usual // till end of line or /*...*/), or YAML, or XML, or INI, or your own thing. Parsing a configuration file is reasonably easy (and you'll find many libraries related to that task).

Basile Starynkevitch
  • +1 for mentioning the Turing-completeness of programming languages. [Some interesting works](http://langsec.org/) reveal that limiting the computational power of the input format is key to securing the input handling layer. Using a Turing-complete programming language goes in the opposite direction. – Matheus Moreira Jun 23 '17 at 04:17
3

Adding to amon's answer, have you considered alternatives? JSON is maybe more than you need, but Python files will probably give you problems in the future for the reasons mentioned above.

However, Python's standard library already includes a parser for a very simple INI-style configuration language that might fulfill all your needs: the ConfigParser module (named configparser in Python 3).
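As a rough illustration, here is how a small configuration could be read with the Python 3 configparser module. The section name, keys and values are invented for the example.

```python
import configparser

# Made-up sample configuration; in practice this would live in a file
# and be loaded with parser.read("settings.ini").
SAMPLE = """
# Full-line comments are allowed.
[server]
host = localhost
port = 8080
debug = no
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Values are stored as strings; typed getters do the conversion.
host = parser.get("server", "host")
port = parser.getint("server", "port")
debug = parser.getboolean("server", "debug")
```

Unlike a Python file, this is pure data: nothing in the configuration gets executed.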

CodeMonkey
  • 1
    Using something 'similar to ... Microsoft Windows INI files' seems to be a bad idea, both on grounds that it's not a particularly flexible format, and because 'similar' implies undocumented incompatibilities. – Pete Kirkham Jun 19 '17 at 13:03
  • 1
    @PeteKirkham Well, it's simple, it's documented and it's part of the Python standard library. It could be the perfect solution for OPs needs, because he's searching for something that is supported directly by Python and is simpler than JSON. As long as he doesn't further specify what his needs are, I think this answer may be helpful to him. – CodeMonkey Jun 19 '17 at 13:08
  • 1
    I was going to suggest essentially this - see what types of configuration files Python libs support and pick one of those. Also, Powershell has the notion of data sections - which allow limited Powershell language constructs - protecting against malicious code. If Python has a lib that supports a limited subset of Python for configuration, that at least mitigates one of the cons against the idea in the OP. – Χpẘ Jun 19 '17 at 22:42
  • 1
    @PeteKirkham It's more likely a problem the other way around. Windows tends to have a bunch of undocumented crap that explodes on you. Python tends to be well documented and straightforward. That said, if all you need is simple key/value pairs (*maybe* with sections), it's a pretty good choice. I suspect this covers 90% of use cases. If .NET's config files were ini instead of the monstrous XML with a schema that's actually code masquerading as config, we'd all be a lot better off. – jpmc26 Jun 20 '17 at 07:29
  • @jpmc26 yes, the Windows ini format contains vendor specific workarounds for backward compatibility, covered by NDAs, so there are cases where any open implementation is not going to be fully compatible with it. So you have a file which looks almost exactly like a .ini file but will not behave the same way. It would be better to use something which does not result in that confusion. – Pete Kirkham Jun 20 '17 at 09:08
  • 1
    @PeteKirkham Not really. INI being best for simple use cases in the first place, chances are you can avoid any incompatibilities. They also don't matter if you're not consuming the file with two different languages, and even if you are, you can probably find open implementations in any language (allowing you to either not have incompatibilities or, at minimum, know exactly what they are). I agree that you should use another format if your use case is actually complex enough that you start running into them or if you can't find an existing implementation you can trust, but that's not common. – jpmc26 Jun 20 '17 at 15:06
  • @jpmc26 I've wasted enough time fixing issues in systems which someone thought the 'chances are it won't happen' that I avoid such mistakes. You are free to do otherwise. – Pete Kirkham Jun 21 '17 at 11:58
  • 1
    @PeteKirkham Do you write all your code to be compatible with Windows, Linux, and Mac just in case it ever has to be ported? Do you use XML in case the file one day needs to be arbitrarily complex? We're constantly making decisions about trade offs based on what we think is likely and what isn't. Honestly, I think the simplest way to avoid the problems you're talking about is to just avoid the blasted Windows implementation of INI. If something is already using it, you're stuck with INI anyway. – jpmc26 Jun 21 '17 at 16:30
  • @jpmc26 Basically, yes, I use modern formats where the behaviour of *all* implementations is specified, such as JSON or XML, rather than using an implementation different to user's expectations. You can't avoid user's expectations of a INI file being something that can be edited in Notepad in a Latin encoding. My users *will* put emojis in it. IF you don't have to deal with that, then once again I state you are free to use what formats you like, even legacy ones from 16-bit era Windows. – Pete Kirkham Jun 21 '17 at 16:48
  • @PeteKirkham I kind of can't believe the kind of users you describe to have. The ones that can't use the ConfigParser format can in my experience also not fill out JSON and XML, they need a GUI with maximum 1 button. – CodeMonkey Jun 22 '17 at 05:09
  • @CodeMonkey I would say it's far better to not expect users to edit config files at all, but provide UI to set configuration, even if it's at the level of say chrome://flags/ it's better than a config file. At which point the having a config file is no longer relevant - most systems I've built in the last ten years use a database with full audit trail, as if someone changes the settings and it breaks the system the owners want to know about it. And yes, if you create a maintenance system for a Chinese nuclear plant they will insist on using non-ASCII characters in their config. – Pete Kirkham Jun 22 '17 at 08:09
  • 1
    @PeteKirkham exactly my point, either you give them access to the config file directly, which is "dangerous" in any case, whether JSON, XML or ConfigParser. Or you provide them with a more safe interface from inside your application and you could even just use a binary format to save your configuration with less chance of (accidental) tampering. Then you also don't need to store Chinese strings for Chinese apps because the translations are in your application not in the config, 好不好? – CodeMonkey Jun 22 '17 at 08:47
1

I have worked for a long time with some well-known software which has its configuration files written in TCL, so the idea is not new. This worked quite well, since users who didn't know the language could still write/edit simple configuration files using a single set name value statement, while more advanced users and developers could pull sophisticated tricks with this.

I don't think that "the config files can get difficult to debug" is a valid concern. As long as your application doesn't force users to write scripts, your users can always use simple assignments in their configuration files, which are hardly any more difficult to get right than JSON or XML.

Rewriting the config is a problem, though it's not as bad as it seems. Updating arbitrary code is impossible, but loading the config from a file, altering it and saving it back is possible. Basically, if you do some scripting in a config file which is not read-only, you'll just end up with an equivalent list of set name value statements once it is saved. A good hint that this will happen is a "do not edit" comment at the beginning of the file.

One thing to consider is that your config files won't be reliably readable by simple regex-based tools, such as sed, but as far as I understand this is already not the case with your current JSON files, so there's not much to lose.

Just make sure you use appropriate sandboxing techniques when executing your config files.

Dmitry Grigoryev
1

Besides all the valid points of other good answers here (wow, they even mentioned the Turing-complete concept), there are actually a couple solid practical reasons to NOT use a Python file as your configuration, even when you are working on a Python-only project.

  1. The settings inside a Python source file are technically part of the executable source code, rather than a read-only data file. If you go this route, you would typically do import config, because that kind of "convenience" was presumably one of the major reasons people started using a Python file as config in the first place. Now you tend to commit that config.py into your repo, because otherwise your end users would hit a confusing ImportError when they try to run your program for the first time.

  2. Assuming you actually commit that config.py into your repo, your team members will probably have different settings in different environments. Imagine that someday a team member accidentally commits his/her local configuration file into the repo.

  3. Last but not least, your project could have passwords in its configuration file. (This is a debatable practice in its own right, but it happens anyway.) And if your configuration file lives in the repo, you risk committing your credentials into a public repo.

Now, using a data-only configuration file, such as the universal JSON format, avoids all three problems above, because you can reasonably ask users to come up with their own config.json and feed it into your program.
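One possible shape for "feeding it into your program", sketched under the assumption that the path arrives through a --config command-line option (the option name and the helper names are illustrative only):

```python
import argparse
import json


def load_json_config(path: str) -> dict:
    """Read a user-supplied JSON configuration file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as config_file:
        return json.load(config_file)


def main(argv=None) -> dict:
    parser = argparse.ArgumentParser()
    # The config file is never imported, so a missing file shows up as a
    # clear "file not found" error instead of an ImportError.
    parser.add_argument("--config", required=True,
                        help="path to the user's own config.json")
    args = parser.parse_args(argv)
    return load_json_config(args.config)
```

Since every user points the program at their own file, no config file has to be committed to the repo at all.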

PS: It is true that JSON has many restrictions. Two of the limitations mentioned by the OP can be worked around with some creativity.

  • How to put comments in a JSON file (properly)
  • And I usually have a placeholder to bypass the trailing comma rule. Like this:

    {
        "foo": 123,
        "bar": 456,
        "_placeholder_": "all other lines in this file can now contain trailing comma"
    }
    
RayLuo