264

I just ran across this old question asking what's so evil about global state, and the top-voted, accepted answer asserts that you can't trust any code that works with global variables, because some other code somewhere else might come along and modify its value and then you don't know what the behavior of your code will be because the data is different! But when I look at that, I can't help but think that that's a really weak explanation, because how is that any different from working with data stored in a database?

When your program is working with data from a database, you don't care if other code in your system is changing it, or even if an entirely different program is changing it, for that matter. You don't care what the data is; that's the entire point. All that matters is that your code deals correctly with the data that it encounters. (Obviously I'm glossing over the often-thorny issue of caching here, but let's ignore that for the moment.)

But if the data you're working with is coming from an external source that your code has no control over, such as a database (or user input, or a network socket, or a file, etc...) and there's nothing wrong with that, then how is global data within the code itself--which your program has a much greater degree of control over--somehow a bad thing when it's obviously far less bad than perfectly normal stuff that no one sees as a problem?

Mason Wheeler
  • 122
    It's nice to see veteran members challenge the dogmas a little ... – svidgen May 24 '16 at 19:58
  • 11
    In an application, you usually provide a means to access the database, and this means is passed to functions which want to access the database. You don't do that with global variables, you simply know they're at hand. That's a key difference right there. – Andy May 24 '16 at 20:55
  • 46
    Global state is like having a single database with a single table with a single row with infinitely many columns accessed concurrently by an arbitrary number of applications. – BevynQ May 24 '16 at 23:35
  • 2
    @BevynQ that makes no sense at all to me, could you elaborate? – sara May 25 '16 at 06:34
  • 1
    The state of the database is part of the spec of most operations, for example when I add a new customer; the testers will check the customer record is in the database. These tests will hopefully be automated. Global variables are just there because they make life easier for the programmer. – Ian May 25 '16 at 08:36
  • 43
    Databases are also evil. – Stig Hemmer May 25 '16 at 09:21
  • 1
    Much of the pain you get from a database is exactly the same as a singleton. For example difficulty in automated testing. Singletons and globals aren't evil. But like so many concepts you need to know the pros/cons of them. Typically the singleton is the right model for the database. – ArTs May 25 '16 at 09:58
  • 4
    The trick is to move all the singletoness into a single place where it can be managed and walled off. Arguably that is the entire raison d'être for the database. – ArTs May 25 '16 at 10:02
  • 3
    Also, it is possible to make [databases immutable](http://www.datomic.com/) as well. – gardenhead May 25 '16 at 18:36
  • 28
    It's entertaining to "invert" the argument you make here and go in the other direction. A struct that has a pointer to another struct is *logically* just a foreign key in one row of one table that keys to another row of another table. How is working with *any* code, including walking *linked lists* any different from manipulating data in a database? Answer: it isn't. Question: why then do we manipulate in-memory data structures and in-database data structures using such different tools? Answer: I really don't know! Seems like an accident of history rather than good design. – Eric Lippert May 25 '16 at 22:18
  • I take umbrage with this `When your program is working with data from a database, you don't care if other code in your system is changing it, or even if an entirely different program is changing it, for that matter.` I care a great deal. Application A should never be able to see Application B's data except via application B. – BevynQ May 25 '16 at 22:31
  • 1
    @Kai It is possible to design a database badly so that all data is globally scoped. It is also possible to highly restrict who has access to what data when and how. It is also possible to enforce data integrity rules. – BevynQ May 25 '16 at 22:40
  • 3
    @EricLippert *please* make that a question... – trichoplax is on Codidact now May 26 '16 at 00:06
  • 3
    The [MUMPS programming language](https://en.wikipedia.org/wiki/MUMPS) is worth a mention here. In MUMPS, there really is no functional difference between global variables and databases! – Andrew Coonce May 26 '16 at 00:42
  • 1
    @ArTs: a database is not a Singleton; it is usually more akin to a Borg. You can create multiple instances of the connections, or a connection pool, but they share the same state. – Lie Ryan May 26 '16 at 01:41
  • @LieRyan A database CONNECTION is not a database. I am however trying to describe the real world object rather than the data structures. Also, I called it "a database", rather than "the database". Applications, sometimes have multiple databases, but each one there must be one and only one of. – ArTs May 26 '16 at 01:47
  • It is the **quality of the design and the code** that touches global states that matters. – rwong May 26 '16 at 21:07
  • 1
    I'm voting to close this question as off-topic because the premise is fundamentally flawed as it is an equivalence fallacy. –  May 26 '16 at 23:31
  • 1
    I don't understand this question. Is your database connection stored as a global variable? If not, then how is it global? It's only accessible to the procedures that you explicitly passed it to... – user541686 May 27 '16 at 04:36
  • @EricLippert Actually that difference is a practical consideration having to do with the requirements of using data in a database because 1) it has to be persisted outside of the current program instance and 2) its dynamic state has to be shared (eventually, in some way) with other instances and programs with potentially far-flung distribution. Changing a shared datum is hard/kludgy enough when you only have to synchronize with another thread in the same program instance. When you have to synchronize thousands of changes with millions of people across the world, you need a different approach. – RBarryYoung May 27 '16 at 19:00
  • 4
    @RBarryYoung: Certainly there are many, many implementation considerations. My musing was more along the lines of why languages which fetch data by dereferencing a pointer, and languages which fetch data by querying a table feel *so* different, when the underlying operation is conceptually the same. It's always struck me as odd. – Eric Lippert May 27 '16 at 19:03
  • @EricLippert ... IMHO, it's really the same answer as "*Why is Web development so much different (worse) than Windows development? Why can't I just develop Web apps the way I develop Windows apps?*" AFAIK, the answer is: "Practical Considerations". – RBarryYoung May 27 '16 at 19:04
  • @EricLippert It has always struck me as odd too, and I've spent a lot of time pondering it. The best answer I've been able to come up with is the practical considerations of sharing, updating, protecting/persisting, and synchronizing changes transactionally. You could take the ECC design pattern and extend it to make all data seem like just items and properties in a huge Object Model, but you get hung up on things again and again, like how to leverage the DB optimizers to search for row sets, and how to explicitly control when data is fetched, updated, committed, checked for being stale etc. – RBarryYoung May 27 '16 at 19:12
  • 3
    @JarrodRoberson How does that make it *off-topic*? That just means the answers should be "Your premise that ... is fundamentally flawed because ... " – Ixrec May 28 '16 at 10:15
  • If your database is the source of truth for your data then you're right. However, if you use event sourcing, the source of truth is events, not your global database. – blockhead May 30 '16 at 13:05
  • I'm surprised no-one has talked much about testability yet. Global variables are bad because they represent a testing combinatorics problem. Technically speaking each global variable introduced (minimally) doubles the number of tests you must run for unit testing. A database is different because it isolates these "super-global variables" in a metaphor that allows you to reset them all to a given state (drop table;insert...insert...insert...), and relational databases even allow you to constrain these "variables" in ways that are not possible in code (referential integrity for example). – Calphool May 30 '16 at 17:12
  • 1
    @StigHemmer Everything is evil. Except - in their mind - Google, – ott-- May 30 '16 at 18:04
  • I don't think they're that much comparable. There isn't a widely used rigorous set of properties specifically designed to minimize the negative effect of global variables in the same way as ACID principles in database. They are much more prone to errors and unintended effects than DB operations. – xji May 30 '16 at 18:34
  • 1
    @EricLippert The situation feels even worse on the client side of a web app, wherein you have to work in a totally different mode of thought when you're hitting a local object (usually synchronously) versus something over the wire (usually asynchronously). *Why do I have to care where the object is coming from, darnit!!?? **I don't wanna!*** – svidgen Jun 06 '16 at 21:08
  • 1
    One point is to just consider: What if the code was to run parallel in multiple remote machines AND has to maintain a global shared state ? A database is the answer. – S.D. Jan 06 '17 at 12:23
  • 1
    There has been a lot of discussion about globals being bad because they are mutable--which says nothing about my most common use of a global: Holding the information read from a configuration file. You either make it a global or you end up passing it around amongst all higher level routines and I consider the latter a bigger problem than the former. I would never use a global for something that is mutable and not a singleton, though. – Loren Pechtel Feb 22 '17 at 02:49

22 Answers

119

First, I'd say that the answer you link to overstates that particular issue and that the primary evil of global state is that it introduces coupling in unpredictable ways that can make it difficult to change the behaviour of your system in future.

But delving into this issue further, there are differences between global state in a typical object-oriented application and the state that is held in a database. Briefly, the most important of these are:

  • Object-oriented systems allow replacing an object with a different class of object, as long as it is a subtype of the original type. This allows behaviour to be changed, not just data.

  • Global state in an application does not typically provide the strong consistency guarantees that a database does -- there are no transactions during which you see a consistent state for it, no atomic updates, etc.
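To make the second point concrete, here is a rough C++ sketch of the all-or-nothing behaviour a transaction gives you and a pair of plain globals does not. The `Accounts` struct and `transfer` function are made up for illustration, and this is only a single-threaded imitation: it provides no isolation between threads, which a real database transaction would.

```cpp
// Hypothetical application state: two values that must change together.
struct Accounts {
    int checking = 0;
    int savings = 0;
};

// Moves `amount` from checking to savings, all or nothing.
// The change is prepared on a private copy and only copied back
// ("committed") if the result is valid; otherwise the shared state
// is left exactly as it was ("rolled back").
bool transfer(Accounts& accounts, int amount) {
    Accounts draft = accounts;   // work on a copy, not the shared state
    draft.checking -= amount;
    draft.savings += amount;
    if (draft.checking < 0) {
        return false;            // invalid result: shared state untouched
    }
    accounts = draft;            // valid result: publish both fields together
    return true;
}
```

With two bare globals updated in place, a failure between the two assignments would leave the pair half-updated, and nothing in the language would notice.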

Additionally, we can see database state as a necessary evil; it is impossible to eliminate it from our systems. Global state, however, is unnecessary. We can entirely eliminate it. So even were the issues with a database just as bad, we can still eliminate some of the potential problems and a partial solution is better than no solution.

Jules
  • 44
    I think the point of the consistency is actually the main reason: When global variables are used in code, there is usually no telling when they are actually initialized. The dependencies between the modules are deeply hidden inside the sequence of calls, and simple stuff like swapping two calls can produce really nasty bugs because suddenly some global variable is not correctly initialized anymore when it's first used. At least that is the problem I have with the legacy code that I need to work with, and which makes refactoring a nightmare. – cmaster - reinstate monica May 24 '16 at 20:13
  • As @Jules notes, with objects you can substitute with something equivalent, but with globals not. Using *a database also provides for substitution*, just point the configuration at a different database. For example, a database can be mocked, whereas globals not so much. Because multiple databases can simultaneously exist they have similarities with objects that they don't have with globals. – Erik Eidt May 24 '16 at 20:30
  • 1
    Re "global state is unnecessary. We can entirely eliminate it." Tell that to a video game developer, or the developer of a high fidelity simulation of the solar system or of a galaxy. There has to be global state because everything can interact with everything else. In "A new model for efficient dynamic simulation" by Paul Dworkin & David Zeltzer, the authors went so far as to propose the concept of a *god-object*. – David Hammen May 25 '16 at 07:51
  • @DavidHammen there is a difference between global variables and singleton objects. – OrangeDog May 25 '16 at 08:46
  • 24
    @DavidHammen I've actually worked on world-state simulation for an online game, which is clearly in the category of application you're talking about, and even there I would not (and did not) use global state for it. Even if some efficiency gains can be made by using global state, the issue is that *global state is not scalable*. It becomes *difficult* to use once you move from a single-threaded to multi-threaded architecture. It becomes *inefficient* when you move to a NUMA architecture. It becomes *impossible* when you move to a distributed architecture. The paper you cite dates from... – Jules May 25 '16 at 10:47
  • 24
    1993. These problems were less of an issue then. The authors were working on a single processor system, simulating interactions of 1,000 objects. In a modern system you'd likely run a simulation of that kind on at the very least a dual-core system, but quite likely it could be at least 6 cores in a single system. For larger problems still, you'd run it on a cluster. For this kind of change, you *must avoid global state* because global state cannot be effectively shared. – Jules May 25 '16 at 10:56
  • 20
    I think calling database state a "necessary evil" is a bit of a stretch. I mean, since when did state become evil? State is the entire purpose of a database. State is information. Without state, all you have are operators. What good are operators without something to operate on? That state has to go somewhere. At the end of the day, functional programming is just a means to an end and without state to mutate there would be no point in doing anything at all. It's a bit like a baker calling the cake a necessary evil - it's not evil. It's the entire point of the thing. – J... May 25 '16 at 12:25
  • @Jules - True, but there's still some object that knows at least a little bit about every object in the game, solar system, galaxy, or whatever it is that you are simulating. – David Hammen May 25 '16 at 13:40
  • There's not that much difference between a system where you have carefully managed global variables and a god object that knows about everything. – David Hammen May 25 '16 at 13:51
  • 5
    @DavidHammen "there's still some object that knows at least a little bit about every object in the game" Not necessarily true. A major technique in modern distributed simulation is taking advantage of locality and making approximations such that distant objects do _not_ need to know about everything far away, only what data is supplied to them by the owners of those distant objects. – JAB May 25 '16 at 14:16
  • Is there any relevance to the idea that the database represents something that should be persistent versus something that is not persistent? – ford prefect May 27 '16 at 15:49
77

First, what are the problems with global variables, based on the accepted answer to the question you linked?

Very briefly, it makes program state unpredictable.

Databases are, the vast majority of the time, ACID compliant. ACID specifically addresses the underlying issues that would make a data store unpredictable or unreliable.

Further, global state hurts the readability of your code.

This is because global variables exist in a scope far away from their usage, maybe even in a different file. When using a database, you are using a record set or ORM object that is local to the code you are reading (or should be).

Database drivers typically provide a consistent, understandable interface to access data that is the same regardless of problem domain. When you get data from a database, your program has a copy of the data. Updates are atomic. Contrast this with global variables, where multiple threads or methods may be operating on the same piece of data with no atomicity unless you add synchronization yourself. Updates to the data are unpredictable and difficult to track down. Updates may be interleaved, causing bog-standard textbook examples of multithreaded data corruption (e.g. interleaved increments).
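The interleaved-increment problem can be replayed deterministically without any threads. The sketch below is only an illustration: `balance` and the helper functions are hypothetical names, and the "interleaving" is simulated by hand on one thread, so it shows the lost update without relying on a real data race.

```cpp
// Hypothetical global; stands in for any shared, unsynchronized variable.
int balance = 0;

// An increment is really a read followed by a write. Splitting it in two
// lets us replay, on one thread, an interleaving two threads could produce.
int read_balance() { return balance; }
void write_balance(int seen) { balance = seen + 1; }

// Two increments that do not overlap: the result is 2.
int sequential_increments() {
    balance = 0;
    write_balance(read_balance());
    write_balance(read_balance());
    return balance;
}

// Two increments whose reads both happen before either write: the second
// write overwrites the first, so one update is lost and the result is 1.
int interleaved_increments() {
    balance = 0;
    int seen_by_a = read_balance();
    int seen_by_b = read_balance();
    write_balance(seen_by_a);
    write_balance(seen_by_b);
    return balance;
}
```

A database transaction, a mutex, or `std::atomic` each close this gap by making the read and the write one indivisible step.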

Databases typically model different data than global variables to begin with, but leaving that aside for a moment, databases are designed from the ground-up to be an ACID-compliant data store that mitigates many of the concerns with global variables.

  • 4
    +1 What you're saying is that databases have *transactions,* making it possible to read and write *multiple pieces* of global state atomically. Good point, which can only be circumvented by using global variables for *each* completely independent piece of information. – l0b0 May 25 '16 at 07:40
  • 1
    @l0b0 transactions are the mechanism that achieves most of the ACID goals, correct. But the DB interface itself makes the code clearer by bringing the data into a more local scope. Think of using a JDBC ResultSet with a try-with-resources block, or an ORM function that gets a piece of data using a single function call. Compare this with managing data far away from the code you are reading in a global somewhere. –  May 25 '16 at 10:03
  • 1
    So it'd be okay to use global variables if I copy the value to a local variable (with a mutex) at the beginning of the function, modify the local variable, and then copy the value back to the global variable at the end of the function? (... he asked rhetorically.) – R.M. May 26 '16 at 18:03
  • 1
    @R.M. He mentioned two points. What you threw out might address the first (program state unpredictable), but it doesn't address the second (the readability of your code). In fact, it may make the readability of your program even worse :P. – riwalk May 26 '16 at 19:07
  • 1
    @R.M. Your function would run consistently, yes. But you'd then have the question of whether something else had modified the global variable in the meantime, and that modification was more important than what you're writing to it. Databases may have the same problem too, of course. – Graham May 27 '16 at 13:11
  • I don't think it has anything to do with transactions. Global variables are considered poor practice even in single threaded programs. – thedayturns Jun 21 '16 at 20:17
45

I'd offer a few observations:

Yes, a database is global state.

In fact, it's a super-global state, as you pointed out. It's universal! Its scope entails anything or anyone that connects to the database. And, I suspect lots of folks with years of experience can tell you horror stories about how "strange things" in the data led to "unexpected behavior" in one or more of the relevant applications...

One of the potential consequences of using a global variable is that two distinct "modules" will use that variable for their own distinct purposes. And to that extent, a database table is no different. It can fall victim to the same problem.

Hmm ... Here's the thing:

If a module doesn't operate extrinsically in some way, it does nothing.

A useful module can be given data or it can find it. And, it can return data or it can modify state. But, if it doesn't interact with the external world in some way, it may as well do nothing.

Now, our preference is to receive data and return data. Most modules are simply easier to write if they can be written with utter disregard for what the outside world is doing. But ultimately, something needs to find the data and modify that external, global state.

Furthermore, in real-world applications, the data exists so that it can be read and updated by various operations. Some issues are prevented by locks and transactions. But, preventing these operations from conflicting with each other in principle, at the end of the day, simply involves careful thinking. (And making mistakes...)

But also, we're generally not working directly with the global state.

Unless the application lives in the data layer (in SQL or whatever), the objects our modules work with are actually copies of the shared global state. We can do whatever we want with those without any impact on the actual, shared state.

And, in cases where we need to mutate that global state, under the assumption that the data we were given hasn't changed, we can generally perform the same-ish sort of locking that we would on our local globals.
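One common shape for "mutate only if what I read hasn't changed" is a version check. Here is a small C++ sketch of the idea; `Record` and `update_if_unchanged` are hypothetical names, and the check itself is not thread-safe, so in a real system it would run inside a transaction or under a lock.

```cpp
#include <string>

// Hypothetical shared record. The version number is bumped on every write.
struct Record {
    int version = 0;
    std::string value;
};

// Applies the update only if the record still has the version the caller
// saw when it took its copy. Returns false if someone else wrote first,
// in which case the caller should re-read and decide again.
bool update_if_unchanged(Record& shared, int expected_version,
                         const std::string& new_value) {
    if (shared.version != expected_version) {
        return false;            // our copy is stale; do not overwrite
    }
    shared.value = new_value;
    ++shared.version;
    return true;
}
```

Two writers that both read version 0 cannot both succeed: the first write moves the record to version 1, so the second is rejected instead of silently clobbering it.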

And finally, we usually do different things with databases than we might with naughty globals.

A naughty, broken global looks like this:

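Int32 counter = 0; // shared by every method below

public void someMethod() {
  for (counter = 0; counter < whatever; counter++) {
    // do other stuff.
  }
}

public void otherMethod() {
  // resets and decrements the same counter that someMethod() is looping on
  for (counter = 100; counter < whatever; counter--) {
    // do other stuff.
  }
}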

We simply don't use databases for in-process/operational stuff like that. And it might be the slow nature of the database and the relative convenience of a simple variable that deters us: Our sluggish, awkward interaction with databases simply makes them bad candidates for many of the mistakes we've historically made with variables.

svidgen
  • 3
    The way to *guarantee* (since we can't assume) "that the data we were given hasn't changed" in a database would be a transaction. – l0b0 May 25 '16 at 07:46
  • Yes... that was supposed to be implied with "same -ish sort of locking." – svidgen May 25 '16 at 11:54
  • But, it can be hard to think carefully at the end of the day. –  May 26 '16 at 00:51
  • Yes, databases are indeed global state -- which is why it is so tempting to share data using something like git or ipfs. – William Payne May 26 '16 at 12:19
22

I disagree with the fundamental claim that:

When your program is working with data from a database, you don't care if other code in your system is changing it, or even if an entirely different program is changing it, for that matter.

My initial thought was "Wow. Just Wow". So much time and effort is spent trying to avoid exactly this - and working out what trade-offs and compromises work for each application. To just ignore it is a recipe for disaster.

But I also disagree on an architectural level. A global variable is not just global state. It's global state that is accessible from anywhere transparently. In contrast, to use a database you need to have a handle to it (unless you store that handle in a global variable...).

For example using a global variable might look like this

int looks_ok_but_isnt() {
  return global_int++;
}

int somewhere_else() {
  ...
  int v = looks_ok_but_isnt();
  ...
}

But doing the same thing with a database would have to be more explicit about what it's doing

int looks_like_its_using_a_database( MyDB * db ) {
   return db->get_and_increment("v");
}

int somewhere_else( MyDB * db ) {
   ...
   int v = looks_like_its_using_a_database(db);
   ...
}

The database one is obviously mucking with a database. If you wanted to not use a database you can use explicit state and it looks almost the same as the database case.

int looks_like_it_uses_explicit_state( MyState * state ) {
   return state->v++;
}

int somewhere_else( MyState * state ) {
   ...
   int v = looks_like_it_uses_explicit_state(state);
   ...
}

So I would argue using a database is much more like using explicit state, than using global variables.
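A practical consequence of the explicit-state version is that every caller (or every test) can own its own state. A minimal C++ sketch, reusing the `MyState` name from above with a hypothetical `next_value` function:

```cpp
// Explicit state: whoever calls next_value() decides which state it touches.
struct MyState {
    int v = 0;
};

// Returns the current value and then increments it, like `global_int++`
// above, but only on the state object the caller handed in.
int next_value(MyState& state) {
    return state.v++;
}
```

Two `MyState` objects never interfere with each other, so a test can start from a fresh one each time. With `global_int` there is exactly one counter, and every caller in the program shares it whether it wants to or not.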

Michael Anderson
  • 2
    Yeah, I thought it was interesting when the OP said: "*You don't care what the data is; that's the entire point*" - if we don't care, then why store it? Here is a thought: let's just stop using variables and data *at all*. That should make things much simpler. "Stop the world, I want to get off!" –  May 26 '16 at 00:56
  • 1
    +1 Different threads or apps writing and reading from the same database is a potential source of a large number of well-known problems, which is why there should always be a strategy for dealing with this, either at the database or app level, or both. So it's definitely NOT true that you (the app developer) don't care about who else is reading or writing from the database. – Andres F. May 27 '16 at 17:39
  • 1
    +1 On a side note, this answer pretty much explains what I hate most about dependency injection. It hides these kinds of dependencies. – jpmc26 May 28 '16 at 01:04
  • @jpmc26 I might be marking words, but isn't the above a good example of how dependency injection (as opposed to global lookup) helps make dependencies explicit? It seems to me like you rather take issue with certain APIs, like perhaps the annotation magic used by JAX-RS and Spring. – Emil Lundberg May 31 '16 at 09:39
  • 3
    @EmilLundberg No, the problem is when you have a hierarchy. Dependency injection hides the dependencies of lower tiers from the code in the higher tiers, making it difficult to keep track of which things interact. For example, if `MakeNewThing` depends on `MakeNewThingInDb` and my controller class uses `MakeNewThing`, then it's not clear from the code in my controller that I'm modifying the database. So then what if I use another class that actually *commits* my current transaction to the DB? DI makes it very difficult to control the scope of an object. – jpmc26 May 31 '16 at 09:46
  • @jpmc26 I don't think that's the fault of DI. Code that modifies the data in database should never call commit itself. Transaction management should always be structured such that commit is called in the same function/scope as the transaction start. If you have commits sprawling all over the code, that's like the same evil as unstructured programming. – Lie Ryan Feb 22 '17 at 01:59
  • @LieRyan "Never" is a strong word. That said, yes, commits should be done at the highest appropriate scope possible, although that isn't always a single point in a sequence. Also consider it the other way around: how do I know what database operations are done as a group on a single transaction? The problem is that the structure of the code no longer tells me about the relationships between the different scopes; I can't follow the code and easily tell which things use the same transaction. DI takes this kind of scope control out of your hands, breaking it up into tiny disconnected pieces. – jpmc26 Feb 22 '17 at 02:07
  • @jpmc26: `with db.begin() as session: modify_data(session)`, rollback happens by raising an exception, commit happens by returning from modify_data(), and savepoints happen by making subtransaction contexts. This is one example of well-structured transaction handling: start transaction, commit, and rollbacks all happen in the same function. What operations happen as a group is trivially answered by simply stepping through modify_data(). Refer to sqlalchemy for an example database library that makes this kind of structured transaction handling straightforward. – Lie Ryan Feb 22 '17 at 12:59
  • @LieRyan ? Your example doesn't have any DI. – jpmc26 Feb 22 '17 at 16:52
  • @jpmc26: DI is difficult to demonstrate in one liner, and I don't have space for lengthy comments here. But DI or no DI in the sample code makes no difference, since the code in modify_data never calls commit/rollback/savepoint itself, DI and transaction becomes orthogonal issue. With DI, you can even switch the objects you pass around with a mocked or an in memory objects, and your data modifying code need not be aware/care whether or not it's talking to a database. – Lie Ryan Feb 23 '17 at 02:04
18

Agreed: the fact that a global variable's state can be changed somewhere else is not, in itself, reason enough to avoid globals (it's a pretty good reason though!). It's likely the answer was mainly describing usage where restricting a variable's access to only the areas of code that it's concerned with would make more sense.

Databases are a different matter, however, because they're designed for the purpose of being accessed "globally" so to speak.

For example:

  • Databases typically have built in type and structure validation that goes further than the language accessing them
  • Databases almost universally update through transactions, which prevents inconsistent states, whereas there are no guarantees about what the end state of a global object will look like (unless it's hidden behind a singleton)
  • Database structure is at least implicitly documented based off table or object structure, more-so than the application utilizing it

Most importantly though, databases serve a different purpose than a global variable. Databases are for storing and searching large quantities of organized data, whereas global variables serve specific niches (when justifiable).

Jeffrey Sweeney
  • 1
    Huh. You beat me to it while I was half-way through writing an almost identical answer. :) – Jules May 24 '16 at 20:06
  • @Jules your answer provides details more from the application side of things; keep it. – Jeffrey Sweeney May 24 '16 at 20:10
  • But, unless you depend entirely on stored procedures for data access, all that structure will still fail to enforce that the tables are used as intended. Or that operations are performed in the appropriate order. Or that locks (transactions) are created as needed. – svidgen May 24 '16 at 20:10
  • Hi, are points 1 and 3 still applicable if you are using a static-typed language like Java? – Jesvin Jose Jun 01 '16 at 10:09
  • @aitchnyu Not necessarily. The point being made is that databases are built for the purpose of reliably sharing data, where global variables typically are not. An object implementing a self-documenting interface in a strict language serves a different purpose than even a loose typed NoSQL database. – Jeffrey Sweeney Jun 01 '16 at 13:48
11

But when I look at that, I can't help but think that that's a really weak explanation, because how is that any different from working with data stored in a database?

Or any different from working with an interactive device, with a file, with shared memory, etc. A program that does exactly the same thing every time it runs is a very boring and rather useless program. So yes, it's a weak argument.

To me, the difference that makes a difference with regard to global variables is that they form hidden and unprotected lines of communication. Reading from a keyboard is very obvious and protected. I have to make a certain function call, and I cannot access the keyboard driver. The same applies to file access, shared memory, and your example, databases. It's obvious to the reader of the code that this function reads from the keyboard, that function accesses a file, some other function accesses shared memory (and there had better be protections around that), and yet some other function accesses a database.

With global variables, on the other hand, it's not obvious at all. The API says to call foo(this_argument, that_argument). There's nothing in the calling sequence that says the global variable g_DangerWillRobinson should be set to some value before calling foo (or examined after calling foo).
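A small C++ sketch of what such a hidden line of communication looks like next to the explicit alternative. The names `g_scale`, `scaled_hidden` and `scaled_explicit` are invented for this illustration.

```cpp
// Hypothetical global that silently changes what scaled_hidden() returns.
int g_scale = 1;

// The signature suggests the result depends only on x. It does not:
// the caller must know to set g_scale beforehand.
int scaled_hidden(int x) {
    return x * g_scale;
}

// Same computation, but everything it depends on is in the argument list.
int scaled_explicit(int x, int scale) {
    return x * scale;
}
```

The same call, `scaled_hidden(3)`, gives different answers depending on who last touched `g_scale`, and nothing at the call site hints at that. With `scaled_explicit(3, 10)` the dependency is right there for the reader.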


Google banned the use of non-const reference arguments in C++ primarily because it is not obvious to the reader of the code that foo(x) will change x because foo takes a non-constant reference as an argument. (Compare with C#, which dictates that both the function definition and the call site must qualify a reference parameter with the ref keyword.) While I do not agree with the Google standard on this, I do understand their point.

Code is written once and modified a few times, but if it's at all good, it is read many, many times. Hidden lines of communication are very bad karma. C++'s non-const references represent a minor hidden line of communication. A good API or a good IDE will show me that "Oh! This is call by reference." Global variables are a huge hidden line of communication.

David Hammen
  • 8,194
  • 28
  • 37
8

I think that the quoted explanation oversimplifies the issue to the point where the reasoning becomes ridiculous. Of course, the state of an external database contributes to the global state. The important question is how your program depends on the (mutable) global state. If a library function to split strings on white-space depended on intermediary results stored in a database, I would object to this design at least as much as I would object to a global character array used for the same purpose. On the other hand, if you decide that your application doesn't need a full-blown DBMS to store business data at this point and a global in-memory key-value structure will do, this is not necessarily a sign of poor design. What is important is that, no matter what solution you pick to store your data, this choice is isolated to a very small portion of the system, so that most components can be agnostic to the solution chosen for deployment and unit-tested in isolation, and the deployed solution can be changed at a later time with little effort.

5gon12eder
  • 6,956
  • 2
  • 23
  • 29
8

As a software engineer working predominantly with embedded firmware, I'm almost always using global variables for anything going between modules. In fact, it's best practise for embedded. They are assigned statically, so there's no risk of blowing the heap/stack and there's no extra time taken for stack allocation/clean-up on function entry/exit.

The downside of this is that we do have to consider how those variables are used, and a lot of that comes down to the same kind of thought that goes into database-wrangling. Any asynchronous read/writes of variables MUST be atomic. If more than one place can write a variable, some thought must go into making sure they always write valid data, so the previous write is not arbitrarily replaced (or that arbitrary replacement is a safe thing to do). If the same variable is read more than once, some thought must go into considering what happens if the variable changes value between reads, or a copy of the variable must be taken at the start so that processing is done using a consistent value, even if that value becomes stale during processing.
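
The "take a copy at the start" advice can be sketched as follows. This is only an illustration under assumed names (`g_channel`, `g_gain`, `kChannels` are hypothetical), using `std::atomic` as one way to get an atomic read; real firmware may use a different mechanism, such as briefly disabling interrupts.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable, written asynchronously (ISR or another thread).
std::atomic<unsigned> g_channel{0};

constexpr unsigned kChannels = 4;
const int g_gain[kChannels] = {1, 2, 4, 8};

// Risky: the variable is read twice. If it changes between the range check
// and the array access, the index may no longer be the one that was checked.
int gain_two_reads() {
    if (g_channel.load() < kChannels) {
        return g_gain[g_channel.load()];
    }
    return 0;
}

// Safer: one atomic snapshot, then all processing uses the local copy.
// The value may be stale by the time we return, but it is consistent.
int gain_snapshot() {
    const unsigned channel = g_channel.load();
    if (channel < kChannels) {
        return g_gain[channel];
    }
    return 0;
}

// Single-threaded usage, just to show the intended behaviour.
void example() {
    g_channel.store(2);
    assert(gain_snapshot() == 4);
    g_channel.store(9);  // out of range
    assert(gain_snapshot() == 0);
}
```

In a single-threaded run both functions agree; the difference only shows up when a writer runs between the two reads, which is exactly why this kind of bug is hard to reproduce.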

(For that last one, on my very first day of a contract working on an aircraft countermeasures system, so highly safety-related, the software team were looking at a bug report they'd been trying to figure out for a week or so. I'd had just enough time to download the dev tools and a copy of the code. I asked "couldn't that variable be updated between reads and cause it?" but didn't really get an answer. Hey, what does the new guy know, after all? So whilst they were still discussing it, I added protective code to read the variable atomically, did a local build, and basically said "hey guys, try this". Way to prove I was worth my contracting rate. :)

So global variables are not an unambiguously bad thing, but they do leave you open to a wide range of issues if you don't think about them carefully.

David Hammen
  • 8,194
  • 28
  • 37
Graham
  • 1,996
  • 1
  • 12
  • 11
7

Depending on what aspect you're judging, global variables and database access may be worlds apart, but as long as we're judging them as dependencies, they are the same.

Consider functional programming's definition of a pure function: it must depend solely on the parameters it takes as inputs, producing a deterministic output. That is, given the same set of arguments twice, it must produce the same result.

When a function depends on a global variable, it can no longer be considered pure, since, for the same set of arguments, it may yield different outputs because the value of the global variable may have changed between the calls.

However, the function can still be seen as deterministic if we consider the global variable as much a part of the function's interface as its other arguments, so it isn't the problem. The problem is only that this is hidden until the moment we are surprised by unexpected behavior from seemingly obvious functions, then go read their implementations to discover the hidden dependencies.

This part, the moment where a global variable becomes a hidden dependency, is what is considered evil by us programmers. It makes the code harder to reason about, harder to predict, harder to reuse, harder to test and, especially, it increases debug and fix time when a problem occurs.

The same thing happens when we hide the dependency on the database. We can have functions or objects making direct calls to database queries and commands, hiding these dependencies and causing us the exact same trouble that global variables cause; or we can make them explicit, which, as it turns out, is considered a best-practice that goes by many names, such as repository pattern, data-store, gateway, etc.
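
As a rough sketch of what "making the dependency explicit" can look like, here is a small repository-style interface. All names (`UserRepository`, `InMemoryUserRepository`, `greeting`) are hypothetical, and the in-memory class merely stands in for code that would really query a database.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical interface: the data-store dependency has a name and a type.
class UserRepository {
public:
    virtual ~UserRepository() = default;
    // Returns the display name for id, or an empty string if unknown.
    virtual std::string find_name(int id) const = 0;
};

// In-memory stand-in; a real implementation would run a database query.
class InMemoryUserRepository : public UserRepository {
public:
    void add(int id, const std::string& name) { names_[id] = name; }
    std::string find_name(int id) const override {
        const auto it = names_.find(id);
        return it == names_.end() ? std::string() : it->second;
    }
private:
    std::map<int, std::string> names_;
};

// The dependency is part of the signature: callers can see it, and tests
// can substitute a different implementation.
std::string greeting(const UserRepository& users, int id) {
    const std::string name = users.find_name(id);
    return name.empty() ? std::string("Hello, stranger") : "Hello, " + name;
}

void example() {
    InMemoryUserRepository users;
    users.add(1, "Ada");
    assert(greeting(users, 1) == "Hello, Ada");
    assert(greeting(users, 2) == "Hello, stranger");
}
```

Given the same repository contents and the same `id`, `greeting` always returns the same string, which is the "deterministic once the dependency is part of the interface" point made above.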

P.S.: There are other aspects which are important to this comparison, such as whether concurrency is involved, but that point is covered by other answers here.

MichelHenrich
  • 6,225
  • 1
  • 27
  • 29
6

Okay, let's start from the historical point.

We're in an old application, written in your typical mix of assembly and C. There's no functions, just procedures. When you want to pass an argument or return value from a procedure, you use a global variable. Needless to say, this is quite hard to keep track of, and in general, every procedure can do whatever it wants with every global variable. Unsurprisingly, people turned to passing arguments and return values in a different way as soon as it was feasible (unless it was performance critical not to do so - e.g. look at the Build Engine (Duke 3D) source code). The hate of global variables was born here - you had very little idea what piece of global state each procedure would read and change, and you couldn't really nest procedure calls safely.

Does this mean that global variable hate is a thing of the past? Not quite.

First, I have to mention that I've seen the exact same approach to passing arguments in the project I'm working on right now. For passing two reference type instances in C#, in a project that's about 10 years old. There's literally no good reason to do it like this, and it was most likely born out of either cargo-culting or a complete misunderstanding of how C# works.

The bigger point is that by adding global variables, you're expanding the scope of every single piece of code that has access to that global variable. Remember all those recommendations like "keep your methods short"? If you have 600 global variables (again, real-world example :/), all your method scopes are implicitly expanded by those 600 global variables, and there's no simple way to keep track of who has access to what.

If done wrong (the usual way :)), global variables may have coupling between each other. But you have no idea how they are coupled, and there's no mechanism to ensure that the global state is always consistent. Even if you introduce critical sections to try and keep things consistent, you'll find that it compares poorly to a proper ACID database:

  • There's no way to rollback a partial update, unless you preserve the old values before the "transaction". Needless to say, by this point, passing a value as an argument is already a win :)
  • Everyone accessing the same state must adhere to the same synchronization process. But there's no way to enforce this - if you forget to set up the critical section, you're screwed.
  • Even if you correctly synchronize all access, there might be nested calls that access partially modified state. This means that you either deadlock (if your critical sections aren't reëntrant), or deal with inconsistent data (if they are reëntrant).
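
To make the first two bullets concrete, here is a small sketch of two coupled globals with a hand-rolled "rollback". The account names and the validation rule are invented for illustration; the point is only how much of a database's job ends up being done manually.

```cpp
#include <cassert>
#include <mutex>

// Two hypothetical globals coupled by an invariant: neither may go negative,
// and a transfer must change both or neither.
int g_checking = 100;
int g_savings = 50;
std::mutex g_accounts_mutex;  // every accessor has to remember to lock this

// Moves amount from checking to savings. On failure the old values are
// restored by hand, because nothing rolls back a partial update for us.
bool transfer_to_savings(int amount) {
    std::lock_guard<std::mutex> lock(g_accounts_mutex);
    const int old_checking = g_checking;  // manual "transaction log"
    const int old_savings = g_savings;

    g_checking -= amount;
    g_savings += amount;

    if (g_checking < 0 || g_savings < 0) {  // validation failed mid-update
        g_checking = old_checking;          // manual rollback
        g_savings = old_savings;
        return false;
    }
    return true;
}

void example() {
    // Note: this code writes the globals directly, without the mutex, and
    // nothing stops it. It is only safe because the demo is single-threaded.
    g_checking = 100;
    g_savings = 50;

    const bool ok = transfer_to_savings(30);
    assert(ok && g_checking == 70 && g_savings == 80);

    const bool rejected = !transfer_to_savings(500);
    assert(rejected && g_checking == 70 && g_savings == 80);
}
```

The locking discipline and the rollback both exist only by convention here, which is the contrast with an ACID database that the list is drawing.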

Is it possible to resolve these issues? Not really. You need encapsulation to handle this, or really strict discipline. It's hard to do things right, and that's generally not a very good recipe for success in software development :)

Smaller scope tends to make code easier to reason about. Global variables make even the simplest pieces of code include huge swathes of scope.

Of course, this doesn't mean that global scoping is evil. It just shouldn't be the first solution you go for - it's a typical example of "simple to implement, hard to maintain".

Luaan
  • 1,850
  • 1
  • 13
  • 10
  • Sounds a lot like the physical world: very hard to roll things back. –  May 26 '16 at 01:11
  • This is a good answer, but it could stand a thesis statement (TL;DR section) at the outset. – jpmc26 May 31 '16 at 09:58
6

A global variable is a tool, it can be used for good and for evil.

A database is a tool, it can be used for good and for evil.

As the original poster notes, the difference isn't all that big.

Inexperienced students often think that bugs are something that happens to other people. Teachers use "Global variables are evil" as a simplified reason to penalize bad design. Students generally don't understand that just because their 100-line program is bug free doesn't mean that the same methods can be used for 10000-line programs.

When you work with databases, you cannot just ban global state since that's what the program is all about. Instead you get more detailed guidelines like ACID and Normal Forms and so on.

If people used the ACID approach to global variables, they wouldn't be so bad.

On the other hand, if you design databases badly, they can be nightmares.

Stig Hemmer
  • 552
  • 3
  • 5
  • 3
    Typical student claim on stackoverflow: Help me! My code is perfect, but it isn't working right! – David Hammen May 25 '16 at 13:54
  • "ACID approach to global variables" -- see refs in Clojure. – Charles Duffy May 25 '16 at 19:49
  • @DavidHammen and you think professionals have a brain unlike students? – Billal Begueradj May 28 '16 at 12:40
  • @BillalBEGUERADJ - That's the difference between professionals and students. We know that despite years of experience and despite the best efforts of code reviews, testing, etc., our code is not perfect. – David Hammen May 28 '16 at 13:02
  • 1
    Some examples: [jquery code not working the code is perfect don't know what is wrong](http://stackoverflow.com/questions/15106490/jquery-code-not-working-the-code-is-perfect-dont-know-what-is-wrong) and [my code is perfect but idont now why i found problems](http://stackoverflow.com/questions/36965029/my-code-is-perfect-but-idont-now-why-i-found-problems) and [The GUI is perfect, but the calculator code does not work](http://stackoverflow.com/questions/24597799/the-gui-is-perfect-but-the-calculator-code-does-not-work). The list goes on and on. – David Hammen May 28 '16 at 13:21
  • One exception to "my code is perfect and it's not working right" is where one has painstakingly created a minimum working example that specifically shows that the problem lies with the compiler or library. For example, see [OS X libc++ std::uniform_real_distribution bug](http://stackoverflow.com/questions/37382139/os-x-libc-stduniform-real-distribution-bug) on stackoverflow. Even then, the attribution of the bug might be incorrect (see my answer to that question). – David Hammen May 28 '16 at 13:40
5

To me, the primary evil is that globals have no protection against concurrency issues. You can add mechanisms to handle such issues with globals, but you'll find that the more concurrency issues you solve, the more your globals start to mimic a database. The secondary evil is that there is no contract on usage.

  • 3
    For example, `errno` in C. – David Hammen May 25 '16 at 13:56
  • 1
    This explains exactly why globals and databases aren't the same. There may be other differences but your specific post destroys the concept entirely. If you gave a quick code example then I'm sure you'd get a lot of upvotes. e.g. MyFunc(){x=globalVar * 5; // ....Some other processing; y=globalVar*34;//Ooops, some other thread could have changed globalVar during Some other processing and x and y are using different values for globalVar in their calculations, which would almost certainly not give desirable results. – Dunk May 26 '16 at 17:28
5

Some of the other answers try to explain why using a database is good. They are wrong! A database is global state and as such is just as evil as a singleton or a global variable. It is all kinds of wrong to use a database when you can easily just use a local Map or an Array instead!

Global variables allow global access, which carries risk of abuse. Global variables also have upsides. Global variables are generally said to be something you should avoid, not something you should never ever use. If you can easily avoid them you should avoid them. But if the benefits outweigh the drawbacks, of course you should use them!*

The exact same thing** applies to databases, which are global state - just like global variables are. If you can make do without accessing a database, and the resulting logic does all you need and is equally complex, using a database adds increased risk to your project, without any corresponding benefit.

In real life, many applications require global state by design, sometimes even persistent global state - that's why we have files, databases, etc.


*The exception here is students. It makes sense to prohibit students from using global variables so they have to learn what the alternatives are.

** Some answers incorrectly claim that databases are somehow better protected than other forms of global state (the question is explicitly about global state, not just global variables). That's bollocks. The primary protection offered in the database scenario is by convention, which is exactly the same for any other global state. Most languages also allow a lot of additional protection for global state, in form of const, classes that simply don't allow changing their state after it's been set in the constructor, or getters and setters that can take thread information or program state into account.

Peter
  • 3,718
  • 1
  • 12
  • 20
2

In a sense, the distinction between global variables and a database is similar to the distinction between private and public members of an object (assuming anybody still uses public fields). If you think of the entire program as an object, then the globals are the private variables, and the database is the public fields.

The key distinction here is one of assumed responsibility.

When you write an object, it is assumed that anyone who maintains the member methods will ensure private fields remain well behaved. But you already give up any assumptions about the state of public fields and treat them with extra care.

The same assumption applies at a wider level to globals v/s database. Also, the programming language/ecosystem guarantees access restrictions on private v/s public in the same way as it enforces them on (nonshared memory) globals v/s database.

When multithreading comes into play, private v/s public v/s global v/s database are merely distinctions along a spectrum.

static int global; // within process memory space
static int dbvar; // mirrors/caches data outside process memory space

class Cls {
    public: static int class_public; // essentially the same as global
    private: static int class_private; // but public to all methods in class

    private: static void method() {
        static int method_private; // but public to all scopes in method
        // ...
        {
            static int scope1_private; // mutex guarded
            int the_only_truly_private_data;
        }
        // ...
        {
            static int scope2_private; // mutex guarded
        }
    }
};
Benito Ciaro
  • 129
  • 3
1

A database can be a global state, but it doesn't have to be all the time. I disagree with the assumption that you don't have control. One way to manage that is locking and security. This can be done at the record, table or entire database. Another approach is to have some sort of version field that would prevent the changing of a record if the data are stale.
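
The version-field idea can be sketched like this. The `Record` type and `update_if_current` function are hypothetical, and the sketch is single-threaded; a real database performs the check and the write as one atomic step.

```cpp
#include <cassert>
#include <string>

// Hypothetical record carrying a version field, as described above.
struct Record {
    std::string value;
    int version = 0;
};

// Applies the update only if the caller read the current version.
// A writer holding stale data is rejected instead of silently
// overwriting someone else's newer change.
bool update_if_current(Record& record, int version_read,
                       const std::string& new_value) {
    if (record.version != version_read) {
        return false;  // the record changed since it was read
    }
    record.value = new_value;
    ++record.version;
    return true;
}

void example() {
    Record record;
    record.value = "draft";

    // Two writers read the same version before either of them writes.
    const int seen_by_a = record.version;
    const int seen_by_b = record.version;

    const bool a_ok = update_if_current(record, seen_by_a, "edit A");
    const bool b_ok = update_if_current(record, seen_by_b, "edit B");

    assert(a_ok);
    assert(!b_ok);  // B's data was stale, so its write is refused
    assert(record.value == "edit A");
}
```

A bare global variable has no equivalent check unless you build one yourself: the second write would simply win.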

Like a global variable, the value(s) in a database can be changed once they are unlocked, but there are many ways to control the access (Don't give all the devs the password to the account allowed to change data.). If you have a variable that has limited access, it's not very global.

JeffO
  • 36,816
  • 2
  • 57
  • 124
0

There are several differences:

  • A database value can be modified on the fly. The value of a global that is set in code on the other hand, cannot be changed unless you redeploy your application and modify your code. In fact, this is intentional. A database is for values that might change over time, but global variables should only be for things that will never change and when they do not contain actual data.

  • A database value (row,column) has a context and a relational mapping in the database. This relation can be easily extracted and analysed using tools like Jailer (for instance). A global variable on the other hand, is slightly different. You can find all the usages, but it would be impossible for you to tell me all the ways in which the variable interacts with the rest of your world.

  • Global variables are faster. Getting something from a database requires a database connection to be made, a select to be run and then the database connection must be closed. Any type conversions you might need come on top of that. Compare that to a global being accessed in your code.

These are the only ones that I can think of right now, but I'm sure there are more. Simply put, they are two different things and should be used for different objectives.

Arnab Datta
  • 191
  • 8
0

Of course globals are not always inappropriate. They exist because they have a legitimate use. The main problem with globals, and the primary source of the admonition to avoid them, is that code that uses a global is attached to that one and only one global.

For example, consider an HTTP server storing the server name.

If you store the server name in a global, then the process cannot concurrently run logic for two different server names. Perhaps the original design never contemplated running more than one server instance at a time, but if you later decide you want to do that, you simply can't if the server name is in a global.

By contrast, if the server name is in a database, there is no problem. You can simply create one instance of that database for each instance of the HTTP server. Because each instance of the server has its own instance of the database, it can have its own server name.

So the primary objection to globals, there can be only one value for all code that accesses that global, does not apply to database entries. The same code can easily access distinct database instances that have different values for a particular entry.
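
A small sketch of the "only one value" limitation and the per-instance alternative. The `HttpServer` class and the example host names are invented; the object here plays the role the answer gives to a per-instance database.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Global version: all code in the process shares this single value.
std::string g_server_name = "alpha.example";

std::string banner_global() {
    return "Server: " + g_server_name;
}

// Per-instance version: each server object carries its own name, so two
// servers with different names can coexist in one process.
class HttpServer {
public:
    explicit HttpServer(std::string name) : name_(std::move(name)) {}
    std::string banner() const { return "Server: " + name_; }
private:
    std::string name_;
};

void example() {
    HttpServer a("alpha.example");
    HttpServer b("beta.example");
    assert(a.banner() == "Server: alpha.example");
    assert(b.banner() == "Server: beta.example");

    // With the global there is no way to express "two names at once":
    // changing it for one caller changes it for every caller.
    g_server_name = "beta.example";
    assert(banner_global() == "Server: beta.example");
}
```

The same logic, `banner`, runs against two different values simply because the value was handed to it per instance rather than looked up globally.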

David Schwartz
  • 4,676
  • 22
  • 26
0

I think this is an interesting question but it's a little difficult to answer because there are two main issues that are being conflated under the term 'global state'. The first is the concept of 'global coupling'. The proof of that is that the alternative given for global state is dependency injection. The thing is that DI doesn't necessarily eliminate global state. That is, it's absolutely possible and common to inject dependencies on global state. What DI does is remove the coupling that comes with global variables and the commonly used Singleton pattern. Aside from a slightly less obvious design, there's very little downside to eliminating this kind of coupling, and the benefits of eliminating the coupling increase exponentially with the number of dependencies on those globals.

The other aspect of this is shared state. I'm not sure if there's a really clear distinction between globally shared state and shared state in general but the costs and benefits are much more nuanced. Simply put there are innumerable software systems that require shared state to be useful. Bitcoin, for example, is a very clever way of sharing state globally (literally) in a decentralized manner. Sharing mutable state properly without creating huge bottlenecks is difficult but useful. So if you don't really need to do it, you can simplify your application by minimizing shared mutable state.

So the question of how databases differ from globals is also bifurcated across these two aspects. Do they introduce coupling? Yes, they can but it depends a lot on how the application is designed and how the database is designed. There are too many factors to have a single answer to whether databases introduce global coupling without details of the design. As to whether they introduce sharing of state, well, that's kind of the main point of a database. The question is whether they do it well. Again, I think this is too complicated to answer without a lot of other pieces of information such as the alternatives and many other trade-offs.

JimmyJames
  • 24,682
  • 2
  • 50
  • 92
0

I would think about it slightly differently: "global variable" like behavior is a price paid by database administrators (DBAs) because it is a necessary evil to do their job.

The issue with global variables, as several others have pointed out, is not an arbitrary one. The issue is that their use makes the behavior of your program less and less predictable because it becomes harder to determine who is using the variable and in what way. This is a big issue for modern software, because modern software is typically asked to do many many flexible things. It may do billions or even trillions of complex state manipulations during the course of a run. The ability to prove true statements about what that software will do in those billions or trillions of operations is extremely valuable.

In the case of modern software, all of our languages provide tools to assist in this, such as encapsulation. The choice not to use it is needless, which leads to the "globals are evil" mentality. In many regions of the software development field, the only people using them are people who don't know how to code better. This means they not only are trouble directly, but they indirectly suggest the developer did not know what they were doing. In other regions, you'll find globals are totally normal (embedded software, in particular, loves globals, partially because they work well with ISRs). However, amidst the many software developers out there, they are the minority voice, so the only voice you hear is "globals are evil."

Database development is one of those minority voice situations. The tools needed to do DBA work are very powerful, and their theory is not rooted in encapsulation. To eke out every single jiffy of performance from their databases, they need full unfettered access to everything, similar to globals. Wield one of their monstrous 100 million row (or more!) databases, and you'll appreciate why they don't let their DB engine pull any punches.

They pay a price for that, a dear price. DBAs are forced to be almost pathological with their attention to detail, because their tools don't protect them. The best they have in the way of protection is ACID or perhaps foreign keys. Those that are not pathological find themselves with an utter mess of tables that is completely unusable, or even corrupt.

It's not uncommon to have 100k line software packages. In theory, any line in the software may affect any global at any point in time. With DBAs, you never find 100k different queries that can modify the database. That would be unreasonable to maintain with the attention to detail needed to protect you from yourself. If a DBA has anything large like that, they will intentionally encapsulate their database using accessors, sidestepping the "global like" issues, and then do as much work as they possibly can through that "safer" mechanism. Thus, when push comes to shove, even the database people avoid globals. They simply come with a lot of danger, and there are alternatives that are just as strong, but not as dangerous.

Would you rather walk around on broken glass, or on nicely swept sidewalks, if all other things are equal? Yes, you can walk on broken glass. Yes, some people even make a living doing it. But still, just let them sweep the sidewalk and move on!

Cort Ammon
  • 10,840
  • 3
  • 23
  • 32
0

I think the premise is false. There's no reason a database needs to be "global state" rather than a (very large) context object. If you're binding to the particular database your code is using via global variables or fixed global database connection parameters, it's no different, and no less evil, than any other global state. On the other hand, if you properly pass around a context object for the database connection, it's just big (& widely used) contextual state, not global state.

Measuring the difference is easy: could you run two instances of your program logic, each using its own database, in a single program/process without making invasive changes to the code? If so, your database is not really "global state".

-1

Globals are not evil; they are simply a tool. MISUSE of globals is problematic, as is the misuse of any other programming feature.

My general recommendation is that globals should only be used in situations that are well understood and thought out, where other solutions are less optimal. Most importantly, you want to ensure that you have well documented where that global value might be modified, and if you are running multithreaded, that the global and any co-dependent globals are accessed in a way that is transactional.

Byron Jones
  • 137
  • 2
-2

Use a read-only pattern, and assume your data is not up to date by the time you print it. Queue writes, or handle conflicts another way. Welcome to hell, devil: you are using a global DB.

Vince
  • 117
  • 2