I don't dislike global state, but that could be due to my lack of experience. I was thinking about what the usual implementation of global state is:

A big variable where data flows in an inconsistent, unpredictable and, most importantly, non-standardized way across all CRUD operations.

An implementation like global $steps;, where you always have to write global $steps; $steps['insert']['key_name']['and_value']; to get a value, is, in my opinion, why global state is hated. I also believe part of the reason is that people have become a bit too accustomed to objects, for no good reason.

I'll naively try to show how these problems might be overcome, to support my question.

The problem with polluting the global state with all kinds of data inside a variable is that there is no consistent way to perform CRUD, and there are no rules. Anything can happen: whereas objects must respect interfaces, return types and so on, a variable can be... anything. It's extremely hard to predict what goes in and out or, worse yet, in what form.
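
To make that concrete, here is a minimal sketch of the free-for-all. The function names (registerInsertStep, configureInsert) and the data are hypothetical and exist only for illustration; the point is that two pieces of code can disagree about the shape of the same key without anything failing.

```php
<?php
// Hypothetical example: two unrelated functions write to the same global
// array, each assuming a different shape for the 'insert' key.
$GLOBALS['steps'] = [];

function registerInsertStep(string $name): void
{
    global $steps;
    // This author treats 'insert' as a plain list of step names.
    $steps['insert'][] = $name;
}

function configureInsert(array $options): void
{
    global $steps;
    // This author overwrites the same key with an associative array.
    $steps['insert'] = $options;
}

registerInsertStep('create_tables');
configureInsert(['key_name' => ['and_value' => 42]]);

// No error was raised, but the list written by registerInsertStep() is gone.
$insertKeys = array_keys($GLOBALS['steps']['insert']);
```

Any code that later expects a list of step names under 'insert' now receives something else, and nothing in the variable itself says who changed it.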

But what if that global state was accessed in very well-thought-out, documented & clear ways, so that it's always predictable?

Take my cache object example, which serves as a temporary cache for a single PHP request:

class Cache
{
    /**
     * Holds all of our data in a key=>value manner.
     *
     * @var array
     */
    private $data = [];

    /**
     * Adds data to a key.
     *
     * @param string $key The key, used as an identifier.
     * @param mixed $data The data that we're adding to that identifier.
     */
    public function addData( $key, $data )
    {
        //Should perform integrity checks, etc.
        $this->data[$key][] = $data;
    }

    /**
     * Retrieves data based on a key.
     *
     * @param string $key The key, used as an identifier.
     * @return mixed
     */
    public function getData( $key )
    {
        //Return null instead of raising a notice when the key was never added.
        return $this->data[$key] ?? null;
    }


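    /**
     * Changes data, where data is the value of a key in the big array.
     *
     * @param string $key The key, used as an identifier.
     * @param mixed $data The new data for that identifier.
     * @return bool True if the data was changed, false if the checks failed.
     */
    public function changeData( $key, $data )
    {
        //checkIntegrityAndOtherStuff() is a placeholder for the real checks.
        if( checkIntegrityAndOtherStuff( $this->data[$key] ) ) {
            $this->data[$key] = $data;
            return true;
        }

        //If we failed our checks
        return false;
    }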
}

It's a very minimal implementation but, assuming we had strong checks & rules in place (I will come back to this extremely vital point, which I think could be the Achilles' heel of it all, in a bit), we have a predictable system that we can refer to:

global $cache;
$cache = new Cache;

that we can always rely on to work in just one specific way and nothing else:

$cache->addData( 'user_list', [['name' => 'John'], ['name' => 'Jen']] );
$cache->getData( 'user_list' );

We have all the clear signs of a good implementation: good naming, predictability, testability and conciseness but, most importantly, it is easy to use.

So what is wrong here?

The possible Achilles' heel. The one thing that could defeat it all is that you cannot impose any type of contract / pattern on the data being added to these keys. With objects you can set return types / interfaces and know what to expect when you retrieve something. Here you can't: the developer must know beforehand what he's getting, otherwise he's in the dark. Worse, anyone, even if well-intentioned, can change the data contents (and therefore the structure) without any consequences, rendering code that relies on it unusable.

If we had things such as "data contracts" that were bound to, and required for, the data we add (for retrieval later on), then no one could add the wrong data type / structure or retrieve it, creating a perfectly predictable, well-structured & rule-governed environment that everyone can benefit from and access.

It might look something like this:

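public function addData( $key, $data, DataScheme $data_scheme )
{
    //dataDoesNotRespectScheme() is a placeholder for the real validation.
    if( dataDoesNotRespectScheme( $data, $data_scheme ) ) {
        //Break, do not allow it.
        throw new InvalidArgumentException( "Data for '$key' does not respect its scheme." );
    }

    //Store the scheme next to the data so later changes can be checked against it.
    $this->data[$key] = ['data' => $data, 'scheme' => $data_scheme];
}


public function changeData( $key, $new_data )
{
    //Same argument order as in addData(): data first, scheme second.
    if( dataDoesNotRespectScheme( $new_data, $this->data[$key]['scheme'] ) ) {
        //Fail.
        return false;
    }

    //Replace the old data with new data that follows the same scheme.
    $this->data[$key]['data'] = $new_data;
    return true;
}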

As such, the developer only has to know about the data structure, and he's 100% guaranteed to get it. In essence, this creates a data interface, which means that, no matter what, code relying on retrieving this saved data cannot fail.
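
The snippets above leave DataScheme and dataDoesNotRespectScheme() undefined. One possible shape for them is sketched below, purely as an assumption: the scheme is taken to be a map of required field names to the type names reported by gettype(), and the helper rejects data with missing fields, extra fields, or mismatched types. A real contract would likely need nested structures and richer rules.

```php
<?php
// Hypothetical scheme: required field names mapped to gettype() type names.
final class DataScheme
{
    /** @var array Field name => expected type, e.g. ['name' => 'string'] */
    private $structure;

    public function __construct(array $structure)
    {
        $this->structure = $structure;
    }

    public function getStructure(): array
    {
        return $this->structure;
    }
}

// Returns true when $data has a missing field, an extra field, or a wrong type.
function dataDoesNotRespectScheme($data, DataScheme $scheme): bool
{
    $structure = $scheme->getStructure();

    if (!is_array($data)) {
        return true;
    }

    // The set of keys must match exactly, in any order.
    if (array_diff_key($structure, $data) || array_diff_key($data, $structure)) {
        return true;
    }

    foreach ($structure as $field => $type) {
        if (gettype($data[$field]) !== $type) {
            return true;
        }
    }

    return false;
}

$userScheme = new DataScheme(['name' => 'string', 'age' => 'integer']);

$validRejected   = dataDoesNotRespectScheme(['name' => 'John', 'age' => 30], $userScheme);
$invalidRejected = dataDoesNotRespectScheme(['name' => 'Jen', 'age' => 'thirty'], $userScheme);
$missingRejected = dataDoesNotRespectScheme(['name' => 'Jen'], $userScheme);
```

Here only the second and third calls report a violation. Note that this checks the shape of the data at write time and nothing else; it says nothing about which part of the program performed the write.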

Is this what the global state must overcome to be accepted?

lucasgcb
coolpasta
  • What you're describing is a generic repository, not global state. – Robert Harvey Apr 15 '19 at 02:37
  • 1
    How is a generic repository, accessible from anywhere, different from global state? Its implementation? Doesn't that make the generic repository the answer to the global state's problems, then? Sorry, I have absolutely no knowledge of CS overall, it just occurred to me while abstracting about my code. – coolpasta Apr 15 '19 at 02:38
  • Are you interpreting a database as "global state?" – Robert Harvey Apr 15 '19 at 02:47
  • @RobertHarvey No. I understand the repository pattern deals with storing data to the database, but shouldn't both the cases of memory and database be treated the same in regards to how the data changes (not caring about its implementation)? There should be a simple object that has CRUD functionality and another that deals with delegating more functionality once these operations happen. Regardless, is the repository an answer to the global state if it's so much dependent on the database? – coolpasta Apr 15 '19 at 02:50
  • Can you tell me where, in the code you've posted above, you believe you're holding onto global state? – Robert Harvey Apr 15 '19 at 02:51
  • @RobertHarvey Where I attributed `new Cache` to the global `$cache` in order to have it available everywhere, which is the number one reason for creating a global state in the first place. – coolpasta Apr 15 '19 at 02:53
  • So Cache is essentially a big bag of Key/Value pairs? – Robert Harvey Apr 15 '19 at 02:55
  • @RobertHarvey Correct, but this is just an example of what something in the global state might be. Regardless of its identity, I was just offering an implementation to what I often see the global state is used for: a place (be it in memory or database) that you can CRUD from anywhere, at any time, freely. – coolpasta Apr 15 '19 at 02:57
  • OK, but all you've really done is create an in-memory generic repository. All you'd have to do is change the implementation and you'd have a full-fledged database solution. I'd say that's quite a bit more robust and disciplined than some random global variable. – Robert Harvey Apr 15 '19 at 02:58
  • @RobertHarvey DING. And there is my question. Would the graybeards of programming say "yea, dude, this kinda works" when they see it, vs. the classic "man go away" when they see just a global variable where it's a free-for-all? And, again, as per my question: is global state hated for how it's being implemented by a large number of developers (free-for-all) or just simply for its nature? I know it's easy to say that there are no bad things, it all depends on how you use it, but have you seen traits in PHP? – coolpasta Apr 15 '19 at 03:01
  • Let's say you have a method that wants 5 key-value pairs from the cache. In order for this data structure to be useful, your code would have to be structured in such a way that these key-value pairs are added to the cache in different locations. If I'm new to the codebase, how am I supposed to know the current state of the cache and whether I'm able to call that method? It would be better to restructure the code so that the method arguments are co-located because it is very difficult to reason about code that uses "state" to overcome poor design. – Jared Goguen Apr 15 '19 at 04:00
  • The difference between the described cache and a DB solution is the schema. By having a formalized data structure, we no longer have to keep track of what data exists and what data has yet to be calculated because the schema does this implicitly. – Jared Goguen Apr 15 '19 at 04:03
  • @JaredGoguen has a fair point about the schema, but the point is that your global state isn't really global; it's sequestered inside a class and has a defined, structured way to access it. That's a far cry from a global variable that has no access control whatsoever. – Robert Harvey Apr 15 '19 at 04:12
  • "unlike objects where you can set return types / interfaces and know what to expect when you retrieve something, " <- You can't do that in PHP, last I checked, so it's not really an argument. You can do it informally with documentation, but you can do the same thing with global variables. The problem with global state isn't that it's unstructured, it's that you can't tell where it's being used. – user253751 Apr 15 '19 at 04:13
  • @immibis: Also a fair point, but you could say the same thing about any class that is being accessed from multiple places. – Robert Harvey Apr 15 '19 at 04:15
  • 1
    A key feature of a cache is that, anything you can do with the cache can be done without the cache, and the cache doesn't change the result at all, only gets it more quickly. So that's not so bad. Read-only configuration data is also not so bad (until it impedes testing). – user253751 Apr 15 '19 at 04:16
  • 1
    Now an example I know of global state being bad: When Minecraft added the Nether (an alternate game world), the code still had a global variable for "the game world". When you travelled to or from the Nether, it would save your entire game and load up an alternate saved game. For obvious reasons, this didn't work on a multiplayer server, so the Nether was unavailable in multiplayer until about 6 months later when they'd done all the needed refactoring to delete the global variable. (It was another year after that before singleplayer games could have both worlds running at the same time) – user253751 Apr 15 '19 at 04:18
  • (in short, requirements might change and then you might need two of the thing you thought you only needed one of) – user253751 Apr 15 '19 at 04:40
  • 3
    Possible duplicate of [Why is Global State so Evil?](https://softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil) – gnat Apr 15 '19 at 17:29
  • @gnat I'm treating the next step after someone has made their decision for it being bad and trying to see how to solve it. Also, most answers from that question speak of exactly the issues I presented and **I tried, with my weak mind, to offer a solution. Make the global an object that has clear rules for usage so that everything is predictable** and was wondering if this was the right approach. – coolpasta Apr 15 '19 at 23:20
  • You're basically making your own definition of the "global state problem" and then going ahead and trying to solve it. This is misrepresenting all the issues with it. In that sense your question is loaded, because answering it in any meaningful way would be accepting your definition of the "global state problem." – Pieter B Apr 16 '19 at 07:48

2 Answers

13

Is this what the global state must overcome to be accepted?

No. Adding contracts to what goes into and out of global state is as easy as using a language with real types. There are lots of those and global state is still evil there.

Things you're missing:

  • Who fubar'd my data?!? - the biggest, most obvious problem with global mutable state is that anyone can change it. What happens when a bug pops up because the contents of the data aren't what you expected? Literally any part of your program could be the culprit.
  • I want to reuse this thing. - Oops, you can't. It relies on global mutable state along with the entire rest of your program. That broad coupling tends to make things less modular and encourages people to add functionality to your God Object.
  • Why is this thing slow?!? - Less of a problem in php, but global mutable state is really, really unfriendly to concurrency. That limits scalability and usually performance of that code. And since almost every unit test framework will parallelize test runs, your global mutable state will probably break that too.

(along with a few smaller things)
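
The first point can be sketched in a few lines. Everything here is hypothetical (ValidatedCache, loadUsers, loadNewsletterRecipients and countUsers are invented for illustration, not taken from the question): the store validates every write, yet a second writer can still replace the data with something that passes validation and is wrong for the reader.

```php
<?php
// Hypothetical validated store: every value must be a list of strings.
final class ValidatedCache
{
    private $data = [];

    public function set(string $key, array $value): void
    {
        foreach ($value as $item) {
            if (!is_string($item)) {
                throw new InvalidArgumentException('Expected a list of strings.');
            }
        }
        $this->data[$key] = $value;
    }

    public function get(string $key): array
    {
        return $this->data[$key] ?? [];
    }
}

$GLOBALS['cache'] = new ValidatedCache();

function loadUsers(): void
{
    $GLOBALS['cache']->set('user_list', ['John', 'Jen']);
}

// An unrelated feature, written elsewhere, reuses the key with valid data.
function loadNewsletterRecipients(): void
{
    $GLOBALS['cache']->set('user_list', ['marketing@example.com']);
}

function countUsers(): int
{
    return count($GLOBALS['cache']->get('user_list'));
}

loadUsers();
loadNewsletterRecipients();
$userCount = countUsers(); // 1, although loadUsers() stored two users
```

The contract was respected on every write, and the result is still wrong. Finding the cause means searching every place in the program that can reach the global.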

Telastyn
  • 6
    Note: Global state won't make your thing slow, but it will stop you from making it fast. – user253751 Apr 15 '19 at 04:26
  • So what would be a better solution to this or is the original intent of "i want to access data everywhere" a bad way of doing things? – coolpasta Apr 15 '19 at 05:15
  • 1
    @immibis it can make it slower because the compiler can't inline some code, prove some code is dead, prove some variable will always be of some runtime type (even if it's closed over, a Javascript variable can only be seen by so much code. If it's accessible from the global scope, nothing can be proven), it may not be able to parallelize your loops... – John Dvorak Apr 15 '19 at 06:29
  • 11
    "is the original intent of "i want to access data everywhere" a bad way of doing things?" - That is the fundamental position that leads to the problems of global state. Accessing data everywhere means that any attempt to reason about one part of the program must include reasoning about every other part of the program. All of software architecture is really about making reasoning easier, and being able to limit your reasoning to part of a program is the most important aspect of that. – Sebastian Redl Apr 15 '19 at 07:04
1

Your question actually concerns two different things:

  1. Global variables

  2. An untyped key-value store versus a strongly typed repository with input validation, consistency and access control.

Your question is if global variables are considered evil because they are typically used for untyped and unconstrained data.

The answer is no. Global variables are considered bad due to them being global - i.e. directly accessible from anywhere in the program. This creates a tight coupling between all parts of the program which defeats all other architectural constraints like layering and encapsulation.

A global variable can be anything, e.g. just a boolean flag. The problem is when this flag can be set in some component and then affect behavior in some distant, seemingly unrelated component.

Of course using a strongly typed repository has many advantages compared to an untyped store. This is just independent of the question of whether global variables are bad.
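
As a small illustration of the flag case, consider the sketch below. The names (debug_mode, enableDiagnostics, formatPriceGlobal, formatPrice) are made up for this example. The flag is a perfectly well-typed boolean, and the first formatter still changes behavior because of a call made somewhere else; the second takes the same value as a parameter, so the dependency is visible at the call site.

```php
<?php
// Hypothetical global flag, set in one component and read in another.
$GLOBALS['debug_mode'] = false;

function enableDiagnostics(): void
{
    // Imagine this lives in a distant admin component.
    $GLOBALS['debug_mode'] = true;
}

function formatPriceGlobal(float $amount): string
{
    // The result depends on a value that does not appear in the signature.
    $suffix = $GLOBALS['debug_mode'] ? ' [debug]' : '';
    return number_format($amount, 2) . $suffix;
}

function formatPrice(float $amount, bool $debug): string
{
    // Same logic, but the dependency is an explicit argument.
    $suffix = $debug ? ' [debug]' : '';
    return number_format($amount, 2) . $suffix;
}

$before = formatPriceGlobal(9.5);     // "9.50"
enableDiagnostics();
$after = formatPriceGlobal(9.5);      // "9.50 [debug]"
$explicit = formatPrice(9.5, false);  // "9.50"
```

The same call to formatPriceGlobal() returns two different strings, and nothing at the call site explains why. With the explicit parameter, the caller decides and the reader can see it.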

JacquesB
  • The OP isn't really asking whether global state is bad or not... he's asking whether his particular flavor of global state is bad. – Robert Harvey Apr 15 '19 at 14:55
  • @RobertHarvey: Yes, and my answer is that the proposed solution (a well-defined schema and validation of the data added to the global data) does not mitigate the fundamental problems of global mutable state, and therefore it is still bad. – JacquesB Apr 15 '19 at 15:51
  • Would you consider a database equally bad? It's essentially the same thing you've described in your answer. – Robert Harvey Apr 15 '19 at 16:30
  • Not really what I said, but OK. – Robert Harvey Apr 15 '19 at 17:59
  • I mean a database is not in itself bad, the problem is the global variable because it (by definition) is directly accessible from everywhere in the code. – JacquesB Apr 15 '19 at 19:49
  • 1
    @JacquesB And then what is the mechanism to make it available to everyone in the codebase, but make those who use it accountable (as in, we can see who did what and where and make sure they have the right permissions) without injecting the "database query" object to each object that wants to use the database? – coolpasta Apr 15 '19 at 23:21
  • @coolpasta: I'm not sure I understand that question, but I have edited my answer to hopefully make my point clearer. – JacquesB Apr 16 '19 at 09:57
  • @coolpasta with a read-only cache and a good evict algorithm if the data changes periodically. – Laiv Apr 16 '19 at 16:28