How do you safely refactor in a language with dynamic scope?

Question

For those of you who have the good fortune not to work in a language with dynamic scope, let me give you a little refresher on how that works. Imagine a pseudo-language, called "RUBELLA", that behaves like this:

function foo() {
    print(x); // not defined locally => uses whatever value `x` has in the calling context
    y = "tetanus";
}
function bar() {
    x = "measles";
    foo();
    print(y); // not defined locally, but set by the call to `foo()`
}
bar(); // prints "measles" followed by "tetanus"
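The foo/bar example can be emulated in Python for experimentation. This is only a sketch under one simplifying assumption: without NEW, every function reads and writes a single shared symbol table (the `env` dict below), and an unset name reads as the empty string, as described below for MUMPS. `env`, `output` and `rubella_print` are illustrative names, not part of any real language.

```python
# Minimal emulation of RUBELLA-style dynamic scope: without NEW, every
# function reads and writes one shared symbol table, so a variable set
# anywhere on the call stack is visible everywhere else on it.
env: dict[str, str] = {}
output: list[str] = []

def rubella_print(name: str) -> None:
    # Unset names read as the empty string (assumed, per the MUMPS note).
    output.append(env.get(name, ""))

def foo() -> None:
    rubella_print("x")    # x comes from whoever called foo
    env["y"] = "tetanus"  # leaks back to the caller

def bar() -> None:
    env["x"] = "measles"
    foo()
    rubella_print("y")    # set by foo, not by bar

bar()
print(output)  # ['measles', 'tetanus']
```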

That is, variables propagate up and down the call stack freely: all variables defined in foo are visible to (and mutable by) its caller bar, and the reverse is also true. This has serious implications for code refactorability. Imagine that you have the following code:

function a() { // defined in file A
    x = "qux";
    b();
}
function b() { // defined in file B
    c();
}
function c() { // defined in file C
    print(x);
}

Now, calls to a() will print qux. But then, someday, you decide that you need to change b a little bit. You don't know all the calling contexts (some of which may in fact be outside your codebase), but that should be all right: your changes are going to be completely internal to b, right? So you rewrite it like this:

function b() {
    x = "oops";
    c();
}

And you might think that you haven't changed anything, since you've just defined a local variable. But, in fact, you've broken a! Now, a prints oops rather than qux.
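The same kind of emulation reproduces the a/b/c breakage. Again this is a sketch: the shared `env` dict stands in for the dynamically scoped symbol table, `a` takes the version of `b` to call as an argument purely so both versions can be compared side by side, and `b_original`/`b_refactored` are hypothetical names.

```python
from typing import Callable

# Shared symbol table standing in for dynamic scope (no NEW anywhere).
env: dict[str, str] = {}

def c() -> str:
    # c never receives x; it sees whatever x its callers left in env.
    return env.get("x", "")

def b_original() -> str:
    return c()

def b_refactored() -> str:
    env["x"] = "oops"  # meant as a "local" variable, but it is not local
    return c()

def a(b: Callable[[], str]) -> str:
    env.clear()
    env["x"] = "qux"
    return b()  # what c prints when called via a -> b -> c

print(a(b_original))    # qux
print(a(b_refactored))  # oops
print(env["x"])         # oops: a's own x was overwritten as well
```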

Bringing this back out of the realm of pseudo-languages, this is exactly how MUMPS behaves, albeit with different syntax.

Modern ("modern") versions of MUMPS include the so-called NEW statement, which allows you to prevent variables from leaking from a callee to a caller. So in the first example above, if we had done NEW y = "tetanus" in foo(), then print(y) in bar() would print nothing (in MUMPS, all names point to the empty string unless explicitly set to something else). But there is nothing that can prevent variables from leaking from a caller to a callee: if we have function p() { NEW x = 3; q(); print(x); }, for all we know, q() could mutate x, despite not explicitly receiving x as a parameter. This is still a bad situation to be in, but not as bad as it probably used to be.
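NEW can be added to the same Python emulation to show both halves of this point. In this sketch (a simplified assumption about the semantics, not real MUMPS), `frame(...)` plays the role of a function's stack frame: names passed to it are NEWed, so their previous values are stashed on entry and restored on exit. `frame`, `p`, `q` and the other names are hypothetical.

```python
from contextlib import contextmanager
from typing import Iterator

env: dict[str, object] = {}
_UNSET = object()  # marker for "name had no value before NEW"

@contextmanager
def frame(*new_names: str) -> Iterator[None]:
    # NEW: stash current values and make the names undefined inside the frame.
    saved = {name: env.pop(name, _UNSET) for name in new_names}
    try:
        yield
    finally:
        # On exit (QUIT), restore exactly what the caller had before.
        for name, old in saved.items():
            if old is _UNSET:
                env.pop(name, None)
            else:
                env[name] = old

def foo() -> None:
    with frame("y"):          # NEW y
        env["y"] = "tetanus"

def bar() -> object:
    with frame():
        env["x"] = "measles"  # not NEWed, so this x stays in env
        foo()
        return env.get("y", "")  # "": y no longer leaks out of foo

def q() -> None:
    env["x"] = 99  # q never declared x, yet it can still overwrite it

def p() -> object:
    with frame("x"):          # NEW x
        env["x"] = 3
        q()
        return env["x"]       # 99: NEW does not protect x from callees

print(repr(bar()))  # ''
print(p())          # 99
print(env)          # {'x': 'measles'}: p's NEW restored the caller's x
```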

With these dangers in mind, how can we safely refactor code in MUMPS or any other language with dynamic scoping?

There are some obvious good practices for making refactoring easier, like never using variables in a function other than those you initialize (NEW) yourself or that are passed in as explicit parameters, and explicitly documenting any parameters that are implicitly passed from a function's callers. But in a decades-old, ~10⁸-LOC codebase, these are luxuries one often does not have.

And, of course, essentially all good practices for refactoring in languages with lexical scope are also applicable in languages with dynamic scope - write tests, and so forth. The question, then, is this: how do we mitigate the risks specifically associated with the increased fragility of dynamically-scoped code when refactoring?

(Note that while "How do you navigate and refactor code written in a dynamic language?" has a similar title to this question, it is wholly unrelated.)

related (possibly a duplicate): [Is there a correlation between the scale of the project and the strictness of the language?](http://programmers.stackexchange.com/questions/209376/is-there-a-correlation-between-the-scale-of-the-project-and-the-strictness-of-th) — gnat, Sep 12 '15 at 13:42
@gnat I'm not seeing how that question / its answers are relevant to this question. — senshin, Sep 12 '15 at 13:46
'Very large, long-living projects can "afford" different test development process, with production quality test suites, professional test dev teams and other heavyweight stuff...' — gnat, Sep 12 '15 at 13:47
@gnat That statement is certainly true, but what does that have to do with the specific refactoring issues that arise when using a dynamically-scoped language? — senshin, Sep 12 '15 at 13:49
this answers the question asked: "how do we mitigate the risks specifically associated with the increased fragility of dynamically-scoped code when refactoring?" — gnat, Sep 12 '15 at 16:34
@gnat Are you saying that the answer is "use different processes and other heavyweight stuff"? I mean, that's probably not wrong, but it's also over-general to the point of not being particularly useful. — senshin, Sep 12 '15 at 16:43
if your project is large then yes, the answer is this. If it's small, you just do some extra tests for stuff that would be "taken for granted" in stricter languages. The thing to watch out for is, as explained in the answer over there, "much of the productivity gains we saw were lost in test writing" — gnat, Sep 12 '15 at 16:46
Honestly, I don't think there is an answer to this other than "switch to a language where variables actually have scoping rules" or "use the bastard stepchild of Hungarian notation where every variable is prefixed by its file and/or method name rather than type or kind". The issue you describe is just *so terrible* I can't imagine a *good* solution. — Ixrec, Sep 12 '15 at 18:27
I am fully with Ixrec - if you cannot introduce local variables into a function without a high risk of getting unwanted side effects, the language is crap. Use a different language. — Doc Brown, Sep 12 '15 at 21:40
@DocBrown Obviously nobody's going to do greenfield development in MUMPS (dear god, at least I hope not), but if you're stuck with an enormous codebase, you've got to make do with what you've got. (Aside: porting off of MUMPS in particular is further complicated by the fact that MUMPS is _also_ a database system - so even if you port the procedural stuff to another language, you're still going to need some amount of core logic/bindings in MUMPS anyway.) — senshin, Sep 12 '15 at 22:03
At least you can't accuse MUMPS of false advertising for being named after a nasty disease. — Carson63000, Sep 13 '15 at 00:25

thepacker · Answer 1 · 2015-09-12T21:00:27.540

Wow.

I do not know MUMPS as a language, so I do not know whether my comment applies here. Generally speaking, you must refactor from the inside out. The consumers (readers) of global state (global variables) must be refactored into methods/functions/procedures that take parameters. The method c should look like this after refactoring:

function c(c_scope_x) {
   print(c_scope_x); // reads only its parameter, not the caller's x
}

All call sites of c must be rewritten as follows (a mechanical task):

c(x)

This isolates the "inner" code from the global state by using local state. When you are done with that, you will have to rewrite b as:

function b() {
   x = "oops";
   c(x);
}

The x = "oops" assignment is kept to preserve the side effect, since callers may still read x. Now we must consider b as polluting the global state. If you only have one polluted element, consider this refactoring:

function b() {
   x = "oops";
   c(x);
   return x;
}

and rewrite each call of b as x = b(). When doing this refactoring, b must only call methods that have already been cleaned up (you may want to rename c to make that clear). After that, you should refactor b so that it does not pollute the global environment:

function b() {
   NEW b_scoped_x = "oops"; // declared local, so it no longer leaks
   c_cleaned(b_scoped_x);
   return b_scoped_x;
}

Then rename b to b_cleaned. I guess you will have to play with this a bit to get accustomed to this kind of refactoring. Of course, not every method can be refactored this way, but you will have to start from the inner parts. To get an idea, try it in Eclipse with Java (extract method), treating class members as the "global state".

function x() {
  fifth_to_refactor();
  {
    fourth_to_refactor();
    ...
    {
      second_to_refactor();
    }
    ...
    third_to_refactor();
  }
  first_to_refactor();
}
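The b/c refactoring from this answer can be checked mechanically in a Python emulation: run a caller against the original dynamically scoped b and c, then against b_cleaned and c_cleaned with the call site rewritten as x = b(), and compare what was printed and what the caller's x ends up as. The shared `env` dict, the `printed` list and the two `caller_*` functions are assumptions made for this sketch.

```python
# Shared symbol table emulating dynamic scope, plus a log of printed values.
env: dict[str, str] = {}
printed: list[str] = []

# Before: c reads x implicitly and b leaks x into whoever called it.
def c() -> None:
    printed.append(env.get("x", ""))

def b() -> None:
    env["x"] = "oops"
    c()

# After: c takes x explicitly; b keeps its variable local (like NEW) and
# returns it, so each call site opts in to the old side effect with x = b().
def c_cleaned(c_scope_x: str) -> None:
    printed.append(c_scope_x)

def b_cleaned() -> str:
    b_scoped_x = "oops"
    c_cleaned(b_scoped_x)
    return b_scoped_x

def caller_before() -> tuple[list[str], str]:
    env.clear()
    printed.clear()
    env["x"] = "qux"
    b()
    return list(printed), env["x"]

def caller_after() -> tuple[list[str], str]:
    env.clear()
    printed.clear()
    env["x"] = "qux"
    env["x"] = b_cleaned()  # the rewritten call site: x = b()
    return list(printed), env["x"]

print(caller_before())  # (['oops'], 'oops')
print(caller_after())   # (['oops'], 'oops')
```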

hth.

Question: With these dangers in mind, how can we safely refactor code in MUMPS or any other language with dynamic scoping?

Maybe someone else can give a hint.

Question: How do we mitigate the risks specifically associated with the increased fragility of dynamically-scoped code when refactoring?

Write a program that performs the safe refactorings for you.
Write a program that identifies safe candidates (the first ones to refactor).
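As a rough illustration of the second idea, here is a Python sketch that flags safe candidates. It does not parse MUMPS: it assumes each routine has already been reduced to a straight-line list of hypothetical `("new"|"set"|"read"|"call", name)` operations (no branches, no EXECUTE), treats parameters as local, and calls a routine a safe candidate only if it reads no variable it did not bind itself, sets no variable it did not NEW, and calls only other safe candidates. Unknown callees and recursive routines stay unsafe, which is deliberately conservative.

```python
from dataclasses import dataclass, field

# One operation: ("new" | "set" | "read" | "call", name). Hypothetical format.
Op = tuple[str, str]

@dataclass
class Routine:
    params: tuple[str, ...] = ()
    ops: list[Op] = field(default_factory=list)

def analyze(r: Routine) -> tuple[set[str], set[str]]:
    """Return (implicit inputs, leaked variables) for a straight-line routine."""
    local = set(r.params)  # assumed local to the routine
    assigned: set[str] = set()
    implicit: set[str] = set()
    leaked: set[str] = set()
    for op, name in r.ops:
        if op == "new":
            local.add(name)
        elif op == "set":
            assigned.add(name)
            if name not in local:
                leaked.add(name)  # set without NEW: visible to the caller
        elif op == "read" and name not in local and name not in assigned:
            implicit.add(name)    # value must come from some caller
    return implicit, leaked

def safe_candidates(routines: dict[str, Routine]) -> set[str]:
    """Fixed point: no implicit inputs, no leaks, only safe callees."""
    safe: set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, r in routines.items():
            if name in safe:
                continue
            implicit, leaked = analyze(r)
            callees = {n for op, n in r.ops if op == "call"}
            if not implicit and not leaked and callees <= safe:
                safe.add(name)
                changed = True
    return safe

routines = {
    "a": Routine(ops=[("set", "x"), ("call", "b")]),
    "b": Routine(ops=[("call", "c")]),
    "c": Routine(ops=[("read", "x")]),
    "c_cleaned": Routine(params=("c_scope_x",), ops=[("read", "c_scope_x")]),
    "b_cleaned": Routine(ops=[("new", "b_scoped_x"), ("set", "b_scoped_x"),
                              ("read", "b_scoped_x"), ("call", "c_cleaned")]),
}
print(analyze(routines["c"]))             # ({'x'}, set())
print(sorted(safe_candidates(routines)))  # ['b_cleaned', 'c_cleaned']
```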

Ah, there is one MUMPS-specific obstacle to trying to automate the refactoring process: MUMPS does not have first-class functions, nor does it have function pointers or any similar notion. Which means that any large MUMPS codebase will inevitably have _lots_ of uses of eval (in MUMPS, called `EXECUTE`), sometimes even on sanitized user input - which means that it can be impossible to statically find and rewrite all the usages of a function. — senshin, Sep 12 '15 at 21:53
Okay, consider my answer inadequate. A YouTube video (I think "refactoring@google scale") described a very unusual approach: they used Clang to parse the code into an AST and then used their own search engine to find every usage (even hidden ones) in order to refactor their code. This could be a way to find every usage: a parse-and-search approach on MUMPS code. — thepacker, Sep 12 '15 at 22:05
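A very small taste of the parse-and-search idea, as a Python sketch: a line-by-line regular-expression scan that reports which lines appear to call a given label through DO/D or a $$ extrinsic call, and which lines contain an XECUTE/X command and therefore need manual review. This is a heuristic under stated assumptions, not a MUMPS parser: it ignores ';' inside strings, matches labels case-sensitively, and over-reports XECUTE (a variable named X followed by a space also matches). `find_references` and the sample routine are hypothetical.

```python
import re

def find_references(source: str, label: str) -> tuple[list[int], list[int]]:
    """Return (lines that appear to call `label`, lines using XECUTE/X)."""
    # DO/D (optionally postconditional, optionally inside an argument list)
    # or $$label; command names are matched case-insensitively.
    call = re.compile(
        r"(?:\$\$|\b(?i:DO|D)(?::\S+)?\s+(?:[^\s;]*,)?)"
        + re.escape(label)
        + r"\b"
    )
    xecute = re.compile(r"\b(?i:XECUTE|X)(?::\S+)?\s")
    calls: list[int] = []
    dynamic: list[int] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split(";", 1)[0]  # naive comment strip
        if call.search(code):
            calls.append(lineno)
        if xecute.search(code):
            dynamic.append(lineno)  # target computed at run time: review
    return calls, dynamic

src = "\n".join([
    "A ; hypothetical routine",
    ' S X="qux" D B',
    " Q",
    "B D C Q",
    "C W X,! Q",
    'RUN S CMD="D C" X CMD Q',
])
print(find_references(src, "C"))  # ([4, 6], [6])
```

The reference on line 6 is found only because the executed string happens to be a literal; a string built at run time, for example from user input, would show up only in the XECUTE list, which is exactly the gap described in the comment above.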

Doc Brown · Answer 2 · 2017-08-26T06:15:55.357

I guess your best shot is to bring the full code base under your control, and make sure you have an overview of the modules and their dependencies.

That way you can at least do global searches, and you can add regression tests for the parts of the system where you expect a code change to have an impact.

If you see no way to accomplish the first, my best advice is: do not refactor any modules that are reused by other modules, or for which you do not know whether others rely on them. In any codebase of reasonable size, chances are high that you can find modules on which no other module depends. So if module A depends on B, but not vice versa, and no other module depends on A, then even in a dynamically scoped language you can make changes to A without breaking B or any other module.

This gives you a chance to replace A's dependency on B with a dependency on B2, where B2 is a sanitized, rewritten version of B. B2 should be newly written with the rules you mentioned above in mind, to make the code more evolvable and easier to refactor.
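Once a dependency map exists, finding the modules that nothing else depends on is straightforward. A Python sketch, assuming the map is complete (which, given EXECUTE and callers outside the codebase, is the hard part) and represented as a hypothetical dict from each module to the set of modules it calls into:

```python
def unreferenced_modules(deps: dict[str, set[str]]) -> set[str]:
    """Return the modules that no *other* module depends on.

    `deps` maps each module name to the set of modules it calls into.
    Self-references (recursion) are ignored.
    """
    depended_on = {
        callee
        for caller, callees in deps.items()
        for callee in callees
        if callee != caller
    }
    return set(deps) - depended_on

deps = {
    "A": {"B"},
    "B": {"C"},
    "C": set(),
    "D": {"B"},
}
print(sorted(unreferenced_modules(deps)))  # ['A', 'D']
```

Here A and D are the modules that can be changed without affecting any other module, and each of them could be switched from B to a rewritten B2 one at a time.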

This is good advice, though I will add as an aside that this is inherently difficult in MUMPS since there is no notion of access specifiers nor any other encapsulation mechanism, meaning that the APIs we specify in our codebase are effectively just suggestions to consumers of the code about which functions they _ought_ to call. (Of course, this particular difficulty is unrelated to dynamic scoping; I'm just making a note of this as a point of interest.) — senshin, Sep 14 '15 at 03:55
After reading [this article](http://thedailywtf.com/articles/A_Case_of_the_MUMPS), I am sure I do not envy you for your task. — Doc Brown, Sep 14 '15 at 17:25

score 1 · Answer 3 · answered Sep 16 '15 at 06:22

To state the obvious: How to do refactoring here? Proceed very carefully.

(As you've described it, developing and maintaining the existing code base should be difficult enough, let alone attempting to refactor it.)

I believe I would retroactively apply a test-driven approach here. This would involve first writing a suite of tests to ensure the current functionality keeps working as you start refactoring, with the first refactorings aimed simply at making the code easier to test. (Yes, I expect a chicken-and-egg problem here, unless your code is already modular enough to test without changing it at all.)

Then you can proceed with other refactoring, checking that you haven't broken any tests as you go.

Finally, you can start writing tests that expect new functionality and then write the code to make those tests work.
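In a dynamically scoped language, such tests need to treat the caller's whole environment as output, not just return values, because a leaked variable is part of a routine's observable behaviour. A Python sketch of that kind of characterization check, where the symbol table is passed explicitly as a dict and `b_old`/`b_new` are hypothetical before and after versions of a routine:

```python
from typing import Callable

Env = dict[str, str]

def characterize(fn: Callable[[Env], None], start: Env) -> Env:
    env = dict(start)  # fresh copy so runs cannot interfere with each other
    fn(env)
    return env         # the full post-call environment is the observed output

def differing_inputs(old: Callable[[Env], None], new: Callable[[Env], None],
                     starts: list[Env]) -> list[Env]:
    # Starting environments for which the two versions leave different state.
    return [s for s in starts if characterize(old, s) != characterize(new, s)]

def b_old(env: Env) -> None:
    env["tmp"] = env.get("x", "") + "!"  # scratch variable leaks to callers
    env["y"] = env["tmp"]

def b_new(env: Env) -> None:
    tmp = env.get("x", "") + "!"  # scratch kept local (as with NEW tmp)
    env["y"] = tmp

starts: list[Env] = [{}, {"x": "qux"}, {"x": "qux", "tmp": "caller's"}]
print(len(differing_inputs(b_old, b_new, starts)))  # 3
```

All three inputs differ because b_new no longer leaves tmp behind. Whether that is harmless depends on whether any caller reads tmp afterwards, which is the question such a test forces you to answer explicitly.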
