For those of you who have the good fortune not to work in a language with dynamic scope, let me give you a little refresher on how that works. Imagine a pseudo-language, called "RUBELLA", that behaves like this:
    function foo() {
        print(x); // not defined locally => uses whatever value `x` has in the calling context
        y = "tetanus";
    }
    function bar() {
        x = "measles";
        foo();
        print(y); // not defined locally, but set by the call to `foo()`
    }
    bar(); // prints "measles" followed by "tetanus"
That is, variables propagate up and down the call stack freely - all variables defined in `foo` are visible to (and mutable by) its caller `bar`, and the reverse is also true. This has serious implications for code refactorability. Imagine that you have the following code:
    function a() { // defined in file A
        x = "qux";
        b();
    }
    function b() { // defined in file B
        c();
    }
    function c() { // defined in file C
        print(x);
    }
Now, calls to `a()` will print `qux`. But then, someday, you decide that you need to change `b` a little bit. You don't know all the calling contexts (some of which may in fact be outside your codebase), but that should be alright - your changes are going to be completely internal to `b`, right? So you rewrite it like this:
    function b() {
        x = "oops";
        c();
    }
And you might think that you haven't changed anything, since you've just defined a local variable. But, in fact, you've broken `a`! Now, `a` prints `oops` rather than `qux`.
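Since RUBELLA is only a pseudo-language, the breakage can't be run directly. As a rough illustration, here is a Python sketch that emulates it, assuming a single shared dictionary (`env`, a name invented for this sketch) stands in for the dynamically scoped environment. The helpers `run`, `b_original` and `b_refactored` are likewise hypothetical, and passing `b` into `a` is just a convenience for comparing the two versions side by side.

```python
# Emulation of the a/b/c example. One shared dict plays the role of the
# dynamically scoped environment (an assumption made for illustration).
env = {}
output = []  # collects what print() would have shown


def c():
    # `x` is never set here; it is whatever the callers left behind.
    output.append(env.get("x", ""))


def b_original():
    c()


def b_refactored():
    env["x"] = "oops"  # meant as a "local" variable, but c() sees it too
    c()


def a(b):
    env["x"] = "qux"
    b()


def run(b):
    """Run a() against a fresh environment using the given version of b."""
    env.clear()
    output.clear()
    a(b)
    return list(output)
```

Calling `run(b_original)` yields `["qux"]`, while `run(b_refactored)` yields `["oops"]`: the change that looked internal to `b` altered what `a` ends up printing.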
Bringing this back out of the realm of pseudo-languages, this is exactly how MUMPS behaves, albeit with different syntax.
Modern ("modern") versions of MUMPS include the so-called `NEW` statement, which allows you to prevent variables from leaking from a callee to a caller. So in the first example above, if we had done `NEW y = "tetanus"` in `foo()`, then `print(y)` in `bar()` would print nothing (in MUMPS, all names point to the empty string unless explicitly set to something else). But there is nothing that can prevent variables from leaking from a caller to a callee: if we have `function p() { NEW x = 3; q(); print(x); }`, for all we know, `q()` could mutate `x`, despite not explicitly receiving `x` as a parameter. This is still a bad situation to be in, but not as bad as it probably used to be.
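To make the asymmetry concrete, here is a Python sketch of how `NEW` behaves as described above. It is an emulation under assumptions, not MUMPS itself: the shared `env` dictionary and the `new` context manager are hypothetical names, and `new` simply shelves any existing binding and restores it when the block exits.

```python
from contextlib import contextmanager

env = {}  # stand-in for the dynamically scoped environment (assumption)
_MISSING = object()  # marks names that had no binding before NEW


@contextmanager
def new(*names):
    """Emulate NEW: hide the current bindings, restore them on exit."""
    saved = {n: env.pop(n, _MISSING) for n in names}
    try:
        yield
    finally:
        for n, old in saved.items():
            if old is _MISSING:
                env.pop(n, None)  # the name did not exist before
            else:
                env[n] = old


def foo():
    with new("y"):
        env["y"] = "tetanus"  # discarded when foo() returns


def bar():
    env["x"] = "measles"
    foo()
    return env.get("y", "")  # unset names read as the empty string


def q():
    env["x"] = 99  # mutates the caller's NEWed variable


def p():
    with new("x"):
        env["x"] = 3
        q()
        return env["x"]
```

Here `bar()` returns the empty string because `y` no longer leaks out of `foo()`, yet `p()` returns `99` rather than `3`: the `NEW` in `p` protects `p`'s callers from `x`, but does nothing to protect `p` from `q`.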
With these dangers in mind, how can we safely refactor code in MUMPS or any other language with dynamic scoping?
There are some obvious good practices for making refactoring easier, like never using variables in a function other than those you initialize (`NEW`) yourself or that are passed as an explicit parameter, and explicitly documenting any parameters that are implicitly passed from a function's callers. But in a decades-old, ~10^8-LOC codebase, these are luxuries one often does not have.
And, of course, essentially all good practices for refactoring in languages with lexical scope are also applicable in languages with dynamic scope - write tests, and so forth. The question, then, is this: how do we mitigate the risks specifically associated with the increased fragility of dynamically-scoped code when refactoring?
(Note that while "How do you navigate and refactor code written in a dynamic language?" has a similar title to this question, it is wholly unrelated.)