21

Something which has always confused me is this. I keep hearing about big, old corporations that were around in the 1950s (for example), started using COBOL-coded business logic on IBM mainframes early on, and are apparently unable to "migrate to something modern", even though they want to and COBOL programs are expensive to maintain for various reasons.

First of all, I, probably more than anyone, like the idea of old, big computers chugging away decade after decade. It makes me feel cozy somehow. But this is not about me having a fascination with old computers and stable software, but simply me wondering about the business need to keep running them even if they have established that it's financially problematic, and the company doesn't have a CEO who happens to have a fondness for old computers.

Basically, how can a company have so much "business logic" that they cannot simply hire well-paid experts to re-implement it all as PHP CLI scripts, for example? (No, that's not a joke. I'd like to hear one single valid argument as to why PHP would be unfit to process all the business logic of a major corporation.) But let's not get hung up on PHP, even though I'd be using it. Any "modern" solution that they can maintain for a fraction of the COBOL/mainframe price would do.

Is the answer simply that there is no such thing? All modern software/hardware is unreliable, ever-changing, ever-breaking garbage? Or do they have such extreme amounts of special rules and weird things happening in their "business logic", spanning millions of lines of code, that the sheer amount of work to translate this over to a modern system simply costs too much? Are they worried that there will be mistakes made? Can't they do this while keeping the old system, running both at the same time for a long time and comparing the output/result, only replacing the old one when they have established that the new one works identically for years and years?

I don't understand why they would insist on COBOL/mainframes. I must be grossly underestimating the kind of "code" that exists in a big, old company.

Do they really have zillions of special rules such as:

If employee #5325 received a bonus of over $53 before the date 1973-05-06, then also update employee #4722's salary by a percentage determined by their performance score in the last month counting from the last paycheck of employee #532

? I almost can't believe that such fancy, intricate rules could exist. I must be missing something.

user15080516
  • 9
    If it ain't broke then don't "fix" it. Seriously. It's not just about the business logic. Some organizations are perfectly happy paying to keep old systems running and not having to deal with a huge migration project that could fail catastrophically. https://builtin.com/software-engineering-perspectives/why-cobol-is-still-used – Dan Wilson Apr 09 '21 at 17:53
  • 3
    Heck, my little community college math department has to keep track of every class and every sequence of prerequisites as they ever historically existed, because the college dictates any student be held to the requirements as they existed at first entry to the college. (And some students started 20 or 30 years ago and then came back.) I've seen the flowchart, and it's wild. – Daniel R. Collins Apr 10 '21 at 02:13
  • 1
    I see "rewrite it in Rust" all-the-time these days, but "rewrite it in PHP"? Now that's interesting :-P – Joshua Grosso Reinstate CMs Apr 10 '21 at 06:33
  • 7
    I think your fundamentally flawed assumption is that there *exists* a coherent spec for their codebase from which they could rewrite. If you've ever visited old code (even your own), it's often impossible to tell just by looking whether some edge case in fact needs to be handled in a particular way or not. All you know is some intersection of some constraints results in some complex behavior. Do you keep that complexity when moving, or no? If you keep it, then you're not shedding the complexity you presumably hoped to. If you don't keep it, you risk breaking something. What do you do? – user541686 Apr 10 '21 at 07:15
  • @DanielR.Collins - but it is "broke". It's unmaintainable and unmodifiable. And at some point in time, it will fail for one reason or another, and then you are fucked because you don't have a plan to fix it. – Davor Ždralo Apr 10 '21 at 07:38
  • 3
    The issue with these systems is that no human is able to fully understand the function nor business requirements well enough to rewrite, which means the system and the business are essentially the same thing at this point. It ceases to be viable to rewrite without remodelling the entire business from scratch, so the issue isn't really about the system, it's that the only viable 'rewrite' option means decommissioning all existing products/services (i.e. the very thing which makes the business profitable), then inventing new similar ones and hoping that customers will stick around. – Ben Cottrell Apr 10 '21 at 08:56
  • The company simply did the cost/benefit analysis, and decided that yes, it *is* that much more expensive to rewrite it, however expensive it may be to maintain the existing code. As for old computers running the code, the issue can be solved in an alternative way: use something like [Visual COBOL](https://www.microfocus.com/en-us/products/visual-cobol/overview) and run it on the new servers. – milleniumbug Apr 10 '21 at 09:58
  • 4
    @BenCottrell, I think you make one of the key points there. Often in the distant past, the business first had general staff who had spent years learning to deeply understand the manual systems which preceded computerisation, and then for a while after the execution of computerisation, the business had IT staff who broadly understood it all and continued to provide oversight until the end of their careers. Decades later, the business has absolutely neither - less expertise even than before it was computerised, let alone what it had at the zenith that gave birth to its current system. – Steve Apr 10 '21 at 11:13
  • 2
    Related news story: A few days before this question got posted, IBM announced a [COBOL compiler for Linux on x86 to support running things on their hybrid cloud](https://www.theregister.com/2021/04/07/ibm_cobol_x86_linux/). – Daniel R. Collins Apr 10 '21 at 16:20
  • 2
    You severely underestimate the value of battle tested production code. – Thorbjørn Ravn Andersen Apr 10 '21 at 23:41
  • I know you like the idea of "old, big computers chugging away decade after decade", but I suspect that the actual hardware for most of these systems is relatively modern hardware that the COBOL systems have been copied onto. IBM still makes new mainframes. Also, the COBOL language was updated as recently as 2014: https://en.wikipedia.org/wiki/COBOL#COBOL_2014 – FrustratedWithFormsDesigner Apr 12 '21 at 15:27

5 Answers

29

There are two huge factors at play. The question as asked deals with the complexity and risk of replacement. Let's set that aside for a moment. The first problem is simply one of value. Let's say that you can summarize an entire organization's needs in 20 modules. Let's further assume that each module costs around $1 million to build. This is actually pretty cheap, in the broad scheme of things. This would imply that even if the risk is zero, replicating functionality is expensive and time consuming. The fictional $20 million would be spent replicating features that already exist and already keep the lights on. Old code doesn't rust out and become intrinsically less valuable. That $20 million could build net new or even just give everyone a bonus.

This really is the big reason. You spend a lot of money, time and (in real world development) risk to get back to square one. It is very hard to justify.

Here are some good links about this aspect of code replacement:

The complexity of the rules - and their continuing evolution - contributes to the riskiness of the enterprise.

The easiest way to think of the complexity isn't software, it's actually tabletop gaming.

If you've never done it, go watch veteran players playing Magic: The Gathering. The game is easy to understand. Reduce the other player's health to zero or run them out of cards. Do what the cards tell you and do it in order. That's about all there is to the game. But it can be bewildering to watch.

Why?

Well, the rules are open ended and each new card adds a little bit to the working rule space. Cards have been cycled since 1993. That's 28 years as of this writing. The space is huge and the interactions are innumerable (literally - Magic is Turing Complete https://arxiv.org/abs/1904.09828).

Business rules are the same way. These companies have been running for decades (or centuries, in some cases). They started off simply. But then they added rules for new product lines, to comply with regulatory requirements, to handle edge cases the original developers missed, to handle changes in policy, etc.

Year by year, the changes are digestible. Like looking at a yearly set of new cards in Magic. But trying to look at the full cycle is daunting. The problem space is simply enormous. Patching existing code is like learning this year's set. Trying to rewrite in a new language is like trying to understand all of the cards ever released.

Worse still, there is no individual who understands it all. The developers have been focusing on a finite subset of the 20 modules. Development staff has come and gone. Business staff has come and gone. And they are all focused on their piece. Assembling the mosaic to see the entire problem space is a research project by itself.

The new languages can handle the requirements and all the fiddly bits. The COBOL system could be rewritten in PHP. But that takes time and money and the new language won't do a single thing to tell you what it is you need to write.

So we find ourselves back where we started. To paraphrase Ecclesiastes, "generations come and go, but the mainframe abides."

Michael
  • 6
    That's a nice one. Maybe it's worth adding a reference to this great article: https://medium.com/the-technical-archaeologist/is-cobol-holding-you-hostage-with-math-5498c0eb428b on COBOL and maths issues (i.e. doing reliable fixed point arithmetic with high performance). – Christophe Apr 09 '21 at 18:29
  • 2
    Can new languages EASILY handle the requirements? COBOL after all was designed to be a business oriented language. Admittedly, business is not the space I work in, but I don't know of any purely business-oriented "modern" languages. Just as it's often easier to write engineering/scientific code in Fortran or C, than in more modern languages that can force you to jump through all sorts of hoops to do simple things. Nor is COBOL limited to mainframes. Gnu COBOL has been around for a while, and IBM just introduced an X86 COBOL compiler: https://www.ibm.com/products/cobol-compiler-linux-x86 – jamesqf Apr 10 '21 at 03:57
  • One caveat in the "calculation": Code does not rust, and yet it does. If you don't maintain it, it turns into a mystery black box and you will have no clue how to fix things if some piece of hardware fails, or when it encounters a limitation (year 2000 bug^^) and how to build a maintainable fix etc. So unless you are fine with running it until it breaks and then throwing it in the bin, you either pay for maintenance or you have to pay that amount x10 when something breaks/you need something in addition. Might still be cheaper than a rewrite, but is often overlooked in such calculations...^^ – Frank Hopkins Apr 10 '21 at 04:57
  • 8
    better COBOL than PHP – JoelFan Apr 10 '21 at 05:31
  • 2
    Your $20 million is an underestimate. Companies have spent hundreds of millions of dollars, in some cases close to a billion (e.g., Bank of Australia), to transition from COBOL to a more modern language. It is an expensive process, and the benefit is not that great. While I've never written a line of COBOL, I do understand its longevity. Your second link to a Joel On Software article sums it up: Netscape screwed up "by making the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch." – David Hammen Apr 10 '21 at 08:42
  • I'd like to imagine somewhere out there, there's some business logic that's Turing complete w.r.t. employee's salaries. Bob can't get a raise because Alice's salary is undecidable, oh well. – Passer By Apr 10 '21 at 11:17
  • 3
    @FrankHopkins, any analogy that "code rusts" fails to capture that the problem is not that code flakes, crumbles, and eventually gives way (as would rusting iron), but that it seemingly becomes firmer, stronger, and more unyielding to the forces humans apply - *more* like good iron, in fact. Unmaintainable code has not rusted. It is our knowledge of the ontology which has atrophied - it's more like cerebral palsy or multiple sclerosis, as the controlling connection between our brains and the machine is lost. – Steve Apr 10 '21 at 12:39
  • @Steve yes and no, it does not change, but its surroundings may. Including the platform it has to run on etc. And sure thing the analogy is not exact, but people using it typically know that part and the meaning is more that the project/application cannot be handled as well as before and the likelihood that it crumbles into pieces when you give it a kick increases. The overall "thing" gets more unstable and unpredictable like a rusty tanker where you simply don't know which wall you can kick in and which not because it may cascade. But yeah, some coders tend to take things literally ;) – Frank Hopkins Apr 10 '21 at 17:16
  • 1
    To add to @Steve's point about the context disappearing: Peter Naur makes a very similar point in his paper "Programming as Theory Building" (http://pages.cs.wisc.edu/~remzi/Naur.pdf). – Michael Oct 18 '21 at 21:46
19

One thing about COBOL in particular is that its built-in decimal data type is perfect for money calculations; PHP doesn't have that. Extrapolating a little, some of the old platforms have useful features that are hard and expensive to give up. (For example, nothing in my experience matches the elegance of the L-indicators in RPG II for building hierarchical reports with subtotals.)
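To make the money-arithmetic point concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about COBOL and PHP). It contrasts binary floating point with a decimal type. The amounts are made up.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly,
# so even trivial sums of "cents" can come out slightly off.
print(0.10 + 0.20 == 0.30)                                   # False

# A decimal type stores the digits as written.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True

# The error also accumulates: add ten cents ten times.
float_total = sum(0.10 for _ in range(10))
decimal_total = sum(Decimal("0.10") for _ in range(10))
print(float_total == 1.0)                                    # False
print(decimal_total == Decimal("1.00"))                      # True
```

In a language where decimal is a library type rather than the default, every calculation has to opt in to it, which is one more thing a rewrite has to get right everywhere.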

But you are on the right track with your broader question:

Or do they have such extreme amounts of special rules and weird things happening in their "business logic", spanning millions of lines of code, that the sheer amount of work to translate this over to a modern system simply costs too much? Are they worried that there will be mistakes made?

Basically yes—the special rules are often very intricate and more importantly they are frequently documented only in the code itself. For example, a few years ago I ran across a financial calculation that took into account a subset of the company's product line and excluded certain other specific products, by item number. And the imputed tax calculation depended partly on the customer category and partly on a date window (that was hardcoded) and... wow. Yeah. What you said.

And these business rules are not stated anywhere in a requirements or support document. In this case, for this one relatively minor report, the rules were interwoven in five thousand lines of T-SQL database code. (So not as crusty as COBOL from the 1960s by any means.)
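A rough sketch of what a rule like that tends to look like once it lives only in code. Everything here is invented for illustration (item numbers, category codes, dates, rates, and the `imputed_tax` name); it is Python rather than T-SQL, and it is not the actual calculation described above.

```python
from datetime import date
from decimal import Decimal

# All values below are hypothetical; none come from a real system.
EXCLUDED_ITEMS = {"10482", "10977"}      # specific products excluded by item number
WINDOW_START = date(2009, 7, 1)          # hardcoded date window
WINDOW_END = date(2011, 6, 30)


def imputed_tax(item_no: str, customer_category: str,
                sold_on: date, amount: Decimal) -> Decimal:
    """Return an imputed tax amount under made-up rules."""
    if item_no in EXCLUDED_ITEMS:
        return Decimal("0.00")
    # The rate depends partly on the customer category...
    rate = Decimal("0.05") if customer_category == "W" else Decimal("0.08")
    # ...and partly on the date window. Nothing in the code says why.
    if WINDOW_START <= sold_on <= WINDOW_END:
        rate -= Decimal("0.01")
    return (amount * rate).quantize(Decimal("0.01"))


print(imputed_tax("20000", "W", date(2010, 1, 1), Decimal("100.00")))  # 4.00
```

Each constant and branch is a requirement, but nothing marks it as one or records who asked for it. Multiply this by a few thousand lines and the size of the extraction job becomes clearer.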

Further, it's not uncommon for business rules to be buried implicitly in the data. For example, at my first job out of university, calculations that were no longer in the current code had produced important data values that had to be interpreted in a certain context, one that you could not reverse engineer from the code base.

That process of reverse engineering doesn't make the legacy system impossible to replicate, but it means that extracting requirements is a gigantic undertaking. Writing the new code is the easy part!

So the simple answer is yes. In many enterprises, they really do have zillions of special rules like that.

catfood
  • 9
    This, 100 times. The business rules are in _code_ not any spec. There's no way to pull them out of the code. You can't even _identify_ them in the code: There's no construct in COBOL or any other language that says "this line of code is a business rule". They're indistinguishable, in code, from any other conditional expression, or loop, or whatever. Everything is all blended together. And then, even if you _could_ identify a "business rule" - for many of them there's nobody left at your company to tell you _why_ it is (or was ever) a rule. That "tribal knowledge" was long lost. – davidbak Apr 09 '21 at 18:30
  • 1
    With the built-in (binary coded) decimal type, you probably mean this kind of issue: https://medium.com/the-technical-archaeologist/is-cobol-holding-you-hostage-with-math-5498c0eb428b – Christophe Apr 09 '21 at 18:33
  • 2
    Building on your answer, I suspect because of the organic nature of the code and the unstructured nature of COBOL it is probably very difficult to figure out which variables constitute part of the API of a given 'module' and which are internal. In fact I suspect this distinction doesn't exist, even conceptually. There are probably variables that were originally intended to be private that now form a vital input to some newer module that's been superglued on, which in turn feeds 15 other modules etc. I've never worked with this kind of mainframe code, but I've experienced similar things. – Simon Notley Apr 09 '21 at 21:16
  • 4
    Did you mean to say, "these business rules are **not** stated anywhere in a requirements or support document"? – Daniel R. Collins Apr 10 '21 at 02:18
  • 1
    Actually you give really good arguments for a rewrite^^ If the users don't understand how the software calculates critical data it is high time for a rewrite/redesign with the added feature to make the functionality really transparent. But yes, not exactly the decision process of many companies - until they get into trouble because stuff magically happens that should not happen or they didn't think would happen^^ – Frank Hopkins Apr 10 '21 at 05:02
  • 1
    @DanielR.Collins Yes, thanks. – catfood Apr 10 '21 at 18:40
  • @FrankHopkins I'm giving really good arguments for the issue being not so much "are the newer environment and tooling that much better?" but "how well do we need to understand the rules embedded in the software?" In a lot of shops the answer is "Not that much, as long as stuff still works!" – catfood Apr 10 '21 at 18:42
  • @catfood take it with a grain of irony. My point is not to contradict your answer, just to point out that the cost of not knowing what your system is doing is often overlooked until it really bites you back. Short term thinking or happy path thinking means you will keep it running because it "works" and build any new functionality around it. Sometimes that IS a valid approach but sometimes that is just the approach taken because it looks cheap now and the future cost is not considered - and eventually a "cleanup" becomes immensely costly due to accrued interest. – Frank Hopkins Apr 10 '21 at 18:51
  • @catfood (Cleanup doesn't need to mean rewrite in new language^^). Good answer in general with valid points. I'm just saying that part of the reasons given aren't set in stone but symptoms of deeper problems/decisions, same as you put it in your comment in essence I guess. So maybe, you can put that "deeper reason" as an addition into the answer, if you like. Cheers anyway. – Frank Hopkins Apr 10 '21 at 18:54
15

(Koff, koff ...) Speaking as someone who used to teach a community-college course in COBOL ... there are actually key attributes about that particular language which have never been replicated since.

Buried in its "PIC[TURE] clause," this language has very precise control over "dollars and cents." Whereas nearly every other language resorts to "float binary," COBOL did not. It supports "BCD = Binary-Coded Decimal" representations, and it can carry them through an arbitrarily-large series of mathematical operations with an entirely predictable result. "Right down to the penny."
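As a rough illustration of what a fixed-scale field buys you, here is a sketch in Python that mimics storing a result into a hypothetical `PIC 9(5)V99` field (five integer digits, two decimal places). The `store` helper is invented for this example. It assumes the usual COBOL behavior that a result is truncated to the receiving field unless `ROUNDED` is specified. What real COBOL does on overflow depends on `ON SIZE ERROR`; the sketch simply raises.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def store(value: Decimal, digits: int = 5, scale: int = 2,
          rounded: bool = False) -> Decimal:
    """Fit a value into a fixed-point field, loosely like PIC 9(5)V99."""
    quantum = Decimal(10) ** -scale                   # 0.01 when scale is 2
    mode = ROUND_HALF_UP if rounded else ROUND_DOWN   # ROUNDED vs. truncation
    result = value.quantize(quantum, rounding=mode)
    if abs(result) >= Decimal(10) ** digits:
        # Simplification: real COBOL behavior here depends on ON SIZE ERROR.
        raise OverflowError("value does not fit the field")
    return result


interest = Decimal("1234.56") * Decimal("0.0375")     # exactly 46.296000
print(store(interest))                 # 46.29 (truncated)
print(store(interest, rounded=True))   # 46.30 (rounded half away from zero)
```

The point is that the result is fully determined by the field definition and the rounding rule, so the same inputs give the same pennies every time.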

Even though the COBOL language is "these days, perhaps strange to look at," it continues to be very well-supported and ... "it continues to move the freight." There actually is a very legitimate reason why it has not been replaced. I strongly encourage you to spend the time to become familiar with it.

Mike Robinson
  • 4
    Many modern languages come with support for BCD or its functional equivalent, it's just not the default int/float type. – Davor Ždralo Apr 10 '21 at 07:41
  • @DavorŽdralo, I think that's precisely the point: Cobol is a language which, *by default*, reproduces the way arithmetic is performed by hand by accountants and by others in typical business contexts. Floating point arithmetic is not even on school curriculums, and yet for some reason finds itself as the default representation of decimal numbers in most computer languages. – Steve Apr 10 '21 at 13:02
  • There are plenty of languages that allow you to define a BCD type and use it as a drop-in replacement for floating types. You're saying COBOL is used because companies wouldn't hire someone to write a pretty darn simple library? – Passer By Apr 10 '21 at 14:22
  • 1
    There are modern languages that can do both float and decimal arithmetic. C# has a decimal data type available, which is slower than floating point arithmetic but more accurate for dollars-and-cents. I've seen that used for medical and financial applications, because unlike a binary format, when you enter 0.3, the system stores 0.3 precisely. – Robyn Apr 11 '21 at 02:26
  • @PasserBy, Cobol surely continues to be used because you *don't even have to* write a library to adapt some other ill-fitting language to handle simple arithmetic? You can do almost anything in any language with enough ceremony and boilerplate, but there is always great value in not having to do so. – Steve Apr 11 '21 at 10:17
  • @Steve There is no great value, there's the exact cost of hiring someone to write that library. It's that versus all the problems of COBOL. You can't in fact do almost everything in every language, there's plenty of legitimately difficult problems. A drop in replacement arithmetic type is not one of them, especially in languages with operator overloading. – Passer By Apr 11 '21 at 10:39
  • @Steve - by "default" I mean default for programmers to use. As mentioned, C# for example has both and doesn't care which you use. – Davor Ždralo Apr 11 '21 at 11:55
10

Based on what I have seen here are the main factors:

  • There is little to no documentation for these systems. There's no one to ask about requirements
  • COBOL is not a structured language. There's a lot of code written in it that is extremely hard to reason about.
  • These implementations generally lack modularity. You have to replace the whole thing in one shot

If it's not obvious, the first two items are related. Since there's no documentation of the requirements, if you want to rewrite the system, you need to make sense of the code. I've never really coded in COBOL but I had to learn how to read it at one point. We take a lot of things for granted in structured languages. For example, you can't jump (GOTO) from the middle of one loop into the middle of a completely independent loop and back. I spent many hours trying to make sense of code like that once. I was only able to determine that it probably wasn't implementing the correct logic; I gave up on trying to figure out what it did in general. I've also seen a lot of weird things like 'programs' that were made up of a long series of COBOL procedures that each read in a file and then added a single field to it and wrote it out so it could then be processed by the next procedure. I'm sure a lot of COBOL people would say this is just bad programming but the reality is a lot of this old code is a huge mess.

Even if a company takes on this effort, the next thing is how do you test it? Guaranteed, your PHP (why?) implementation will contain bugs. How will you find them? Who is testing this and how do they know what the requirements are?
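One common partial answer is to compare the new code against recorded results from the old system. Below is a minimal sketch in Python (used here only for illustration); the record shape, the `find_mismatches` and `new_fee` names, the fee rule, and all the data are hypothetical.

```python
from decimal import Decimal
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, Decimal]   # (record id, amount); hypothetical shape


def find_mismatches(
    records: Iterable[Record],
    legacy_results: Dict[str, Decimal],
    new_impl: Callable[[Decimal], Decimal],
) -> List[Tuple[str, Decimal, Decimal]]:
    """Return (id, legacy value, new value) for every record that differs."""
    mismatches = []
    for rec_id, amount in records:
        expected = legacy_results[rec_id]
        actual = new_impl(amount)
        if actual != expected:
            mismatches.append((rec_id, expected, actual))
    return mismatches


def new_fee(amount: Decimal) -> Decimal:
    # Hypothetical rewrite: a 1.5% fee, rounded to the nearest cent.
    return (amount * Decimal("0.015")).quantize(Decimal("0.01"))


records = [("A1", Decimal("100.00")), ("A2", Decimal("250.00")),
           ("A3", Decimal("999.99"))]
# Pretend these values were captured from the old system's output.
legacy_results = {"A1": Decimal("1.50"), "A2": Decimal("3.75"),
                  "A3": Decimal("14.99")}

print(find_mismatches(records, legacy_results, new_fee))
# [('A3', Decimal('14.99'), Decimal('15.00'))]
```

Note what this does and does not tell you: record A3 shows the two systems disagree (here the made-up legacy value was truncated, the new one rounded), but not which behavior the business actually needs. Someone still has to know the requirement.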

In reality, rewriting something like this is basically a brand new implementation. The old system can be used as a reference but it can be extremely time consuming to understand.

A lot of companies do this. Some have spent inordinate amounts of money e.g. trying to replace them with an ERP. It's a cost-benefit analysis and some companies have decided not to try. As time goes on, though, that analysis changes or the company goes belly up, gets bought by another company, etc.

JimmyJames
2

I will refrain from discussing whether a legacy system is better maintained or migrated.

Let us first see what migration means.

Take a similar scenario: a large software system written in the scripting language Perl, with scripts thousands of lines long and only arrays and associative arrays as data structures, being converted to Java.

The correct way is:

  • start with Java code emulating Perl functionality
  • create an imperfect Perl-to-Java converter
  • have a dev and staging system
  • a proof of concept of first pieces
  • use a team to manually analyze data types and analyze/correct the generated code
  • keep progress statistics from the start
  • add emulation for things like report generation, printing and so on

This way, working code can be executed from the very beginning, starting with a first proof of concept of a tiny piece of the system. At the end you have a pure Java system, and Java lends itself very well to large refactorings.

For Cobol instead of Perl it is a bit harder, as it is almost like building a Cobol machine and it is somewhat more difficult to separate small units. But after completion of the Cobol and database migration to a Java system, the system can be analyzed much better, and incrementally refactored.

The time needed is large, and the success of the migration with respect to completion and performance is a risk. After completion, at least some (incremental) refactoring is needed, such as making the software more traceable: adding form IDs and versions to generated reports, logging, unit tests.

So it can be done, but the developers, who must be good at Java, also need to know how Cobol works. The team working mainly on the emulation must be small. The team manually completing the code can be larger.

The costs will be high.

I hope (with this not very well written answer) to show the magnitude involved. You do not want to do this with an unknown team.

Joop Eggen