What are the three main technologies for a redesign of a COBOL legacy system

Question

There exists an application that reports financial data of a bank to a national bank (all located in Europe). It is a legacy system that is written mainly in COBOL. Only the user interface is written in Java.

The business logic is stored as complex database tables and entries. Without knowing all internals, the system can hardly be changed according to new financial regulations. Thus, the current system is very hard to debug and unmaintainable.

My aim is now to identify state-of-the-art technologies/methods for a redesign and rewrite of the whole system.

What would you consider as the three most important technologies to use for a rewrite of a COBOL System in finance?

Current considerations include:

Using a DSL approach (domain specific language)
Using a modern programming language like Scala
MS OLAP cube (suggested by a colleague, not by me)

You mention technological reasons, but what about business decisions? Those typically constrain your technical decisions. What technologies does your software team know? What hardware systems do you have available? What technologies can your system administrators support? These need to be considered just as much as the technical reasons, and will probably be considered more by the people with the money. — Thomas Owens, Jan 29 '12 at 22:12
@ThorbjørnRavnAndersen: Good question. Last time I did this, however, the last of the folks who wrote the legacy COBOL started quitting the company when they found out about it. — S.Lott, Jan 30 '12 at 10:47
Usually it is much cheaper to add extra stuff to an existing system than to rewrite it. Those in question must have plenty of money. — , Jan 30 '12 at 11:29

score 4 · Answer 1 · answered Jan 29 '12 at 22:20

Test-Driven Reverse Engineering.

There are two parts.

A modern language. C#, Java, Python, Scala. Do not use a DSL. Any DSL you choose simply becomes another COBOL.
A suite of test cases that the original programs appear to pass successfully.

Note that most COBOL systems have a huge number of individual application programs. Almost incomprehensible.

However, the number of programs which do database Create-Update-Delete is considerably smaller.

Step 1 is to partition the space of COBOL programs into two:

Create-Update-Delete. Somewhere near 20%.
Retrieve-only. Somewhere near 80%.
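
A first pass at this partition can be automated. The following is a minimal sketch, assuming the programs reach the database through embedded SQL (`EXEC SQL ... END-EXEC`); the program names and the `classify`/`partition` helpers are hypothetical. Programs that modify data through file I/O or called subprograms would slip through and still need manual review.

```python
import re
from typing import Dict, List

# Heuristic (assumption): a program modifies data if it contains an
# embedded SQL INSERT, UPDATE or DELETE statement.
SQL_WRITE = re.compile(r"EXEC\s+SQL\s+(?:INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def classify(source: str) -> str:
    """Label one COBOL source text as modifying data or only reading it."""
    if SQL_WRITE.search(source):
        return "create-update-delete"
    return "retrieve-only"


def partition(programs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group program names (hypothetical keys) by their classification."""
    buckets: Dict[str, List[str]] = {"create-update-delete": [], "retrieve-only": []}
    for name, source in programs.items():
        buckets[classify(source)].append(name)
    return buckets
```

The output is only a starting list; the point is to find the small Create-Update-Delete set quickly so the test effort goes there.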

For the 80%, you can use whatever reporting tools appear relevant. Hypercubes or whatever. It doesn't matter, since this is only reporting, and any commodity reporting tool will work. Invest as little time in reporting as humanly possible. Create extract-transform-load (ETL) that populates a simple data warehouse; create metadata for that warehouse that fits a reporting tool. Done.
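
To show how little the ETL step needs to be, here is a rough sketch using SQLite from the Python standard library. The fixed-width record layout (account, period, amount in cents) and the table name `fact_balance` are invented for illustration; a real extract would follow the actual copybooks.

```python
import sqlite3

# Hypothetical fixed-width layout:
#   columns 1-10  account id
#   columns 11-16 reporting period as YYYYMM
#   columns 17-28 amount in cents, zero padded


def extract(lines):
    """Split each fixed-width line into its raw fields."""
    for line in lines:
        yield line[0:10], line[10:16], line[16:28]


def transform(record):
    """Normalise raw fields into warehouse-friendly values."""
    account, period, cents = record
    return account.strip(), f"{period[:4]}-{period[4:]}", int(cents)


def load(conn, rows):
    """Insert the transformed rows into a single fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_balance "
        "(account TEXT, period TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO fact_balance VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(lines, conn):
    load(conn, (transform(record) for record in extract(lines)))
```

Once the data sits in a plain table like this, any commodity reporting tool can be pointed at it.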

For the 20%, you have serious, difficult work ahead of you: gather a suite of test cases, confirm that the original COBOL actually processes them as expected, and then write new code in C#, Java, Python or Scala to implement that programming logic.
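
In practice this amounts to recording what the legacy program produces for known inputs and replaying those cases against the new code. A minimal sketch, assuming a made-up interest rule and made-up recorded values (the `interest` function and `RECORDED_CASES` are hypothetical, not taken from any real system):

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical recorded cases: inputs fed to the legacy COBOL program
# and the output it produced for each of them.
RECORDED_CASES = [
    ({"balance": "1000.00", "rate": "0.05"}, "50.00"),
    ({"balance": "0.00", "rate": "0.05"}, "0.00"),
    ({"balance": "199.99", "rate": "0.05"}, "10.00"),
]


def interest(balance: Decimal, rate: Decimal) -> Decimal:
    """Candidate rewrite of the (hypothetical) legacy rule."""
    return (balance * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def check_against_legacy(cases):
    """Return every case where the rewrite disagrees with the recorded output."""
    mismatches = []
    for inputs, expected in cases:
        actual = str(interest(Decimal(inputs["balance"]), Decimal(inputs["rate"])))
        if actual != expected:
            mismatches.append((inputs, expected, actual))
    return mismatches
```

An empty result from `check_against_legacy(RECORDED_CASES)` means the rewrite reproduces the recorded behaviour for those cases, which is a statement about the cases and nothing more. The more cases you record, the more that statement is worth.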

You will find that 20% of the Create-Update-Delete programs involve simple business rules that can be easily articulated, fit the regulatory environment, and are easy to implement in stored procedures or in Java, C# or Python code. Avoid stored procedures. (Haterz will tell you stored procedures are essential, but can't provide a reason.)

You will find that 80% of the Create-Update-Delete programs involve exceptions, special cases, overrides and incompetent programming too horrifying to examine closely. (It's like being a character in a story by HP Lovecraft; you begin to doubt the very existence of logic or rationality.)

Make the best guess you can at this code and trust your test cases. Time spent creating test cases and samples is more valuable than time spent reverse engineering old COBOL.

COBOL _is_ quite readable though. Helps a bit when designing test cases. — , Jan 29 '12 at 22:31
@ThorbjørnRavnAndersen: Disagree. COBOL *may* be readable. I've seen some very, very bad COBOL. Hope for the best. Plan for the worst. — S.Lott, Jan 30 '12 at 03:10
@S.Lott - That argument goes for pretty much every language out there ... — Rook, Feb 03 '12 at 20:38

score 2 · Answer 2 · answered Jan 29 '12 at 22:10

The business logic is stored as complex data base tables and entries. Without knowing all internals, the system can hardly be changed according to new financial regulations. Thus, the current system is very hard to debug and unmaintainable.

This could mean three things:

1. You can't understand the original business requirements.
2. You don't understand the current logic in the code.
3. Both 1 and 2.

If (1) applies, then you can't re-write the system without getting around the problem. Technology would not be your primary issue.

If (2) is the problem, but you understand the business requirements, then you need to consider the following in the newly selected technology:

You should try to have your whole team work in the same language if possible; this makes Java a good component of the technical architecture.
You should make sure you can access the mainframe data. I presume that it could be stored in DB2, IMS, VSAM, SAM files and such. Since IBM provides Java for its mainframes, Java could play a role in the back-end solution.
If the application does nothing but reporting (aggregation and summarization) with some data consolidation, then you are looking for an Extract, Transform, and Load (ETL) solution. A tool like IBM InfoSphere (DataStage) is ideal if you can afford the price. Other tools also exist. In my humble opinion, I would choose a leading ETL tool over MS OLAP cubes, especially since a new release of MS OLAP technology is coming soon. Also, if you use MS, you will definitely need to migrate the data.
You should consider not migrating the data if possible. If you can avoid migration, then again, don't use MS Cubes.
You need to consider a strategy for the GUI OLAP part.
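
On the point about accessing the mainframe data: if you end up reading VSAM or sequential files outside the mainframe, the records typically arrive as EBCDIC text and packed-decimal (COMP-3) numbers. A small sketch of decoding both, assuming code page 037 (your installation may well use a different EBCDIC code page) and hypothetical helper names:

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a non-empty packed-decimal field.

    Each byte holds two decimal digits, except the last one, which holds
    one digit and the sign nibble (0xD means negative).
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    value = int("".join(str(d) for d in digits))
    if raw[-1] & 0x0F == 0x0D:
        value = -value
    # scale is the number of implied decimal places (the V in the PICTURE).
    return Decimal(value).scaleb(-scale)


def decode_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode an EBCDIC text field; the code page is an assumption."""
    return raw.decode(codepage).rstrip()
```

For example, the three bytes `12 34 5C` with two implied decimals decode to 123.45. Getting this layer right and tested early makes every later comparison against the legacy output much easier.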

I hope the above helps.
