92

In our company we have a small program (a .exe, 500 KB in size) that does a mathematical calculation and at the end writes the result to an Excel spreadsheet that we use to continue our workflow.

I want to modify the columns, spacing format and add VBA logic etc. on the Excel spreadsheet, but since these parameters are not configurable in that program, it seems to me the only way to modify it is to break down/reverse engineer the .exe.

Nobody knows what language it was programmed in; the only things we know are:

  1. Developed 20+ years ago
  2. Developer retired 10 years ago
  3. GUI Application
  4. Runs standalone
  5. Size 500Kb

Any suggestions what options I have to deal with such kind of problems? Is reverse engineering the only option, or is there a better approach?

Juan Carlos Coto
  • 426
  • 3
  • 10
Alec
  • 997
  • 1
  • 6
  • 6
  • 150
    Do you know what the calculation is that it performs? If so, write a new app, push some test data through both to check the new one works the same, then throw away the old one. Then make the changes you want to make. – David Arno May 27 '16 at 14:08
  • 14
    @DavidArno 's comment would make a good answer. Reverse engineering is possible, but re-spec'ing and rewriting the app will be a lot cheaper/easier/quicker. – Dan Pichelman May 27 '16 at 14:16
  • 44
    The other way to modify it would be to take the result the original program produces and filter it into whatever you want. – Blrfl May 27 '16 at 14:24
  • I know my input and output data, but the equations in between are very, very complex according to one of our engineers. I'll look into adding a second app that grabs the "end" data and processes it into a "new" Excel spreadsheet. Thank you for all the tips, I appreciate it! – Alec May 27 '16 at 14:40
  • 1
    Do you know what the original program was written in? It may be possible to decompile it. – RubberDuck May 27 '16 at 14:43
  • @RubberDuck Unfortunately no, we have no idea what it was written in. Thanks – Alec May 27 '16 at 14:44
  • 9
    @Alec if you open the .exe with a hex editor, you may get clues about what it was written in. For example, the compiler name might be embedded. From there you'll know more about possible decompiling options. – GrandmasterB May 27 '16 at 17:19
  • 26
    Alternatively, you could attempt to find the gentleman who wrote the application and see if he's willing to come in for a day or two (maybe a couple of hours each day) as a consultant. If he's a retired developer, there's a moderate chance that he might appreciate a little spending money at the rate of $100-150/hr while actually enjoying the moment of doing a bit of work for just a brief period of time. – RLH May 27 '16 at 20:10
  • 3
    Regarding decompilation, open the file in Notepad (or maybe even Notepad++) and simply browse through the content. No, you won't be able to discern a single thing about how the application works by such a visual inspection; however, you'll be surprised at just how many clear-text data segments there are in most binary applications and file types. There is a moderate chance that you may find a little information about the compiler or the libraries/frameworks that this was built on or runs on. – RLH May 27 '16 at 20:13
  • 2
    This problem is what the GPL licence solves for you: it comes with the right to get the source from the developer when you get the binaries. – Ferrybig May 27 '16 at 21:16
  • @RLH, you think you could fix your own stuff after 10 years without source code? – Paul Draper May 27 '16 at 23:28
  • @PaulDraper no, but I could recall what tools I wrote it in, spec out algorithms, and other details. If you want to do a low-level reverse engineering of the application, finding out what the dev built it in could sure help. – RLH May 28 '16 at 00:14
  • 5
    If this is truly "mission critical" but you don't actually know what it does except that it's "something involving very, very complex equations", I think you first priority should be writing legal disclaimers into all your customer contracts to prevent them suing you. Their lawyers would just *love* to get their teeth into that scenario. – alephzero May 28 '16 at 02:01
  • 3
    You can't maintain a program without source code. You can rewrite it, or wrap around it. The former is better for the long term, the latter is more feasible in the short term. If the software is mission critical, and the business will still depend on it for the foreseeable future, you should try to rewrite it now, and now take steps to protect the business from such situations in the future by making sure that the business keeps track of the source code to all irreplaceable, mission critical software. – Lie Ryan May 28 '16 at 07:41
  • 8
    Replace it now while you can still run it! It's possible that after a few more versions of Windows it will no longer be possible to execute this code. – Jasen May 28 '16 at 07:48
  • 3
    @Ferrybig I support the GPL license but I don't believe it solves the OP's problem. It would not surprise me if the OP's company actually has the source code - somewhere. Legally, the company always owned the source code. It just misplaced it. GPL does not help you find things you lost. – emory May 29 '16 at 00:43
  • @Ferrybig I always thought the GPL only forces you to distribute the sources if they are requested. So you can give somebody a GPLed program as a binary, and if he wants the source he has the right to request and obtain it. That's why when you install a binary package (e.g. in some Linux distro) you generally don't get the sources but have to explicitly say that you want them. – Bakuriu May 29 '16 at 09:17
  • 1
    @Ferrybig Presumably the developer was paid, in which case it was a work-for-hire, in which case he has to give you the source code and all other work products anyway. – user207421 May 31 '16 at 01:12
  • As far as I know, there are only two solutions that can decompile a program. The first, which is proprietary and works well, is the Hex-Rays decompiler plugin for IDA. It generates C code. The second is the free open-source project Boomerang. It is not that good, but you can modify it to suit your needs. – Sergei Krivonos May 31 '16 at 06:54
  • But if it is .NET, then things are much easier. – Sergei Krivonos May 31 '16 at 06:55
  • I would have written an answer, but it looks like the question is locked. I have 101 reputation and still cannot answer. I have had no answers on this site thus far. – Sergei Krivonos May 31 '16 at 06:57
  • 1
    You have a mission-critical application which nobody understands and/or has been able to fix for 10 years, and even now you do not consider that a problem? – Thorbjørn Ravn Andersen May 31 '16 at 09:16
  • 1
    @RLH Fully agree with this. Opening in Notepad you can tell if something is written for .NET just by looking for commonly used namespaces and types etc. – user9993 Jun 01 '16 at 10:59
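
The hex-editor and Notepad suggestions in the comments above can be automated. The following is only a rough sketch in Python: it pulls runs of printable ASCII out of a binary, much like the Unix `strings` tool, and keeps the ones that mention a toolchain. `TOOLCHAIN_HINTS` is a hypothetical starting list, not an authoritative one, and text stored as UTF-16 in the executable would be missed.

```python
import re

# Hypothetical hint list: substrings that often point at a toolchain or runtime
# (MSVBVM is the Visual Basic runtime DLL prefix, mscoree.dll suggests .NET).
TOOLCHAIN_HINTS = (b"MSVBVM", b"Borland", b"Delphi", b"mscoree.dll")


def printable_strings(data: bytes, min_len: int = 6) -> list[bytes]:
    """Return runs of at least min_len printable ASCII bytes found in data."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def toolchain_clues(data: bytes) -> list[str]:
    """Return the embedded strings that contain one of the hints (case-insensitive)."""
    clues = []
    for text in printable_strings(data):
        lowered = text.lower()
        if any(hint.lower() in lowered for hint in TOOLCHAIN_HINTS):
            clues.append(text.decode("ascii"))
    return clues
```

To try it, read the file with `pathlib.Path("legacy.exe").read_bytes()` (the file name is a placeholder) and pass the bytes to `toolchain_clues`. An empty result proves nothing, so browsing the full output of `printable_strings` is still worthwhile.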

8 Answers

233

Reverse engineering can become very hard, even more so if you do not just want to understand the program's logic, but change and recompile it. So the first thing I would try is to look for a different solution.

I want to modify the columns, spacing format and add VBA logic etc. on the Excel spreadsheet

If that is the only thing you want, and the calculation done by the program is fine, why not write a program in the language of your choice (maybe an Excel macro) which calls your legacy "exe", takes the output and processes it further?
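
As an illustration of that idea, here is a minimal sketch in Python, under a strong assumption: the legacy output is available as a CSV file with a header row. A real .xls file would need an Excel library or a VBA macro instead. The function names, paths and column names are all hypothetical.

```python
import csv
import subprocess
from pathlib import Path


def reorder_columns(src: Path, dst: Path, wanted: list[str]) -> None:
    """Copy src to dst, keeping only the columns named in wanted, in that order."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        # extrasaction="ignore" silently drops columns that are not in wanted.
        writer = csv.DictWriter(fout, fieldnames=wanted, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)


def run_and_postprocess(command: list[str], raw: Path, final: Path,
                        wanted: list[str]) -> None:
    """Run the legacy program, wait for it to exit, then reshape its output."""
    # Blocks until the program exits; check=True raises on a non-zero exit code.
    subprocess.run(command, check=True)
    reorder_columns(raw, final, wanted)
```

The legacy executable stays untouched. Only the post-processing step has to change when the desired layout changes.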

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • 9
    Why does the new program have to call the old EXE? Why not just make the new program independent and then write a script that calls both and coordinates the output and input? My experience suggests that letting command line languages like bash, PowerShell, or command prompt handle process coordination is generally simpler than trying to code it yourself in an imperative language. Otherwise, +1. – jpmc26 May 27 '16 at 23:57
  • 8
    @jpmc26: That's true right up until you have to deal with Bash's absurd quoting rules. Yes, they are (mostly) POSIX-compliant. No, they do not make any goddamned sense. $FOO should not word split, for example. – Kevin May 28 '16 at 08:30
  • 1
    @Kevin I'd rather deal with occasional awkward quoting rules than with the mess I've seen trying to make an imperative language sit and wait for a program it launched to complete and then trying to use its output (*especially* if it goes over stdout or stderr). Shell languages were *designed* to solve the latter problems, and they do so fairly elegantly. I've always found this more difficult and that it requires much more knowledge about the system in imperative languages, but maybe it's just me. – jpmc26 May 28 '16 at 08:37
  • 16
    @jpmc26: I've never had any trouble calling [`subprocess.run()`](https://docs.python.org/3/library/subprocess.html#subprocess.run), personally. – Kevin May 28 '16 at 08:39
  • 1
    @Kevin You must have a much deeper understanding of how all the piping and the `shell` parameter works than I do, then, which goes back to my final point. – jpmc26 May 28 '16 at 08:42
  • 3
    @jpmc26: What piping? It's pure cookbook; if you want stdout, you pass the magic `PIPE` constant. Otherwise, you don't and it gets discarded. What's there to understand? – Kevin May 28 '16 at 08:43
  • 1
    @jpmc26: thanks for your comment. Whether one uses a separate script or implements the calling of the old program directly in the program which also does the output processing is IMHO a minor implementation detail. Having both in one has the advantage that one does not need to maintain separate things in different programming languages, and it may be less effort to make this a fully integrated solution which seems to work "as one program" from the user's perspective. However, without knowing "the real thing" and without knowing the exact requirements it is hard to say which approach is really "better". – Doc Brown May 30 '16 at 08:33
  • 3
    ... I should add that I did use Excel with VBA in the past as a frontend to command line utilities very successfully more than once. The structure is always the same: a sheet for entering the parameters as a "poor man's UI", a "Start" button on that sheet. In the VBA code, one needs a `Shell` call in Excel VBA like this one: http://stackoverflow.com/questions/8902022/wait-for-shell-to-finish-then-format-cells, one can pipe the stdout/stderr from the cmd utility into separate files and then apply the output formatting. – Doc Brown May 30 '16 at 08:50
  • If the output is more than trivial columns of data, I'd suggest using a lex/flex script to scan the output and convert it to the form you want. – jamesqf May 30 '16 at 17:39
  • 1
    While generally correct in content, this answer is really badly written, with the only meaningful part of it being half of the last sentence. – h22 May 31 '16 at 09:58
  • @h22: I am open to any kind of *constructive* criticism. Or better: get enough rep to edit my answer yourself if you think you can improve it. – Doc Brown May 31 '16 at 11:46
  • 1
    This is very good advice. Instead of fiddling with things one does not know the ins and outs of (which would probably increase the rate of human error), make tools doing pre- or post-processing instead. – mathreadler May 31 '16 at 20:56
  • Perl was born to do that. Good excuse to try the fresh Perl 6. – JDługosz Jun 01 '16 at 09:23
114

In addition to the already given answers by Doc Brown and Telastyn, I would like to suggest an alternative approach (under the assumption it's mission critical).

If you do not know the computations it performs and the calculations are (somewhat) mission-critical: Deduce the original logic in the .exe file by any means necessary. Decode it using a decompiler/disassembler like IDA if necessary. Hire a consultant (or a batch of consultants) if necessary.

Sure, work around it for now using their solution, but do not let it be.

The reason I suggest this is as follows: You have admitted that the calculations are very complex (according to an engineer you spoke to). It's also mission-critical. So if somehow the original .exe stops working due to changes in the platforms you have (maybe 16-bit support gets dropped?), you have just lost a mission-critical piece of knowledge.

Now, I'm not concerned about losing the .exe, but about losing the knowledge it encodes. That knowledge must be recovered.

As before: if that knowledge is already available, make sure to write it down in a format that is not going to be lost anytime soon. Otherwise, recover it and write it down.

Sjoerd Job Postmus
  • 1,814
  • 1
  • 10
  • 12
  • 14
    Modern decompilers actually produce code that's usually quite legible, especially if the original source was in plain C or assembler, and not a higher level language. – phyrfox May 27 '16 at 16:33
  • 4
    Very good point. Also: Just patching it up so that it works again will only work until the next fix needs to be implemented. – Daniel Jour May 27 '16 at 16:56
  • 33
    @phyrfox 20 years old... developer retired 10 years ago... only output is an Excel spreadsheet... I'd put money on it being a VB6 application. – J... May 28 '16 at 01:38
  • Why worry about it now? If it still works, wouldn't it make sense to worry about it once it doesn't work anymore? Company may no longer exist when that happens, decompilation tools may improve over time, etc. – mucaho May 28 '16 at 11:09
  • 10
    @mucaho: or the company still exists and the person with the know-how to verify the results and hidden assumptions has just been hit by a truck. Of course, it's a business risk so ultimately the stakeholders should decide. I just wanted to emphasise that the "wrapper" will work now, but only adds to the technical debt. – Sjoerd Job Postmus May 28 '16 at 11:56
  • 22
    @J...: If it is VB6 then the original poster is in luck. You can recover the source code from a VB6 compilation pretty easily. – Eric Lippert May 28 '16 at 12:44
  • @EricLippert Indeed you can. I guess I meant to say that between the lines ;) – J... May 29 '16 at 01:14
  • 1
    Doc's answer is the best short-term answer, but this is the best long-term answer and is something that should happen before it's really needed. – Ellesedil May 29 '16 at 17:10
  • 1
    "Don't know the calculations" "mission critical" - all that comes to mind is "obviously a major malfunction". – corsiKa May 30 '16 at 23:00
  • 3
    The danger of the original exe no longer working is not that large. Emulators can still keep running it if the architecture changes. Still, it would be inconvenient to always have a virtual PC running Windows XP or 98 or whatever it can run on. – vsz May 31 '16 at 06:20
  • @vsz, indeed, keeping the program running is no big deal. I don't think that's the problem this answer is focused on. The problem that Sjoerd Job Postmus and many other commenters have pointed out is that the OP's company seems to rely on this executable to do an important and "very, very complex" task, and *nobody has any clue how it works.* [*cont.*] – Vectornaut Jun 01 '16 at 08:28
  • @vsz [*cont.*] If it's not perfect—if it subtly misuses a statistical test, or it manages transmitter power in a way that can violate new FCC rules, or it estimates each patient's effective dose using a formula which is now known to be flawed, or it uses tensile strength values which are an order of magnitude too high when an unfortunate combination of flags is set—nobody will ever stumble across the problem while looking through the source code. I shudder to think about other ways the problem might be detected. – Vectornaut Jun 01 '16 at 08:28
  • 1
    @vsz It becomes harder when every time the emulator starts it tries to upgrade itself to Windows 10. – JDługosz Jun 01 '16 at 09:26
74

Ask the original programmer, if possible.

A few weeks ago I was contacted by a firm I used to work for 10 years ago with the very same question about an mdb file developed in the mid-90s.

Paolo
  • 847
  • 5
  • 8
  • 52
    This is the real low hanging fruit. Everyone (including myself) romanticizes the use of hard programming skills like reverse engineering, reimplementing the program's functionality or adding layers to the data processing. In reality, the best place to start is a friendly email which might come back in an hour with the location of the source code or some other ideal solution. – user1717828 May 27 '16 at 18:36
  • 2
    At home, faced with a 10-year-old application, I too would fire up a disassembler, but during work hours the goal is different ^^ – Paolo May 27 '16 at 19:40
  • 2
    Did you remember anything about it? :) – Ángel May 27 '16 at 23:31
  • 2
    Of course! Unfortunately the company underwent 3 acquisitions and mergers, so lots of information got lost and part of the backups was in the lost bag... The development was done on site on their machines, so I have no copy of the source and that's it. – Paolo May 28 '16 at 12:19
  • 1
    Scan the EXE for embedded strings that might include a developer's name or something. That's easier than a full dis-assembly! – JDługosz Jun 01 '16 at 09:28
  • We also had a problem with an old program, however we weren't able to contact the programmers. It's worth a try. – nalply Jun 02 '16 at 20:47
55

Any suggestions what options I have to deal with such kind of problems?

If all you're looking to do is modify the output, then why not simply use composition?

Instead of modifying the black box you can't easily access, you create a new program that takes the Excel output, and does your formatting/column changes too. Then you could make a new exe/script that calls the two programs in order, so it appears to the end user that there is just one program that does all of the work - even though it's two distinct steps under the hood.

Telastyn
  • 108,850
  • 29
  • 239
  • 365
  • I'm new to programming but willing to learn the new area. Is JAVA a good option to achieve the result? So far I only did some VBA coding. Thanks! – Alec May 27 '16 at 14:53
  • 2
    @Alec Whether java is a suitable language or not mainly depends on the amount of data you need to handle / the amount of computation that you need to do. If both are low, java is fine. If either one is critical, you better drop down to C or C++. But since you seem only to be using an amount of data that fits into an Excel spreadsheet anyway, I don't think there's enough data involved to make java a bad choice (Excel would likely explode before your app does). – cmaster - reinstate monica May 27 '16 at 15:26
  • 18
    @cmaster the idea that Java is prohibitive for heavy computation is an outdated notion. The *worst* [benchmark listed here](https://benchmarksgame.alioth.debian.org/u64q/java.html) isn't even 4x (most are 2x or less) and if a single digit scalar is your breaking point, the savings in safety (which translate directly to developer dollars) is more than likely going to offset the performance hit. – corsiKa May 27 '16 at 17:47
  • 8
    @Alec any language will work. VBA seems a good choice because it already integrates with Excel so well. – Captain Man May 27 '16 at 18:18
  • 4
    @corsiKa That depends entirely on the scale of your application. If a single run consumes several tens of thousands CPU-hours, a factor of 2 or 4 becomes prohibitive: It translates directly into the amount of results that you can get out of a multi-million machine. Also, such applications typically work in lockstep, so garbage collection is pure poison for their performance, the small interruptions would multiply by the number of processes. I tell you, such applications exist, and they are most certainly not written in Java. They are just not used by the average internet business. – cmaster - reinstate monica May 27 '16 at 18:23
  • 7
    @cmaster We're talking about some simple calculations, not a full blown AAA game engine with realtime global illumination, physically based rendering, animated sparse voxel octrees, universal physics field simulation and the like. No offense, but inserting any argument RE performance here is bad. Ease of use should be #1, and as someone who's been using C++ for a few years it's the last language I would recommend in this case. –  May 27 '16 at 19:23
  • 2
    @TechnikEmpire Cheers, you are absolutely right. And that was my original point: The way I read the question, there is nothing wrong with using Java to solve it. My last comment was only in response to another comment which basically said: *Ignore performance, Java is good for everything*. And that is simply not true, as both my and your examples show :-) – cmaster - reinstate monica May 27 '16 at 20:15
  • If what you're modifying is an xls file (vs just a csv loaded into Excel), the key question is probably how good an Excel compatibility library Java has. This is something I really don't know the answer to. I've done that sort of thing before in C#, where the library is a wrapper around MS's COM library. For reading a file and writing to or formatting the cells it works reasonably well, aside from a bit of clunkiness over cell values being `object`. I've never tried touching VBA through it. As an edge feature, I'd want to verify that very early in testing a lib, esp. if pure 3rd party. – Dan Is Fiddling By Firelight May 27 '16 at 21:20
  • 1
    @Alec - you've done VBA coding? Excellent! Write an Excel macro in a template Excel file that reads/imports/merges (whatever the right terminology is) the target file. Run it [from the command line](http://stackoverflow.com/questions/2050505/way-to-run-excel-macros-from-command-line-or-batch-file). – davidbak May 31 '16 at 17:45
  • davidbak has the best solution. Write a VBA macro or set of macros. You don't really want to interop with excel because that tends to execute really slowly compared to executing a macro from within excel. Another option is to simply export the excel file as a .csv file and you can parse the file to your liking, modify the data, save into another .csv file and re-import the .csv file back into Excel. However, you'll still need to do something to get the formatting to your liking which is why it seems simpler to use a VBA macro from the get go. – Dunk May 31 '16 at 18:02
4

Write a simple wrapper around the program, capturing its output. It is not complex to do, as many languages (Java, C++, Python, .NET, for instance) have means for this. Parse the output and generate another, in the desired form. The user will call your new program. The old executable will stay next to it, or can even be automatically extracted from a resource before invoking it.

Of course, this solution works well only when the output is well structured and therefore easy to parse.

That it is a GUI application is not a blocking problem. You can launch it, generate output, and then automatically post-process it when the GUI terminates.
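
With a GUI program the user may close the window without producing anything, so a wrapper probably wants to confirm that the output file was actually written before post-processing it. A hedged Python sketch of that check follows. It assumes the GUI process itself stays alive until the window is closed (some launchers return immediately), and the function name is made up for illustration.

```python
import subprocess
from pathlib import Path


def run_gui_and_check_output(command: list[str], output: Path) -> bool:
    """Run the program to completion and report whether it (re)wrote output."""
    before = output.stat().st_mtime_ns if output.exists() else None
    # Returns once the process exits, i.e. when the user closes the GUI.
    subprocess.run(command, check=False)
    if not output.exists():
        return False
    # A new file, or a changed modification time, counts as fresh output.
    return before is None or output.stat().st_mtime_ns != before
```

Only when this returns `True` would the wrapper go on to parse and reformat the file.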

h22
  • 905
  • 1
  • 5
  • 15
  • 3
    How is this different from Doc Brown's top-voted answer? – Laf May 30 '16 at 20:31
  • I disagree with the assumption of Doc's answer being badly written. It's clear and succinct. – Mast May 31 '16 at 08:32
  • 2
    If you look at the text of that answer, you will see that the only informative part is the very end of the last sentence: "which calls your legacy "exe", takes the output and processes it further." – h22 May 31 '16 at 09:54
  • 2
    Not a downvoter, and don't see why this got -3... is Meta at it again? but separately, I would advise against lambasting someone else's answer for "contains lots of brain-diluting blah" when (A) that's a subjective judgement and (B) in my subjective opinion, yours contains just that! – underscore_d May 31 '16 at 12:06
  • This can also be rewritten as "contains uninformative generic talks that just distract from the topic wasting the readers time", if that way looks more helpful. Provides a hint to the right approach on the second half of the last sentence. This had no intention to be insulting. Comment removed. – h22 May 31 '16 at 12:23
  • @underscore_d No meta effect, question went HNQ. – Mast May 31 '16 at 14:35
  • @underscore_d "Is Meta at it again?" sorry I don't get the reference – async Jun 02 '16 at 08:48
  • @async You can probably find a horrifying level of detail by searching _The Meta Effect_, but here's a summary from my (layman's) POV: in a disturbing proportion of cases where someone makes a thread on Meta pointing out some perceived injustice in a non-Meta thread, tens and tens of users blindly pile into that thread and +1 the reporter and -1 whatever the reporter was complaining about... often seemingly without applying critical thought or considering the impact of their action. Since I don't think this answer is specifically bad, I wondered if that was why it was at -3. – underscore_d Jun 02 '16 at 14:38
3

There are companies that specialise in exactly this kind of problem. They use proprietary code to decompile native code into a high level language, then apply human expertise to make it useful (e.g. giving variables appropriate names).

Some years ago my employer used this to migrate some native S/390 mainframe code onto Linux servers. We gave them a binary, they gave us source code in C.

Whether this is necessary in your case is up to you. If you only care about the format of the output, you can simply massage the output after it's been produced. However, as others have pointed out, having business logic hidden in a binary blob could be an ongoing risk.

slim
  • 799
  • 1
  • 6
  • 11
2

Write some tests that exercise as many cases as possible on the old code. Find corner cases, test wrong input, and test correct input.

Pin down what is correct output given various cases, and then try to write an implementation that satisfies the same tests.
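
One possible shape for such tests, sketched in Python: record input/output pairs from the old program once, then check any rewrite against them with a numeric tolerance. Everything here is an assumption made for illustration, including the idea that inputs and outputs are plain sequences of numbers.

```python
import math
from typing import Callable, Sequence

# One recorded case: (inputs fed to the old program, outputs it produced).
Case = tuple[Sequence[float], Sequence[float]]


def mismatches(new_impl: Callable[[Sequence[float]], Sequence[float]],
               recorded: Sequence[Case],
               rel_tol: float = 1e-9,
               abs_tol: float = 0.0) -> list[Case]:
    """Return the recorded cases that new_impl fails to reproduce."""
    failed = []
    for inputs, expected in recorded:
        actual = new_impl(inputs)
        same = len(actual) == len(expected) and all(
            math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, e in zip(actual, expected)
        )
        if not same:
            failed.append((inputs, expected))
    return failed
```

An empty list means the rewrite agrees with the old program on every recorded case. It says nothing about inputs that were never recorded, so corner cases need to be in the recorded set. If expected values can be exactly zero, a non-zero `abs_tol` is needed, because a purely relative tolerance never matches zero.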

I wouldn't go down the reverse engineering route. It's incredibly complicated to reverse machine code, and you should already know what the purpose of the exe is. Reverse engineering is a little too much work for what you're after.

If the software was developed by one guy 20 years ago, it's probably not something that takes a lot of modern power. A GUI program that stretched the machine 20 years ago will barely register on a modern machine, so you're probably looking at something that's relatively simple to reproduce.

Carlos
  • 874
  • 7
  • 13
0

Try to reverse engineer the exe, but only for the purpose of finding the computation logic, or at least to get a fair hint of what it actually does. If your reverse engineering gets you to that point, you can write a new application based on that computation logic. Apart from that, I don't see another way out.

This is easier said than done: reverse engineering an exe created 20 years ago is a real challenge.