The case for code obfuscation?

Question

What are the top reasons to write obfuscated code, in terms of a real benefit to the people developing the code and to the business that runs it (if the code in question is in fact commercial code)? Are there documented cases (available online somewhere) that describe when obfuscation did more good than harm? Are there well-known examples where, for example, obfuscation was proven to meaningfully delay a malicious 3rd party from getting at the code? It seems that, just like rolling up your car windows won't stop people from breaking them and stealing your stereo, obfuscating your code just keeps honest people honest.

=========

Background:

This is an attempt to purposely challenge my assumptions on this topic.

I'm big-time against using code obfuscation in general, but I'm curious if I'm missing something. I get why, in cases like JavaScript, minification helps things load faster and all (there's a real, functional benefit there), but I can't seem to come up with a single reason why code obfuscation, for the purpose of being an obstacle to discovering what a section of code/algorithm does, is actually effective for any purpose whatsoever.

With open source being crazy popular, the question seems to be "share the code, or keep it proprietary?" When it comes to commercial code, I can understand why you can't share everything, and you've got the law on your side to fight theft.

BTW, if the reason someone is writing obfuscated code is "job security", then I would fire any programmer found to be consistently and purposely using obfuscation with the sole purpose of keeping their job, unless they could reasonably show that it had some business benefit. It's so completely anti-team that it's ridiculous, and it points to someone who's more concerned with keeping their job through misguided practices than with keeping it because they write awesome software.

I only mention this specific case because, while I realize people are usually joking, I'd like to deter any answers whose basic thrust is that obfuscation for job security alone is a good idea.

See also: http://programmers.stackexchange.com/questions/17995/what-are-the-advantages-of-obfuscating-release-code — Dan McGrath, Jan 10 '12 at 04:25
Simply put, obfuscation [changes the economics](http://stackoverflow.com/a/6018904/42473) of reverse engineering your code, nothing more. — Mark Booth, Jan 10 '12 at 11:35
Thanks, everyone. I've certainly seen a different perspective on this, thanks to your detailed answers and comments. There are several high-quality answers that talk about various angles of this issue. Rather than award a single answer, I've up-voted my favorites. — jefflunt, Jan 10 '12 at 19:30
Are you considering or focused on **source code** or **object / executable** code? For example, Gimpel Software distributes a version of their lint tool as obfuscated C source code, so that clients (typically on Unix) can compile it to run in whatever environment they wish, without Gimpel needing to support / maintain N target environments, including oddball or legacy ones. This is reasonably different from object/executable obfuscation used for copy or data protection (e.g. against illicit copying) as a layer of security to delay / deter reverse engineering. — mctylr, Jan 10 '12 at 20:49

score 53 · Accepted Answer · edited Apr 12 '17 at 07:31

One very interesting use case for obfuscation is tracing the origin of illicit copies. Assuming that obfuscation is a relatively cheap operation, the original author can supply each client with a differently obfuscated version of the application. If an illicit copy is found, the author can compare it with the supplied versions and trace back the source of the piracy.

That's a form of steganography, inspired by and a variation of the "traitor tracing" cryptographic schemes. I have no idea if it's common¹, or even if it's a good idea, but I've seen it applied in practice under the following parameters:

Highly competitive nationwide market with just two vendors,
About 50 deployments covered the market,
Average development time for both applications was a couple of years (more or less),
Average obfuscation time for our application was a couple of hours,
Lifespan for both applications was expected to be about ten years.
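
The per-client tracing idea described above can be illustrated with a minimal sketch. It is not the scheme the vendor actually used (those details aren't known); it simply assumes that each client's build gets identifier names derived from a vendor secret and the client's ID, so a leaked copy can be matched against the known builds. The identifier list, the secret, and the function names are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical identifiers; a real build would collect these from the code base.
IDENTIFIERS = ["load_config", "check_license", "render_report"]


def obfuscated_name(secret: bytes, client_id: str, identifier: str) -> str:
    """Derive a client-specific replacement name for one identifier."""
    message = f"{client_id}:{identifier}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return "_" + digest[:10]


def build_fingerprint(secret: bytes, client_id: str) -> frozenset:
    """Return the set of obfuscated names shipped in one client's build."""
    return frozenset(obfuscated_name(secret, client_id, name) for name in IDENTIFIERS)


def trace_leak(secret: bytes, clients: list, leaked_names: set) -> Optional[str]:
    """Return the client whose build shares the most names with a leaked copy."""
    best_client, best_overlap = None, 0
    for client_id in clients:
        overlap = len(build_fingerprint(secret, client_id) & leaked_names)
        if overlap > best_overlap:
            best_client, best_overlap = client_id, overlap
    return best_client
```

With about 50 deployments, comparing a leaked copy against every shipped build is cheap. The weak point is that anyone who re-obfuscates the leaked copy also destroys the fingerprint.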

The rationale was of course security through obscurity initially, and it evolved into the aforementioned scheme at some point². Both vendors had access to each other's binary code, legally, and I think it's obvious that decompilation attempts from both were expected. Obfuscation did nothing in terms of security in the long run. Both vendors had highly motivated and talented teams working in an extremely profitable niche market. In the end our products were more similar than not, and any competitive advantage was gained through other, less obscure means.

I can't really expand, because (a) it was very early in my career and I didn't get a clear overview of the design decisions or the results of the tracing scheme (if any) and (b) some of my involvement with the project was under an NDA.

Another valid use case for obfuscation could be when you are somehow legally obliged to submit your code to a third party:

If your firm does IP work for technology companies, or is involved in cases involving software source code, you may be obliged to submit your client’s source code to the USPTO, a court or third party.

Since source code is considered a trade secret, most regulatory agencies use a "50%" rule. Source code submitted is obscured so that it cannot be used as-is.

IANAL, and the link is more relevant to hard copies of code rather than actual working code, so this might be completely irrelevant.

Now, as JavaScript is the canonical example for obfuscation, there's one side effect that's not commonly considered, and that's hiding malicious code in obfuscated JavaScript. Although there are definite advantages in minifying³ JavaScript, I don't see any point in actual obfuscation, and I'm happy Douglas Crockford agrees with me:

Then finally, there is that question of code privacy. This is a lost cause. There is no transformation that will keep a determined hacker from understanding your program. This turns out to be true for all programs in all languages, it is just more obviously true with JavaScript because it is delivered in source form. The privacy benefit provided by obfuscation is an illusion. If you don’t want people to see your programs, unplug your server.

As for obfuscation for "job security", that's a behaviour that should never pass code review, and if identified it shouldn't be tolerated. I wouldn't go as far as firing the culprit at first, but repeat offenders definitely deserve a good spanking, at least.

In conclusion, obfuscation is a typical example of security through obscurity: its only obvious merit is as a deterrent and nothing more. There might be creative use cases⁴ I don't know of, but in general the benefits are minimal, at best.

¹ After writing this I found this answer, which basically describes the same scheme, so it might be more common than I thought.

² Although steganography is still security through obscurity.

³ Minification ~ removing whitespace and shortening tokens, not intentionally obscuring.

⁴ Does the International Obfuscated C Code Contest count?

"If you don’t want people to see your programs, unplug your server." - or use Software Guard Extensions and trust Intel. — user253751, Feb 03 '16 at 22:32

score 41 · Answer 2 · edited Jun 16 '20 at 10:01

The case for code obfuscation is that it raises the bar for a 3rd party trying to determine what the code does and how it works.

However, that does NOT mean that a developer should ever be writing obfuscated code.

See, this is the bit I think is missing from your question: Code obfuscation (just like JavaScript minification) does not need to - and should not - be done manually by the developer. Likewise, this should not be stored as your core source files in version control either.

Code obfuscation should happen as a post processing step during compilation into your production build. There are plenty of 3rd party products to do this as well, so there is almost no reason to do this in house.

For example: Dotfuscator

The IEEE has a paper on the effectiveness of code obfuscation

Results show that identifier renaming significantly decreases the efficiency of attacks, at least doubling the time needed to complete a successful attack (even in the worst-case scenario, i.e., against the best attacker). In addition, obfuscation reduces the gap between novice and skilled attackers, making the latter less efficient, and makes systems that are easier to attack in clear more similar to those that are intrinsically harder to break.

Emphasis mine.
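
To make "identifier renaming as a post-processing step" concrete, here is a toy sketch. It is not how Dotfuscator or any real obfuscator works; it assumes the build already knows which identifiers are safe to rename, and the function name and the `a0`, `a1`, ... naming scheme are made up for illustration.

```python
import re


def rename_identifiers(source: str, names: list) -> tuple:
    """Replace each listed identifier with a short meaningless one.

    Returns the transformed source and the original -> obfuscated mapping.
    This toy version matches whole words only, ignores strings and comments,
    and does not check that the new names are unused in the source.
    """
    if not names:
        return source, {}
    mapping = {name: f"a{index}" for index, name in enumerate(names)}
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    obfuscated = pattern.sub(lambda match: mapping[match.group(1)], source)
    return obfuscated, mapping
```

The readable source stays in version control. Only the build output is renamed, and the mapping is kept next to the build so that stack traces and logs can be translated back later.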

answered Jan 10 '12 at 04:23

Dan McGrath

I'd give this +1, but the link requires a paid subscription which not all readers will have access to. – mattnz Jan 10 '12 at 04:33
Yes, that is the unfortunate fact of the IEEE that I'm not entirely happy with, but that's another topic – Dan McGrath Jan 10 '12 at 04:35
There's a publicly accessible [pdf version here](http://selab.fbk.eu/ceccato/papers/2009/icpc2009.pdf). I think it's ok to use that instead, it's on the homepage of one of the authors of the paper, Mariano Ceccato. – yannis Jan 10 '12 at 04:52
Great find. I had searched for it with Google Scholar, but didn't find it. I've updated the link. – Dan McGrath Jan 10 '12 at 04:53
+1 for "Code obfuscation (just like JavaScript minification) does not - and should not - be done manually by the developer" – João Portela Jan 10 '12 at 15:24
No problem @normalocity – Dan McGrath Jan 10 '12 at 16:29
I'm curious as to what the downvote was for? – Dan McGrath Jan 10 '12 at 18:27
@DanMcGrath: Have another +1 to compensate for whoever downvoted. :-) – Peter K. Jan 10 '12 at 19:54

Mike Nakis · Answer 3 · 2012-01-10T10:06:10.133

I have participated in the development of an MMORPG. This involved server logic and client logic. Throughout the many-year-long development of the project, whenever we considered the interface between the client and the server, the rule was that the client ought to be treated by the server at all times under the presumption that it has been hacked. In other words, the server had to be written in such a way that there was no response that could come from the client that would cause the server to fail, or allow the client to cheat. Still, it was known from the beginning that hackers would inevitably find holes in the system and exploit them in order to cheat. And after a while they did.
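
As an illustration of treating the client as already hacked, here is a minimal sketch of server-side validation for a movement request. It is not the game's actual code; the request shape (`x`, `y`), the `MAX_SPEED` limit, and the function name are assumptions made for the example.

```python
import math

MAX_SPEED = 5.0  # hypothetical limit, in world units per second


def validated_move(position, request, elapsed_seconds):
    """Return the new (x, y) position, or None if the request must be rejected.

    `position` is the server's own record of where the player is; the client's
    claim in `request` is checked against it rather than trusted.
    """
    try:
        new_x = float(request["x"])
        new_y = float(request["y"])
    except (KeyError, TypeError, ValueError, OverflowError):
        return None  # malformed input must never crash the server
    if not (math.isfinite(new_x) and math.isfinite(new_y)):
        return None
    distance = math.hypot(new_x - position[0], new_y - position[1])
    if distance > MAX_SPEED * elapsed_seconds:
        return None  # faster than the rules allow, possibly a modified client
    return (new_x, new_y)
```

The server never crashes on bad input and never accepts a state change it cannot justify from its own records, whatever the client claims.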

Of course, before shipping the client to the great big world out there, we made sure to obfuscate it. We believe obfuscation had the following effects:

It deterred the non-expert hackers from even trying.
It delayed expert hackers in achieving any hacks.
It reduced the number of hacks achieved by expert hackers.
It limited the effectiveness of the hacks.
Most importantly: it caused the hackers to perform more test runs with their hacked clients against our servers before achieving a working hack, which increased the chances of us discovering them by looking for irregular activity in the server logs.

Game accounts of discovered hackers were terminated without a refund, so this made the hacking business costlier and less attractive.
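
The "irregular activity in the server logs" check might look something like the following sketch. The log format (pairs of account ID and whether the request was accepted) and the threshold are hypothetical; the real detection rules are not described here.

```python
from collections import Counter


def suspicious_accounts(events, threshold=3):
    """Return accounts with at least `threshold` rejected requests, sorted by name.

    `events` is an iterable of (account_id, accepted) pairs, a hypothetical
    log format chosen for this example.
    """
    rejected = Counter(account for account, accepted in events if not accepted)
    return sorted(account for account, count in rejected.items() if count >= threshold)
```

The more failed attempts obfuscation forces on an attacker, the more rejected requests pile up against their account before a hack works.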

So, due to all of the above, I believe obfuscation had an overall positive effect in our game, and by extension, obfuscation can have an overall positive effect in any piece of software which is liable to get hacked. (For example, software containing copy protection measures.)

The effects that obfuscation had on maintenance were close to none. There were a few places where some inexperienced programmers were making assumptions about the names of identifiers (they were using reflection), but once those were sorted out everything was fine. The obfuscation step just became part of the overall build step for the production version of the game, so most of us developers never had to worry about it or have anything to do with it. We already had a tool to view the logs of the game, so we modified the tool to use the association table (mapping obfuscated identifiers to proper identifiers) produced by the obfuscator in order to translate the logs for us on the fly. As a result, we never even had to see any obfuscated identifiers while doing post-mortem examinations based on logs collected from the field.
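
A minimal sketch of that on-the-fly log translation, assuming the association table is available as a dictionary from obfuscated identifier to original identifier. The table contents, log line format, and function name are invented for the example; the real obfuscator's output format is not known.

```python
import re


def translate_log_line(line: str, association: dict) -> str:
    """Replace obfuscated identifiers in a log line with their original names."""
    if not association:
        return line
    # Longest keys first, so a longer identifier wins over a shorter prefix of it.
    keys = sorted(association, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: association[match.group(1)], line)
```

For example, with a table mapping `a` to `PlayerInventory` and `b` to `addItem`, a log line mentioning `a.b` would be shown to the developer as `PlayerInventory.addItem`.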

@deworde I updated my answer with one more paragraph about the effects of obfuscation on maintenance. — Mike Nakis, Jan 10 '12 at 10:06
@Carson63000 Yes. (And LOL at your avatar --is that chainmail and are you wielding a sword?) — Mike Nakis, Jan 11 '12 at 01:14
@MikeNakis: nice! And yep on the avatar - well, it's knitted chainmail and a wooden sword, the company I worked for was making some assets for banner ads and got staff to dress up rather than hiring models. :-) — Carson63000, Jan 11 '12 at 04:44

score 3 · Answer 4 · answered Jan 10 '12 at 16:11

Reading and understanding (and obviously writing) obfuscated code can be an interesting mental challenge. It probably falls outside the scope of what you were asking, but examples like IOCCC may be a source of amusement as well as horror.

Vatine

This really should have been a comment on the question, not an answer. – Dan McGrath Jan 12 '12 at 19:13
