3

I am working at a medium-size semi-governmental organization managing subcontractors for software projects. One of our contractors recently turned in the "source code" for the project they had been contracted for. I strongly suspect that the code is auto-generated. This bothers me for several reasons (some of which are also mentioned in: Is source code generation an anti-pattern?). E.g.:

  • I suspect that the code is a lot larger than it could have been, had it been written manually
  • maintaining this code will accordingly be a lot harder
  • since this is not really source code, there are no comments at all (or only token, useless comments) and no effort was apparently made to come up with a meaningful organization of the code base (e.g. in terms of libraries, etc) that would have made sense to a human maintainer
  • the contractor is using this as a way to circumvent their obligation to surrender their source code to us, and also with a view to securing future maintenance contracts (or at least enjoying an advantage over other bidders who won't have access to the real sources).

The contractor has also done a clever job of injecting some artificial randomness in the generated sources so as to give the impression that this was written by hand.

I feel that my employer / the taxpayer is being cheated by an unscrupulous subcontractor willing to walk on a fuzzy red line, betting that they've done something clever that can't be conclusively proved.

Is there a way I can detect and prove that this was automatically generated by some other software?

Thomas Owens
humblemike
  • 2
    I think you are talking about **obfuscated code**, not (any) auto-generated code. I mean, sure, it is spit out by software. However, it appears they did the work correctly and then had software mangle it. Some obfuscation tools have telltales that may identify them. Yet that would be an arms race: you become clever at detecting it, and they become more clever at making it. Code readability is hard to measure, yet you might want to look into that. – Theraot Feb 03 '20 at 14:49
  • @Theraot; no the code was spit out by a tool, there is no obfuscation. Some randomness was apparently added in aspects like indentation etc. to make it look less regular. – humblemike Feb 03 '20 at 15:00
  • what do you think the input to the tool was? – Ewan Feb 03 '20 at 15:01
  • @Ewan some kind of in-house developed DSL – humblemike Feb 03 '20 at 15:03
  • 3
    I removed the reference to preventing this. Language for contracts needs to come from experts in law, especially contract law, and isn't a good fit for software engineers. However, detection of such code seems to be a good question since it does impact the maintainability of the product going forward. – Thomas Owens Feb 03 '20 at 15:10
  • presumably if the meta language is in house, the code would be useless to you? – Ewan Feb 03 '20 at 15:14
  • @ThomasOwens I am not asking for contract language but I'd welcome some answers on what kind of software requirements or software development process requirements would make a code-generation approach less feasible – humblemike Feb 03 '20 at 15:14
  • @Ewan yes; that's exactly why I am trying to prevent such practices in the future. I asked for the source code and I am given code that was spit out by an in-house tool, or (if the contractor comes clean) code in some in-house DSL nobody would be willing to work with or able to evolve. – humblemike Feb 03 '20 at 15:17
  • I see no way out, unless your contract explicitly states that all "source" code is to be turned in, including the source from which other code is generated. You should contact your legal department and ask them this question. – Euphoric Feb 03 '20 at 15:42
  • I don't think you are asking about "autogenerated code"; you are asking about poor-quality code. Your software supply contract should have measurements of the quality of the code. – BobDalgleish Feb 03 '20 at 18:12
  • I suspect that what you got was simply bad code written by an incompetent developer whose only tool was copy-paste. – 17 of 26 Feb 03 '20 at 18:45
  • Would not the source code be the code that was written by a human, not the intermediate code generated by a tool? – Martin K Feb 03 '20 at 21:19

4 Answers

4

Our team recently faced something similar, and what saved us was the contract between our company and the contractor. Such contracts usually include a clause about code quality, measured with tools like SonarQube. Ours did, and all the generated boilerplate caused the delivery to fail the quality threshold. After that, the contractor had to send us the original code.
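
To make the idea of a quality threshold concrete, here is a minimal sketch of how such a gate could be evaluated. It is not SonarQube's API: the `Condition` class, the metric names and every number below are made up for illustration, and in practice the measures would come from whatever analysis tool the contract names.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    metric: str                     # hypothetical metric name
    limit: float                    # threshold agreed in the contract
    higher_is_better: bool = False  # False: value must stay at or below limit


def failed_conditions(measures, conditions):
    """Return the names of the metrics that violate their threshold."""
    failed = []
    for c in conditions:
        value = measures[c.metric]
        ok = value >= c.limit if c.higher_is_better else value <= c.limit
        if not ok:
            failed.append(c.metric)
    return failed


# Illustrative gate and illustrative measurements, not real project data.
gate = [
    Condition("duplicated_lines_pct", 5.0),
    Condition("comment_density_pct", 10.0, higher_is_better=True),
]
delivery = {"duplicated_lines_pct": 41.2, "comment_density_pct": 0.4}

print(failed_conditions(delivery, gate))
```

A delivery passes only when the returned list is empty, which is what gives the contract clause something objective to point at.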

Bart van Ingen Schenau
Balbu
4

Is there a way I can detect and prove that this was automatically generated by some other software?

I'm assuming the code is pretty repetitive. This isn't a given for generated code, but it matches the way you describe it.

Let's say there are two possible ways of generating this sort of code:

  1. Write (or license or whatever) an in-house codegen tool, and the input to it. Slightly randomize and then deliver the generated code.
  2. Get a competent developer to write one or two initial cases, and then have a room full of interns implement sections of a spec, piecewise, by copying and pasting those starting cases.

I can't think of a reliable way to distinguish between the two just from the code, unless you can ask to audit the SCM history. I wouldn't really even want to speculate on which is more plausible, or cheaper for the contractor: some problems genuinely lend themselves to lazy copy-pasting.
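
What can be measured from the code alone is how repetitive it is, even though that number cannot tell the two scenarios apart. The following is a rough sketch, not an established tool: `normalize` and `duplication_ratio` are names invented here, and the normalization (dropping whitespace, collapsing identifiers and numbers) is a crude assumption meant to see through randomized indentation and renamed variables.

```python
import re
from collections import Counter

IDENT = re.compile(r"[A-Za-z_]\w*")
NUMBER = re.compile(r"\d+")


def normalize(line):
    """Erase names, numeric literals and spacing so only the line's shape remains."""
    line = IDENT.sub("I", line)     # every identifier or keyword becomes "I"
    line = NUMBER.sub("N", line)    # every remaining digit run becomes "N"
    return re.sub(r"\s+", "", line)  # indentation and spacing are ignored


def duplication_ratio(source, window=3):
    """Fraction of `window`-line blocks whose normalized shape occurs more than once."""
    shapes = [normalize(l) for l in source.splitlines() if l.strip()]
    blocks = [tuple(shapes[i:i + window]) for i in range(len(shapes) - window + 1)]
    if not blocks:
        return 0.0
    counts = Counter(blocks)
    repeated = sum(1 for b in blocks if counts[b] > 1)
    return repeated / len(blocks)


# Two getters that differ only in names and indentation.
sample = """
int getA() {
    return a;
}
int  getB() {
  return b;
}
"""
print(duplication_ratio(sample))
```

A high ratio supports a complaint about maintainability; it says nothing about whether a tool or a room full of interns produced the repetition.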

I said "just from the code" above because the alternative is to either find the codegen tool they licensed, or find a whistle-blower to tell you about it.

In any case - i.e., even if you do somehow prove the code is generated - your business case is with its quality and not with how that level of quality was achieved.


PS. code generation is definitely not an antipattern, but delivering (or even committing) generated code instead of the actual source definitely is.

Useless
3

I hope your company was clever enough to make a contract which includes not only the source code, but also some terms about getting the code explained in case you want to change it yourselves in the future.

That might give you an opportunity to ask your contractor for an audit, where one of your people does some pair programming with them for a few hours or a day. Such a session should be pretty enlightening.

It could also be a good idea to have a clause in your contracts (at least, for future contracts) similar to the one in the GPL which describes what kind of source code one has to publish under that license. It defines it as the preferred form of the work for making modifications to it (see here).

So if the source given to you does not effectively enable you to maintain the system later, this could be seen as a contract violation.

Doc Brown
2

There are legitimate cases for code generation, but if you are getting the equivalent of Word docs saved as HTML, what you are really looking for is a code quality measurement of some kind.

In my experience there really is no way of automating this. Anything you put in (such as the suggested SonarQube) can be worked around with automation that simply satisfies the criteria at the lowest possible level, e.g.:

///
// this method Retrieve Foos
/// 
public List<Foo> RetrieveFoos()
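
A reviewer could screen for comments like this one with a crude check such as the sketch below, written in Python for brevity. The helper names and the `FILLER` word list are assumptions made up for this example, and, in line with the point above, a generator can be tuned to defeat this check just as easily.

```python
import re

# Words that add no information on their own (an arbitrary, illustrative list).
FILLER = {"this", "the", "a", "an", "method", "function", "of", "to", "for", "and"}


def words(text):
    """Split prose or a CamelCase/snake_case identifier into lowercase words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return {w.lower() for w in re.findall(r"[A-Za-z]+", spaced)}


def is_token_comment(comment, identifier):
    """True when the comment says nothing beyond the identifier's own words."""
    informative = words(comment) - FILLER
    return informative <= words(identifier)


print(is_token_comment("this method Retrieve Foos", "RetrieveFoos"))
print(is_token_comment("Loads every Foo still awaiting approval", "RetrieveFoos"))
```

The first call flags the comment from the example above, while the second comment carries words the method name does not, so it is not flagged. An empty comment is also treated as a token comment.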

You should work with the contractor, make sure you own the source control, and be involved in code reviews from stage one. But this really raises the question of whether you want to farm this out or keep it in house at all.

You could hire contract programmers to work under you rather than sending the project out. Presumably the external company feels it can achieve more, faster, using its code generation, and this is reflected in the price.

Ewan