For a scientific simulation I need to write some computations in C++. Since this became extremely tedious, I built myself a small code generator: in a scripting language (Python) you put together a syntax tree consisting of assignments and mathematical operations, and a self-written code generator transforms the tree into C++ code. (A good example of the visitor pattern, by the way.) That all works fine.
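For readers unfamiliar with the setup, here is a minimal Python sketch of such a generator, assuming a hypothetical tree made of `Var`, `Const`, `BinOp` and `Assign` nodes (the question does not show the real classes). A visitor walks the tree and returns C++ source text, dispatching on each node's class name.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical node types; the question's actual tree classes are not shown.
@dataclass
class Var:
    name: str

@dataclass
class Const:
    value: float

@dataclass
class BinOp:
    op: str          # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"

@dataclass
class Assign:
    target: Var
    value: "Expr"

Expr = Union[Var, Const, BinOp]


class CppEmitter:
    """Visitor that turns the tree into C++ source text."""

    def visit(self, node):
        # Dispatch on the node's class name, e.g. visit_BinOp.
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)

    def visit_Var(self, node):
        return node.name

    def visit_Const(self, node):
        # repr of a Python float (e.g. "0.5", "2.0") is also a C++ double literal.
        return repr(float(node.value))

    def visit_BinOp(self, node):
        # Parenthesize every subexpression so C++ operator precedence never matters.
        return f"({self.visit(node.left)} {node.op} {self.visit(node.right)})"

    def visit_Assign(self, node):
        return f"{self.visit(node.target)} = {self.visit(node.value)};"


# Kinetic energy e = 0.5 * m * v^2
tree = Assign(Var("e"), BinOp("*", Const(0.5), BinOp("*", Var("m"), BinOp("*", Var("v"), Var("v")))))
print(CppEmitter().visit(tree))  # e = (0.5 * (m * (v * v)));
```

Adding a second back end (another target language) only requires a second visitor class; the tree classes stay unchanged.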

Now the project is growing and there is demand for more sophisticated code. It's very difficult for me to see which tools could be useful. What do people typically do and use when they have to generate program code programmatically (with Python as the "generator language", or in general)?

(Sorry, but in this case Google is really not my friend. Unfortunately, Google cannot transform a task into search results.)

Michael
  • I don't have an answer for you (this actually seems like a tool or poll question so I'm not convinced it _has_ an answer) but I can at least say that I do the same thing you do. – Lightness Races in Orbit Mar 23 '16 at 10:42
  • Unfortunately this is indeed an opinion poll and/or a tool recommendation as currently written, so it'll get closed sooner or later. But it's worth mentioning that C++ template metaprogramming is arguably a form of code generation built into all C++ compilers, so that's another option to consider. – Ixrec Mar 23 '16 at 10:48
  • It's Programmers, so I should be allowed to ask questions that are a bit more open than on Stack Overflow. If not, where (the hell) can I ask this kind of question? I think it is, after all, an absolutely legitimate question. – Michael Mar 23 '16 at 10:54
  • @Michael Can we know more about the code you're trying to write? There might be a reason why it's so tedious, and maybe we can focus on helping you there instead. – Snoop Mar 23 '16 at 11:18
  • @Michael: What you have been doing is creating a DSL (Domain Specific Language)-to-C++ translator/compiler. Perhaps that helps in your search. You might also want to look into compiler, parser and lexer technology. – Bart van Ingen Schenau Mar 23 '16 at 11:41
  • Algorithms for code generation would be on topic here, I think. I would perhaps divide the code generator into classes, each handling one type of operation. When a new type of operation is added, add a new class. This keeps the complexity of each operation down. – Bent Mar 23 '16 at 12:33
  • @Michael Instead of asking it as an open-ended "what tools do people use?" question, try reworking it as something more specific. As StevieV mentioned, if you go into more detail about the context, we can provide more specific answers. – DylanSp Mar 23 '16 at 14:48
  • I don't read the question as asking for resources. It might be more of a terminology issue. So I voted to reopen it. – Basile Starynkevitch Mar 23 '16 at 15:13
  • For simple grammars, I use Python, as you do. For more complex grammars, you could use Lex/Yacc or Flex/Bison, but I personally recommend http://www.antlr3.org/, especially with its visual workbench where you can step through your grammars and debug them visually. A fantastic help. Of course, it generates C++ code, and it is free. – Mawg says reinstate Monica Mar 23 '16 at 15:42
  • https://www.python.org/about/success/cog/. It was the first match in a Google Search for "C++ Code generator from python" – Robert Harvey Mar 23 '16 at 15:48
  • Standard tools for assembling code fragments (ASTs) into larger code fragments are called "program transformation tools". With a good one you can write your code fragments in your target ("C++") notation as patterns with named placeholders for other fragments. Tool facilities provide means to compose (and even transform) the fragments by filling in the named points. Being driven by syntax, you can't get a composed fragment which is not legal syntax. See http://www.semanticdesigns.com/Products/DMS/DMSRewriteRules.html – Ira Baxter May 19 '16 at 08:16
  • @Mawg: AFAIK, there is no working C++ grammar for ANTLR. And you don't want to write one for C++ by yourself. (See http://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser/1004737#1004737) – Ira Baxter May 19 '16 at 08:19
  • I agree. As far as I recall, there is one for C. But the point I made is that ANTLR generates C++ code, which is what the OP is asking for. – Mawg says reinstate Monica May 19 '16 at 12:42
  • Curious if you found anything interesting since posting this? –  May 30 '17 at 03:27
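Bent's suggestion of one class per operation type could look roughly like the following minimal Python sketch. The class names (`Var`, `Add`, `Mul`, `Pow`) and the `emit` method are hypothetical illustrations, not part of the question's generator.

```python
class Node:
    """Base class: every operation type knows how to emit itself as C++."""

    def emit(self) -> str:
        raise NotImplementedError


class Var(Node):
    def __init__(self, name: str):
        self.name = name

    def emit(self) -> str:
        return self.name


class Add(Node):
    def __init__(self, lhs: Node, rhs: Node):
        self.lhs, self.rhs = lhs, rhs

    def emit(self) -> str:
        return f"({self.lhs.emit()} + {self.rhs.emit()})"


class Mul(Node):
    def __init__(self, lhs: Node, rhs: Node):
        self.lhs, self.rhs = lhs, rhs

    def emit(self) -> str:
        return f"({self.lhs.emit()} * {self.rhs.emit()})"


# Adding an operation means adding one class; the existing classes stay untouched.
class Pow(Node):
    def __init__(self, base: Node, exponent: Node):
        self.base, self.exponent = base, exponent

    def emit(self) -> str:
        # The generated C++ file must #include <cmath> for std::pow.
        return f"std::pow({self.base.emit()}, {self.exponent.emit()})"


expr = Mul(Add(Var("x"), Var("y")), Pow(Var("z"), Var("n")))
print(expr.emit())  # ((x + y) * std::pow(z, n))
```

The trade-off mirrors the visitor pattern from the question: new operation types are cheap, but adding a second output format means touching every class.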

2 Answers

So you want to translate your own domain-specific language (DSL) into some flavor of C++. I am doing exactly the same in my GCC MELT implementation (inactive as of 2017). It is a Lisp-y domain-specific language to customize the GCC compiler. See also this answer giving slightly more details, and this one giving relevant references.

Here is some advice; I cannot be more specific because I have absolutely no idea what your domain-specific language is for. Is it Turing-complete (perhaps accidentally)? Probably yes! Also read this draft report.

  • if you have never studied them, study compiler techniques (including lexing & parsing). They are highly relevant.

  • read Scott's book Programming Language Pragmatics (at least for inspiration).

  • consider (instead of developing your own DSL) embedding some existing interpreter, perhaps Guile or Lua. It might be considerably simpler.

  • be aware that designing and implementing a passable DSL which is compiled (perhaps to C++) is a lot of work (years!). Read The Mythical Man-Month, about Hofstadter's law, etc. You may or may not want to bootstrap your language implementation (that is, write its compiler in the DSL itself)...

  • if your DSL is at all useful (e.g. you are not the only one writing scripts in it), be aware that eventually some crazy user will write large scripts (many thousands of lines) in it. So design the language seriously!

  • first, you need a well-defined (in your head!) representation of the abstract syntax tree (which might not be a tree, but a graph) of the generated C++ code, and you should build that AST in memory before emitting the corresponding C++ code.

  • you might want to emit #line directives (referring to positions inside your DSL scripts). They are very useful for debugging, but emitting them correctly is quite hard.

  • you might need several other internal representations between the DSL source and the generated C++ code; your C++ code generator (actually a specialized compiler) transforms some representations into others, and finally into an AST which is emitted as C++ code.

  • you should care about the memory model, so read about garbage collection techniques (at the very least for terminology and concepts); see the GC handbook. You probably don't want a stupid user script to crash the computer or the process, so you need to handle memory (and memory leaks).

  • instead of generating C++ code, you might consider JIT compilation techniques: GCCJIT, LLVM, libjit, asmjit, ...

  • perhaps Scilab, R, Octave or Julia might be relevant for your work (because they might spare you from starting your own DSL)

  • On Linux specifically, you might generate C++ code in some temporary file, compile it into a plugin (read Drepper's paper How To Write Shared Libraries, read about Invoking GCC, and see elf(5)), then dlopen(3) it (and use dlsym(3) to get function pointers). Then read the C++ dlopen mini-HOWTO. The RefPerSys project is doing that.
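The AST and `#line` bullets can be illustrated with a minimal Python sketch. It assumes a hypothetical AST in which every statement records the DSL file and line it came from (`Loc`, `Assign` and `emit_function` are illustrative names, and right-hand sides are kept as ready-made C++ strings to stay short). It shows only the mechanical part; the hard part mentioned above is keeping those positions accurate through every transformation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Loc:
    file: str   # DSL script name
    line: int   # 1-based line in that script

@dataclass
class Assign:
    target: str
    rhs: str    # simplification: right-hand side already rendered as C++
    loc: Loc

def c_string(s: str) -> str:
    # Escape backslashes and quotes so the file name is a valid C++ string literal.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def emit_function(name: str, params: List[str], body: List[Assign],
                  generated_file: str) -> str:
    out = [f"void {name}({', '.join(params)}) {{"]
    for stmt in body:
        # The line after a #line directive is reported by the compiler and
        # debugger as line stmt.loc.line of stmt.loc.file.
        out.append(f"#line {stmt.loc.line} {c_string(stmt.loc.file)}")
        out.append(f"  {stmt.target} = {stmt.rhs};")
    # Switch back to the real file so later code is not blamed on the DSL.
    # This directive sits on line len(out) + 1, so the next line is len(out) + 2.
    out.append(f"#line {len(out) + 2} {c_string(generated_file)}")
    out.append("}")
    return "\n".join(out) + "\n"


body = [
    Assign("x", "x + v * dt", Loc("model.dsl", 3)),
    Assign("t", "t + dt", Loc("model.dsl", 4)),
]
src = emit_function("step", ["double& x", "double& t", "double v", "double dt"],
                    body, "step.cpp")
print(src)
```

A compile error in the first generated assignment is then reported against `model.dsl` line 3 rather than against the generated file.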

The CLASP project should be relevant: it is about Common Lisp and molecular chemistry simulation.
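The Linux plugin workflow from the last bullet can be sketched from Python with the standard `ctypes` module, whose `CDLL` calls dlopen(3) on Linux and resolves attribute lookups with dlsym(3). This is a minimal sketch assuming `g++` is on the `PATH`; `generated_kernel`, `generate_source` and `build_and_load` are hypothetical names.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile

def generate_source(expr: str) -> str:
    # extern "C" disables C++ name mangling, so dlsym can find the symbol by name.
    return (
        'extern "C" double generated_kernel(double x) {\n'
        f"  return {expr};\n"
        "}\n"
    )

def build_and_load(source: str, compiler: str = "g++"):
    tmpdir = tempfile.mkdtemp()  # left in place so the generated code can be inspected
    src_path = os.path.join(tmpdir, "kernel.cpp")
    lib_path = os.path.join(tmpdir, "libkernel.so")
    with open(src_path, "w") as f:
        f.write(source)
    # -shared -fPIC produces a position-independent shared object (the plugin).
    subprocess.run([compiler, "-O2", "-shared", "-fPIC", src_path, "-o", lib_path],
                   check=True)
    handle = ctypes.CDLL(lib_path)   # dlopen(3) under the hood on Linux
    fn = handle.generated_kernel     # dlsym(3) lookup of the unmangled name
    fn.argtypes = [ctypes.c_double]
    fn.restype = ctypes.c_double
    return fn


if shutil.which("g++"):
    kernel = build_and_load(generate_source("x * x + 1.0"))
    print(kernel(3.0))  # 10.0
```

Setting `argtypes` and `restype` matters: without them `ctypes` would pass and return C `int`s and silently produce garbage for a `double` function.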

Basile Starynkevitch

Do you need to generate C++, or compiled code? You can leverage LLVM to produce compiled output from scripted tooling. LLVM's intermediate representation (IR) is verbose but powerful: it's what Clang lowers C++ code (or code in any other language it supports) into before compiling it to machine code. Alternatively, you can generate C++ from tools that still leverage the Clang parser, so instead of manipulating the text, you manipulate the internal AST the parser holds.

For an example, look at cmonster, which is a (rudimentary) Python wrapper for Clang's C++ parser.
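To make the "generate compiled code instead of C++" option concrete, here is a minimal Python sketch that writes textual LLVM IR directly, skipping both C++ and Clang's parser. The nested-tuple expression format and the function names are hypothetical; the output is plain `.ll` text that tools such as `llc` or `clang` can compile. Floating-point literals are written in LLVM's hexadecimal form, which always represents a double exactly.

```python
import itertools
import struct

# Textual LLVM IR opcodes for double-precision arithmetic.
LLVM_OPS = {"+": "fadd", "-": "fsub", "*": "fmul", "/": "fdiv"}

def llvm_double(value: float) -> str:
    # 0x followed by the 16 hex digits of the IEEE-754 bit pattern.
    return "0x" + struct.pack(">d", value).hex().upper()

def emit_llvm_function(name, params, expr):
    """expr is a nested tuple such as ("*", "x", ("+", "x", 1.0)).

    Strings are parameter names; literals must be Python floats.
    """
    fresh = itertools.count()
    body = []

    def gen(node):
        if isinstance(node, str):      # function parameter
            return "%" + node
        if isinstance(node, float):    # literal constant
            return llvm_double(node)
        op, lhs, rhs = node
        a, b = gen(lhs), gen(rhs)
        tmp = f"%t{next(fresh)}"       # named temporaries avoid SSA numbering rules
        body.append(f"  {tmp} = {LLVM_OPS[op]} double {a}, {b}")
        return tmp

    result = gen(expr)
    args = ", ".join(f"double %{p}" for p in params)
    return (f"define double @{name}({args}) {{\n"
            "entry:\n" + "\n".join(body) + "\n"
            f"  ret double {result}\n}}\n")


ir = emit_llvm_function("f", ["x"], ("*", "x", ("+", "x", 1.0)))
print(ir)
```

Because every temporary is assigned exactly once, the emitted code is already in the SSA form LLVM requires; a richer generator would also need basic blocks and `phi` nodes for control flow.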

gbjbaanb