
For a C project, I'm upgrading my build process from

  • MinGW-GCC / make and Android Studio under Windows (2 separate processes)

to

  • Clang / CMake under Debian 8, using wclang and wine to compile and run the Windows build

It seems to me that we build for one platform at a time. I suppose a large part of compiling and linking is platform-specific; that cannot be avoided given the enormous differences between platforms. I would expect that to be the majority of the compiler's work.

But I wonder; does all cross-compilation work this way? Say we're compiling for 3 platforms: A, B, C.

Let's say 25% of the work involved in compiling for A can be reused in compiling for B & C (things like building the AST). Surely we would want to do that, thus reducing overall build time?

Are there such tools (particularly in relation to Clang, GCC)? And is there a name for such a "sharing" cross-compile, that I should know about? Thanks.

Engineer
  • You could use an intermediate representation. For example, .NET and Java use that approach, even when the code gets compiled before distribution. LLVM has such internal representations, but I don't know if it's used for this purpose in practice (platform-dependent `#define`s, constants, type sizes, etc. would cause complications). – CodesInChaos May 07 '16 at 15:16
  • @CodesInChaos Yes - there is the `-emit-llvm` flag to compile to IR (Intermediate Representation). See [this](http://stackoverflow.com/questions/9148890/how-to-make-clang-compile-to-llvm-ir). Cheers for the line of thinking. – Engineer May 07 '16 at 15:24

1 Answer


In a different language, yes, you could perhaps reuse an AST. However, the compilation model of C makes this impossible. First, a preprocessor phase runs over the code. Every compiler has its own #defines so that we can safely include compiler-specific options. Also, various defines allow us to check what system we are compiling for. In the code base I am working on, we use conditional compilation to select better system calls on operating systems which support them. Only after all these defines are resolved do we get a C source that is parsed and converted to an AST. Since this AST now contains platform-specific assumptions, you can't use it as the basis for cross-platform compilation.
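As a minimal sketch of how conditional compilation ties the parsed source to one target, the function below gets a different body depending on macros the compiler predefines for that target. The name `target_platform` is made up for illustration; `_WIN32`, `__ANDROID__`, `__linux__` and `__APPLE__` are commonly predefined target macros, but check your toolchain's documentation for the exact set.

```c
/* Hypothetical helper: reports which platform this translation unit
   was preprocessed for. The choice is made by the preprocessor, so
   only one of the return statements ever reaches the parser. */
const char *target_platform(void)
{
#if defined(_WIN32)
    return "windows";
#elif defined(__ANDROID__)
    /* Checked before __linux__, since Android toolchains
       typically define both. */
    return "android";
#elif defined(__linux__)
    return "linux";
#elif defined(__APPLE__)
    return "apple";
#else
    return "unknown";
#endif
}
```

After preprocessing, a single `return` survives, so the AST built for a Windows target never contains the Linux branch, and vice versa.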

A related problem is that different systems and compilers may use different implementations of the C standard library. There is considerable freedom in how some parts can be implemented, e.g. as functions or as macros. The point is that you cannot just use the headers for the wrong libc and expect everything to work.
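To illustrate the function-versus-macro freedom, here is a small sketch using `isdigit` from `<ctype.h>`. The C standard allows a library function to be additionally provided as a function-like macro, and parenthesizing the name suppresses any such macro. The two wrapper names are hypothetical.

```c
#include <ctype.h>

/* May expand to a macro (for example a table lookup) or call a real
   function, depending on the libc whose headers were included. */
int is_digit_maybe_macro(int c)
{
    return isdigit(c) != 0;
}

/* Parenthesizing the name prevents macro expansion, so this always
   calls the actual library function. */
int is_digit_function(int c)
{
    return (isdigit)(c) != 0;
}
```

Both wrappers behave the same, but the code the parser sees for the first one depends on which libc headers were on the include path.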

A note on LLVM IR: while the IR is cross-platform in the sense that it can be compiled to machine code on any supported architecture, it already contains platform-specific assumptions such as integer widths, data layout, and calling conventions. The IR is only useful when debugging, or when working on the compiler toolchain, since it works as a data exchange format between various compiler phases (in particular, between optimization passes).
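A small sketch of the integer-width point: the same C source yields different type sizes depending on the target's data model, and those sizes are already fixed by the time IR is emitted. The function names here are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Width of `long` in bits for the target this file is compiled for. */
size_t long_bits(void)
{
    return sizeof(long) * CHAR_BIT;
}

/* Width of a data pointer in bits for the same target. */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

On 64-bit Linux (LP64) `long` is typically 64 bits wide, while on 64-bit Windows (LLP64) it is typically 32 bits, so IR generated for one target would describe the wrong type for the other.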

While this is a theoretical exercise, I also have to point out that you are expecting too much benefit from avoiding parsing. Modern compilers spend fairly little time parsing, and a lot of time optimizing. In the C compilation model, I/O time may also be relevant since you're often reading the same header files for each object file you're generating. Many compilers offer a pre-compiled header mechanism where you can create an AST for the first header included by a source file. This can be helpful if you include the same set of headers in every file, but it does not help with cross-compilation.

A more viable approach to cutting down compilation times is to slim down your headers: include as few other headers as possible. Prefer forward declarations over including another header. Divide your code into clear layers, where each layer may only include headers from the lower layers; this avoids accidentally including nearly all other headers. Prefer small headers that only declare a couple of related symbols rather than large headers where most declarations will go unused by most including files. Benchmark whether any of these suggestions provides a measurable difference rather than blindly copying what some n00b wrote on the internet.
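A forward declaration in place of an include might look like the sketch below. The `widget` and `renderer` names and the file split are invented for illustration, and both "files" are shown in one block for brevity.

```c
/* widget.h (hypothetical): a forward declaration is enough because
   only a pointer to the type is stored, so renderer.h need not be
   included here. */
struct renderer;

struct widget {
    struct renderer *owner;  /* pointer to an incomplete type is fine */
    int width;
    int height;
};

int widget_area(const struct widget *w);

/* widget.c (hypothetical): this code never looks inside
   struct renderer, so it does not need the full definition either. */
int widget_area(const struct widget *w)
{
    return w->width * w->height;
}
```

Only source files that actually dereference `owner` would need to include the header that defines `struct renderer`.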

amon