
How specific is hardware optimization when building from source and what should I look for in the documentation to decide if building for my hardware might be worth it?

From threads like this one I gather that optimization for my "CPU and environment" is a possible advantage, but that there are also risks. How does one know which optimizations are already standard and which ones I can benefit from with minimal risk? Compilation time and size of the binaries are not a priority.

  • What are the CPU optimizations tuned to? Specific models? AMD vs Intel? CPU generation? Chipset? Number of cores?

  • Are these optimizations overlapping or separate from optimization flags like O2 and O3?

  • Does the "environment" part of the optimization have to do with my kernel version, which libraries I have installed, or what? Is part of optimization deciding which libraries to build with?

  • If I stick to recommended flags, is there still some hardware optimization, or do I need to use the "riskier" flags? And are they still risky if I know exactly what hardware I will run the build on?

  • My understanding of O-flags is limited to having read that the higher O-numbers are more optimized but have a higher risk of instability. Do I need to be a software engineer in order to make educated guesses about flags?

Stonecraft
    Compiling for the native architecture (`-march=native` in GCC) affects the compiler's cost model (how long each machine instruction, e.g. a memory load, takes) and which instruction set extensions are available. Extensions matter for things like machine learning or video processing. This is largely unrelated to the optimization level O2/O3, which just selects a suite of optimization passes. The selected passes should be listed in your compiler's manual. It's a bit excessive to consider O3 risky; it's just that a lot of C/C++ software relies on undefined behaviour. – amon Feb 29 '20 at 11:58
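
To make the distinction in the comment above concrete, here is a minimal sketch; the file name, flags, and the "may use AVX" note are illustrative and depend on your CPU and GCC version:

```c
/* dot.c -- a plain loop whose generated code changes with -march,
 * largely independently of the -O level.
 *
 * Example builds (GCC):
 *   gcc -O2 dot.c -c                 # baseline x86-64 instructions only
 *   gcc -O2 -march=native dot.c -c   # may additionally use AVX/AVX2/FMA if your CPU has them
 *   gcc -O3 dot.c -c                 # more aggressive optimization passes,
 *                                    # still restricted to the baseline instruction set
 * Compare the results with `objdump -d dot.o` or by compiling with -S.
 */
float dot(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```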

2 Answers

1

The answer to the question "How specific is hardware optimization when building from source?" is, TL;DR: as specific as the compiler's knowledge of your CPU architecture.

For example, from Wikipedia:

The ARM Cortex-A57 is a microarchitecture implementing the ARMv8-A 64-bit instruction set designed by ARM Holdings.

The ARM Cortex-A57 CPU implements that instruction set. Your compiler knows about your instruction set, the number of registers you have, the size of your caches, and so on. Based on this knowledge it tries to turn your program's unoptimized (-O0) code into an optimized version. If there is an update to GCC and it learns a new optimization trick, the more recent compiler is more likely to produce better code.
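
One small way to see what knowledge the compiler bakes into a particular build is via its predefined macros; this sketch assumes an x86 machine and GCC/Clang (the macro names differ on other architectures), and `gcc -march=native -Q --help=target` will also list what GCC detected about your CPU:

```c
/* march_report.c -- report which instruction-set extensions this particular
 * build was allowed to assume. The macros are predefined by the compiler
 * according to the -march setting; they are not detected at run time.
 *
 *   gcc -O2 march_report.c -o march_report                 # baseline build
 *   gcc -O2 -march=native march_report.c -o march_report   # build for this machine
 */
#include <stdio.h>

int main(void)
{
#ifdef __AVX2__
    puts("built with AVX2 instructions allowed");
#else
    puts("built without AVX2");
#endif
#ifdef __FMA__
    puts("built with FMA instructions allowed");
#else
    puts("built without FMA");
#endif
    return 0;
}
```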

Optimization for multiple cores would, I would guess, not happen directly, because on a general-purpose OS running many binaries, resource allocation and scheduling are unpredictable from your code's point of view. My guess is that your code can be optimized in such a way that, when it has the opportunity to use multiple threads, it is able to make use of them.

But in the case of SoC optimization, for example, even with heterogeneous CPUs it is possible, as long as the tool you are using knows everything about the system and about the entirety of the code it will run in its lifetime. It requires different tools, though, because then it is no longer about a single compiler's ability to optimize; there is a multitude of additional optimization constraints (e.g., inter-CPU communication).

There are companies trying to provide solutions for this. It is not a trivial thing, but you can look at their websites for white papers and the like.

  • Thanks. So does it tend to be clear and well documented which compilers "know" the most about which CPUs? And with regard to the instruction set, is what instructions are present in the CPU all that matters, or are there things about their implementation that the compiler can take into account? – Stonecraft Feb 29 '20 at 22:15
1

You should tell the compiler the minimum CPU that your code will run on, so it knows which CPU features are definitely available and can use them (and avoids CPU features that are not available and would crash the program if you tried to run the code on such a machine).
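
A complementary technique (not the only option, and the feature name below is just an example) is to keep the compile-time baseline conservative and check for newer CPU features at run time, so the same binary neither crashes on old machines nor ignores new ones. A minimal sketch using the x86-specific GCC/Clang builtins:

```c
/* dispatch.c -- run-time feature check, so a binary built for a conservative
 * baseline can still pick a faster code path on newer CPUs without crashing
 * on older ones.
 *
 *   gcc -O2 dispatch.c -o dispatch
 */
#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();                      /* initialise the feature-detection data */
    if (__builtin_cpu_supports("avx2"))
        puts("would call the AVX2 implementation here");
    else
        puts("would call the portable fallback here");
    return 0;
}
```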

You can also tell the compiler which CPU it should optimise for. The compiler will then try to generate code that is as fast as possible on that CPU. Some people optimise for the slowest CPU available, because a speed-up matters most on the slow machines. Some people optimise for the most common CPU, for obvious reasons. For some software you want the fastest CPU you can afford, so you would optimise for the fastest CPU available. If you build for yourself, you optimise for your own CPU.
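
In GCC these two choices map onto two separate flags; the CPU names below are only examples (check which names your GCC version accepts): -march sets the minimum CPU whose instructions may be emitted, while -mtune only changes which CPU the instruction scheduling is tuned for and does not restrict where the binary runs.

```c
/* saxpy.c -- the same source built with different baseline/tuning choices.
 *
 *   gcc -O2 -march=x86-64    -mtune=generic saxpy.c -c   # runs on any x86-64 CPU
 *   gcc -O2 -march=x86-64-v2 -mtune=znver3  saxpy.c -c   # needs a level-2 x86-64 CPU,
 *                                                        #   scheduling tuned for Zen 3
 *   gcc -O2 -march=native                   saxpy.c -c   # needs (at least) the CPU of the
 *                                                        #   machine it was built on
 */
void saxpy(float a, const float *x, float *y, int n)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```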

Usually it makes very little difference and people don’t worry about it.

Optimising for number of cores, or for cache sizes, is usually done in your code.
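
For instance, the core count and cache parameters are usually discovered by the program at run time rather than baked in by the compiler. A minimal sketch using glibc/POSIX `sysconf` (the cache-line query is a glibc extension and may report 0 or -1 on some systems):

```c
/* probe.c -- ask the OS at run time how many cores and what cache line size
 * the machine has; the compiler does not hard-code these into the binary.
 *
 *   gcc -O2 probe.c -o probe
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);          /* cores currently online */
    long line  = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);    /* glibc extension; may be 0 or -1 */

    printf("worker threads to spawn: %ld\n", cores > 0 ? cores : 1);
    printf("L1 data cache line size: %ld bytes\n", line);
    return 0;
}
```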

gnasher729
  • Is there a master list of these CPU features somewhere? It would be great if I could look at a table and see which are more unique and which are shared by most current PC CPUs. – Stonecraft Feb 29 '20 at 22:18