Low-Compilation-Cost Register Allocation in LLVM-Based Binary Translation

rurban 23 hours ago

It can still be done much cheaper. I just fixed the excessively cheap register allocator in my C compiler, rcc.

The register allocator is a simple first-fit bitmask that does not spill to the stack, apart from the two predefined spill slots. Only when all 8 registers are in use does it spill the additional registers to the stack. These are what they call guest registers. No SSA and no basic blocks needed. No crazy mem2reg or graph coloring. It runs only once per function.

Only very big functions spill a register, usually just rsi.
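
To make the first-fit bitmask idea concrete, here is a minimal sketch in C. It is not rcc's actual code: the names `RegAlloc`, `ra_alloc`, and `ra_free` are hypothetical, and the overflow scheme (handing out negative numbers as stack slots, never recycling them) is an assumption chosen to keep the example short. Only the pool size of 8 and the first-fit policy come from the description above.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 8 };  /* pool size taken from the description above */

/* Hypothetical allocator state, reset once per function. */
typedef struct {
    uint8_t in_use;     /* bit i set => register i is occupied */
    int     next_spill; /* number of stack slots handed out so far */
} RegAlloc;

/* First fit: return the lowest free register index (0..7).
 * If all registers are busy, return a negative stack-slot id
 * (-1, -2, ...). This overflow encoding is an assumption. */
static int ra_alloc(RegAlloc *ra) {
    for (int r = 0; r < NUM_REGS; r++) {
        if (!(ra->in_use & (1u << r))) {
            ra->in_use |= (uint8_t)(1u << r);
            return r;
        }
    }
    return -(++ra->next_spill);
}

/* Release a location. Stack slots are not recycled in this sketch. */
static void ra_free(RegAlloc *ra, int loc) {
    assert(loc < NUM_REGS);
    if (loc >= 0)
        ra->in_use &= (uint8_t)~(1u << loc);
}
```

With a zero-initialized `RegAlloc`, nine allocations in a row return registers 0 through 7 and then stack slot -1. Freeing register 3 and allocating again returns 3, since the scan always starts from the lowest bit. There is no liveness analysis or interference graph here; the code generator just calls `ra_alloc` and `ra_free` as it walks the expression tree.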

Benchmarks:

    | Compiler | Compile (ms) | Execute (ms) | Total (ms) |
    | :------- | -----------: | -----------: | ---------: |
    | RCC      |           61 |          754 |        815 |
    | TCC      |            8 |          628 |        636 |
    | SLIMCC   |           74 |          642 |        716 |
    | KEFIR    |          270 |          765 |       1035 |
    | GCC0     |           83 |          637 |        720 |
    | GCCO2    |          204 |          227 |        431 |
    | CLANG0   |          377 |          620 |        997 |
    | CLANGO2  |          310 |          221 |        531 |

wffurr 19 hours ago

Seems like tcc, slimcc, and gcc -O0 are all better than rcc in this table; fast compilation and better runtime. Only kefir and clang -O0 (which is unfortunately what we use) are worse.
- rurban 15 hours ago
  
  Because I haven't benchmarked -O1, the opt pass, yet. It has to do much less work on pure functions, so even compile time is faster then. Still a work in progress...

Someone 19 hours ago

> It still can be done much cheaper. I just fixed my excessively cheap register allocator for my c compiler rcc.

From that data, “GCCO2” is about twice as fast on compilation + single execution of some program as your C compiler rcc. That difference will get even larger if the compiled binary is run more than once. Why, then, do you imply your compiler is even cheaper?

More importantly, this paper is about binary translation, that is: taking a binary, reverting its register allocation, and then doing register allocation again for an architecture with a different register configuration. So, I don’t see how benchmarks of C compilers matter much for discussing it.