It still can be done much cheaper. I just fixed my excessively cheap register allocator for my c compiler rcc.
The register allocator is a simple first-fit bitmask with no spilling to the stack except for the two predefined spill slots. Only when all 8 registers are in use does it spill additional values to the stack. These are what they call guest registers. No SSA and no basic blocks needed. No crazy mem2reg or graph coloring. It runs only once per function.
Only for very big functions is a register spilled, usually just rsi.
Benchmarks:
Seems like tcc, slimcc, and gcc -O0 are all better than rcc in this table; fast compilation and better runtime. Only kefir and clang -O0 (which is unfortunately what we use) are worse.
Because I didn't benchmark -O1, the optimization pass, yet. It has to do much less work on pure functions, so even compile time is faster with it. Still a work in progress...
> It still can be done much cheaper. I just fixed my excessively cheap register allocator for my c compiler rcc.
From that data, “GCCO2” is about twice as fast on compilation + single execution of some program as your C compiler rcc. That difference will get even larger if the compiled binary is run more than once. Why, then, do you imply your compiler is even cheaper?
More importantly, this paper is about binary translation, that is: taking a binary, reverting its register allocation, and then doing register allocation again for an architecture with a different register configuration. So, I don’t see how benchmarks of C compilers matter much for discussing it.